users@jaxb.java.net

Escaping, or removing Invalid XML Characters.

From: Nick Pellow <nick.pellow_at_mindmatics.de>
Date: Tue, 15 Feb 2005 18:03:55 +0100

Hi,

This question may have already been answered, but I
am having the same problem as Jon Gold and cannot search
the archives beyond the current month.

I am trying to marshal a JAXB object that has an element
defined as follows (using XML version 1.0):

<xs:element name="sms">
    <xs:complexType mixed="true">
        <xs:attribute name="guid" use="required" type="xs:string"/>
        <xs:attribute name="msisdn" use="required" type="xs:string"/>
        <xs:attribute name="timestamp" use="required" type="xs:long"/>
    </xs:complexType>
</xs:element>

I set the values of this element using:

        sms = mFACTORY.createSms();
        sms.setMsisdn(pSrcnbr);
        sms.setTimestamp(new Long(timeFormatted).longValue());
        sms.setGuid(pGuid);
        sms.getContent().add(pMsg);

When the content of this element contains characters that are illegal
in XML 1.0 (e.g. U+0000 (NUL) and the other control characters in
U+0001-U+001F, apart from tab, line feed and carriage return, which
are allowed), as described here:
http://www.w3.org/International/questions/qa-controls

I get the following error when marshalling:

java.io.IOException: The character '^C' is an invalid XML character
        at org.apache.xml.serialize.BaseMarkupSerializer.characters(Unknown Source)

What is the cleanest way to remove such invalid control characters from a
content String when marshalling using XML version 1.0?

I am using the following code to marshal the object, mObject:

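        // Configure the JAXB marshaller (constants are equivalent to
        // "jaxb.encoding" and "jaxb.formatted.output").
        Marshaller marshaller = context.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_ENCODING, XML_ENCODING);
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.FALSE);

        // Configure the Xerces serializer that receives the SAX events.
        final OutputFormat of = new OutputFormat();
        of.setEncoding(XML_ENCODING);
        of.setOmitXMLDeclaration(false);
        of.setIndenting(false);

        // Marshal into an in-memory writer via the serializer.
        StringWriter writer = new StringWriter();
        final XMLSerializer serializer = new XMLSerializer(writer, of);
        marshaller.marshal(mObject, serializer);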
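
For illustration, one possible way to drop such characters before they
reach the marshaller is sketched below. This is only a sketch under
assumptions: the class XmlCharFilter and its methods are hypothetical
names, not part of JAXB or Xerces, and the check follows the Char
production of the XML 1.0 specification (tab, LF, CR, U+0020-U+D7FF,
U+E000-U+FFFD, U+10000-U+10FFFF). Whether silently removing characters
is acceptable depends on the application.

```java
// Hypothetical helper, not part of JAXB or Xerces.
final class XmlCharFilter {

    private XmlCharFilter() {
    }

    /** Returns true if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every invalid XML 1.0 character removed. */
    public static String stripInvalidXml10(String in) {
        if (in == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // Read a full code point so surrogate pairs are kept together;
            // an unpaired surrogate is returned as-is and then rejected.
            int cp = in.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With such a helper, the content could be cleaned before it is added,
for example sms.getContent().add(XmlCharFilter.stripInvalidXml10(pMsg)).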



Best Regards,
Nick