users@jaxb.java.net

Escaping illegal characters during marshalling

From: Erik van Zijst <erik.van.zijst_at_gmail.com>
Date: Wed, 22 Oct 2008 00:13:36 +1100

Hi folks,

I'm running into a problem where a string containing characters that
are valid in UTF-8 but illegal in XML (e.g. 0x10) gets serialized by
JAXB without these characters being escaped or encoded, effectively
producing illegal XML.
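
To make "illegal in XML" concrete, here is a minimal sketch of a check
against the Char production of the XML 1.0 specification. It is not the
attached unit test and it does not use JAXB; the class and method names
(XmlChars, isLegalXml10, firstIllegalIndex) are made up for
illustration.

```java
public class XmlChars {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    public static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the UTF-16 index of the first character that XML 1.0 does
     * not allow, or -1 if the whole string is legal.
     */
    public static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            // Step over both halves of a surrogate pair when needed.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("foo\u0010bar")); // prints 3
        System.out.println(firstIllegalIndex("foo\nbar"));     // prints -1
    }
}
```

A string for which firstIllegalIndex returns something other than -1 is
the kind of value that triggers the problem described here.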

When I later try to unmarshal these objects, the unmarshaller crashes with:

javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException: An invalid XML character (Unicode:
0x10) was found in the element content of the document.]
        at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
...

I've attached a very small unit test that reproduces this problem. I
was under the impression that the serializer would escape illegal
characters by encoding them as character references such as &#x10;,
but instead the test produces invalid XML at line 31 and then crashes
on line 35.
What am I overlooking?

cheers,
Erik