The ASCII control characters TAB (0x09), LF (0x0A) and CR (0x0D) are permitted in XML 1.0.
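
Since the data has to be cleaned before it reaches the marshaller, here is a minimal sketch of such a filter. It assumes XML 1.0 and keeps only code points matching the spec's Char production (TAB, LF, CR, and everything from 0x20 up, excluding surrogates, 0xFFFE and 0xFFFF). The class and method names are made up for illustration and are not part of JAXB or Xerces.

```java
// Hypothetical helper, not part of any library.
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 Char production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every character XML 1.0 forbids dropped. */
    public static String stripInvalidXmlChars(String in) {
        if (in == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // Iterate by code point so valid surrogate pairs survive intact;
            // an unpaired surrogate is returned as-is and fails the check.
            int cp = in.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling XmlCharFilter.stripInvalidXmlChars(value) on each string before it is set on the object to be marshalled should keep 0x10 and friends out of the output. Replacing them with a placeholder instead of dropping them would be an equally reasonable choice.
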
On Wed, Oct 22, 2008 at 4:46 AM, Erik van Zijst <erik.van.zijst_at_gmail.com> wrote:
> I think I figured it out.
>
> While UTF-8 allows all ASCII control characters (e.g. 0x10), the XML
> spec explicitly forbids these characters, both in raw and escaped
> format:
>
> http://lists.xml.org/archives/xml-dev/199804/msg00502.html
>
> Hence, it seems that Xerces is in error here when it accepts the
> control characters and writes them into the serialized XML.
> Incidentally, nu.xom refuses serialization of illegal characters,
> raising an IllegalCharacterDataException (RuntimeException) on
> Element.appendChild("\u0010"), preventing invalid XML from being
> generated.
>
> In my situation, the data comes from a database that is fed through a
> web interface that is happy to accept any UTF-8, including ASCII
> control chars. I suppose all I can do is remove/replace all control
> chars before they hit the parser.
>
> cheers,
> Erik
>
>
> On Wed, Oct 22, 2008 at 12:13 AM, Erik van Zijst
> <erik.van.zijst_at_gmail.com> wrote:
> > Hi folks,
> >
> > I'm running into a problem where a string that contains valid UTF-8
> > characters that are illegal in XML (e.g. 0x10), gets serialized by
> > JAXB without escaping/encoding these bytes, effectively producing
> > illegal XML.
> >
> > When I later try to unmarshal these objects, the unmarshaller crashes with:
> >
> > javax.xml.bind.UnmarshalException
> > - with linked exception:
> > [org.xml.sax.SAXParseException: An invalid XML character (Unicode:
> > 0x10) was found in the element content of the document.]
> > at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315)
> > ...
> >
> > I've attached a very small unit test that reproduces this problem. I
> > was under the impression that the serializer would escape illegal
> > characters by encoding them as numeric character references, but
> > instead the test produces invalid XML at line 31 and then crashes on
> > line 35.
> > What am I overlooking?
> >
> > cheers,
> > Erik
> >