users@jaxb.java.net

Re: Escaping, or removing Invalid XML Characters.

From: Kohsuke Kawaguchi <Kohsuke.Kawaguchi_at_Sun.COM>
Date: Wed, 16 Feb 2005 13:16:41 -0800

Nick Pellow wrote:
> Then I get the following error when marshalling:
>
> java.io.IOException: The character '^C' is an invalid XML character
> at org.apache.xml.serialize.BaseMarkupSerializer.characters(Unknown
> Source)
>
> What is the cleanest way to remove such invalid control characters from a
> content String when marshalling using XML version 1.0 ?

The easiest way is probably not to put them into JAXB objects in the
first place :-)

That said, if you really want to just remove those characters, you can
write a SAX filter by extending XMLFilterImpl. Override the characters
and startElement methods and strip the illegal characters from the text
and attribute values before passing the events on.
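
For illustration, such a filter might look roughly like the sketch
below. The class name InvalidCharFilter and the helpers isValid and
strip are made up for this example; the set of allowed characters is
the Char production from the XML 1.0 spec. Treat it as a starting
point, not a tested solution.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops characters that XML 1.0 does not allow.
class InvalidCharFilter extends XMLFilterImpl {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValid(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns s with every invalid code point removed.
    // Unpaired surrogates fall outside the valid ranges and are dropped too.
    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isValid(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // Clean the text chunk, then pass it to the next handler.
        String clean = strip(new String(ch, start, length));
        super.characters(clean.toCharArray(), 0, clean.length());
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Attribute values arrive here, so clean a copy of them as well.
        AttributesImpl copy = new AttributesImpl(atts);
        for (int i = 0; i < copy.getLength(); i++) {
            copy.setValue(i, strip(copy.getValue(i)));
        }
        super.startElement(uri, localName, qName, copy);
    }
}
```

One caveat: a surrogate pair that happens to be split across two
characters() calls would be dropped by this sketch, because each half
looks like an unpaired surrogate on its own.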

Then you can forward the filtered events to some kind of XMLWriter to
print them out. Search the archive for 'XMLWriter' for more about how
to turn SAX events into Unicode and angle brackets.
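
If you don't have an XMLWriter class at hand, one possible stand-in is
the identity TransformerHandler from javax.xml.transform.sax, which
also turns SAX events into text. The sketch below is only an
assumption about how the pieces could be wired together; the class
name SaxToText and the method serialize are invented for the example,
and the events are fired by hand where JAXB would normally produce
them.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper: chains a SAX filter to a serializer.
class SaxToText {

    static String serialize(XMLFilterImpl filter) {
        try {
            // An identity TransformerHandler plays the role of the XMLWriter.
            SAXTransformerFactory tf =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler writer = tf.newTransformerHandler();
            writer.getTransformer()
                  .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            writer.setResult(new StreamResult(out));

            // filter -> writer: whatever the filter lets through gets printed.
            filter.setContentHandler(writer);

            // With JAXB you would call marshaller.marshal(obj, filter) here.
            // For a self-contained demo, fire a few events by hand instead.
            filter.startDocument();
            filter.startElement("", "note", "note", new AttributesImpl());
            char[] text = "hello".toCharArray();
            filter.characters(text, 0, text.length);
            filter.endElement("", "note", "note");
            filter.endDocument();

            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing a plain new XMLFilterImpl() gives a pass-through chain; any
subclass that strips characters can be dropped in the same way.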

-- 
Kohsuke Kawaguchi
Sun Microsystems                   kohsuke.kawaguchi_at_sun.com