Hi Kohsuke,
Thanks for the info.
I wrote this method to remove such characters from
any String I pass to JAXB for XML marshalling.
It eliminates all characters from 0x0000 up to (but not including) 0x0020,
excluding 0x0009, 0x000A and 0x000D (i.e. the illegal control characters),
by replacing them with a character of your choice.
Cheers,
Nick
// Requires: import java.util.Arrays;

/** Holder of all illegal XML control chars, sorted for binary search. */
private static final char[] ILLEGAL_XML_1_0_CHARS;

static {
    final StringBuilder buff = new StringBuilder();
    for (char c = 0x0000; c < 0x0020; c++) {
        if (c != 0x0009 && c != 0x000A && c != 0x000D) {
            buff.append(c);
        }
    }
    ILLEGAL_XML_1_0_CHARS = buff.toString().toCharArray();
    Arrays.sort(ILLEGAL_XML_1_0_CHARS);
}

/**
 * Cleans a given String so that it can be safely used in XML.
 * All invalid characters will be replaced with the given replacement
 * character.
 * Valid XML characters are described here:
 * <a href="http://www.w3c.org/TR/2000/REC-xml-20001006#dt-character">XML 1.0 character definition</a>
 *
 * @param pString the string to clean
 * @param pReplacement the char to use to replace the invalid characters
 * @return the string, cleaned for XML.
 */
public static String cleanStringForXml(String pString, char pReplacement) {
    // Work on chars rather than bytes, so the result does not depend on
    // the platform default encoding and the replacement is never truncated.
    final char[] chars = pString.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        if (Arrays.binarySearch(ILLEGAL_XML_1_0_CHARS, chars[i]) >= 0) {
            chars[i] = pReplacement;
        }
    }
    return new String(chars);
}
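
For what it's worth, the method above only looks at the control characters below 0x0020. A rough sketch of a stricter variant, assuming you want to follow the whole Char production of the XML 1.0 spec (which also rules out 0xFFFE, 0xFFFF and unpaired surrogates), might look like the following. The class and method names are made up for illustration and are not part of JAXB.

```java
/** Hypothetical helper; not part of JAXB or any library. */
final class XmlCharCleaner {

    private XmlCharCleaner() { }

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every code point that is not a legal XML 1.0 character. */
    static String clean(String input, char replacement) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt joins valid surrogate pairs; a lone surrogate
            // comes back as itself and is rejected by isLegalXmlChar.
            int cp = input.codePointAt(i);
            if (isLegalXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Iterating by code point rather than by char keeps characters outside the Basic Multilingual Plane intact while still catching a stray half of a surrogate pair.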
>-----Original Message-----
>From: Kohsuke Kawaguchi [mailto:Kohsuke.Kawaguchi_at_Sun.COM]
>Sent: Wednesday, 16 February 2005 22:17
>To: users_at_jaxb.dev.java.net
>Subject: Re: Escaping, or removing Invalid XML Characters.
>
>
>Nick Pellow wrote:
>> Then I get the following error when marshalling:
>>
>> java.io.IOException: The character '^C' is an invalid XML character
>>     at org.apache.xml.serialize.BaseMarkupSerializer.characters(Unknown Source)
>>
>> What is the cleanest way to remove such invalid control characters from a
>> content String when marshalling using XML version 1.0 ?
>
>The easiest way is probably not to put them into JAXB objects in the
>first place :-)
>
>That said, if you really want to just remove those characters, what you
>can do is write a SAX XMLFilterImpl. You can intercept the characters
>and startElement methods to modify the text values by removing those
>illegal chars.
>
>Then you can forward it to some kind of XMLWriter to print out. Search
>the archive for 'XMLWriter' for more about how to turn SAX events to
>Unicode and angle brackets.
>
>--
>Kohsuke Kawaguchi
>Sun Microsystems kohsuke.kawaguchi_at_sun.com
>
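
The XMLFilterImpl approach suggested in the quoted reply could be sketched roughly as follows. This is only an illustration under a few assumptions: the class name IllegalCharFilter is invented, the filter drops the illegal control characters instead of replacing them, and it handles only element text and attribute values.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter; drops illegal XML 1.0 control characters from SAX events. */
class IllegalCharFilter extends XMLFilterImpl {

    /** True for control characters below 0x20 other than tab, LF and CR. */
    static boolean isIllegal(char c) {
        return c < 0x20 && c != 0x9 && c != 0xA && c != 0xD;
    }

    /** Returns a copy of s without the illegal control characters. */
    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isIllegal(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Copy the legal characters into a fresh buffer and forward only those.
        char[] kept = new char[length];
        int n = 0;
        for (int i = start; i < start + length; i++) {
            if (!isIllegal(ch[i])) {
                kept[n++] = ch[i];
            }
        }
        super.characters(kept, 0, n);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Attribute values arrive here, so they are cleaned on a copy.
        AttributesImpl cleaned = new AttributesImpl(atts);
        for (int i = 0; i < cleaned.getLength(); i++) {
            cleaned.setValue(i, strip(cleaned.getValue(i)));
        }
        super.startElement(uri, localName, qName, cleaned);
    }
}
```

To use it, you would call setContentHandler on the filter with whatever ContentHandler does the actual writing, and then pass the filter itself to Marshaller.marshal(Object, ContentHandler).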