users@jaxb.java.net

question about UTF-8 characters

From: Geis, Matt <Matt.Geis_at_schwab.com>
Date: Wed, 20 Aug 2003 12:34:49 -0700

Hi,

I’m running into a problem with JAXB. I have an XML document which contains
the character Æ. More accurately, I have a document which contains the
character reference &#x00C6;, which dereferences to Æ. When I
unmarshal the document into a JAXB object, I can call the getter for the
given property, and it correctly displays Æ.

However, when I marshal the document back into XML, it becomes “Æ ”.

The ampersand is handled correctly. My XML document has ‘&amp;’, the getter
method shows ‘&’, and the marshaled version shows ‘&amp;’.

I messed around and changed the output encoding to ISO-8859-1, and the
marshaled XML for the Unicode character was Æ. However, that’s not what I
want. What I want is for the output to be UTF-8 encoded, and for it to
contain the text ‘&#x00C6;’ rather than the literal character.
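
For illustration, here is a minimal sketch in plain Java of the output being
asked for: every character above U+007F is written as a hexadecimal character
reference, so the result is pure ASCII and therefore also valid UTF-8. The
class CharRefEscaper and the method escapeNonAscii are hypothetical names used
only for this sketch; they are not part of JAXB, and the sketch assumes it is
applied to text that has already been marshaled (so markup characters such as
‘&’ are already escaped and are left alone).

```java
final class CharRefEscaper {
    private CharRefEscaper() {}

    /**
     * Hypothetical helper (not a JAXB API): replaces every code point above
     * U+007F with a hexadecimal character reference such as &#x00C6;.
     * ASCII characters, including markup and existing entities, pass through.
     */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                // At least four hex digits, upper case, e.g. 0xC6 -> &#x00C6;
                out.append(String.format("&#x%04X;", cp));
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00C6 is the character AE ligature from the example above.
        String marshaled = "<name>\u00C6 &amp; co</name>";
        // prints: <name>&#x00C6; &amp; co</name>
        System.out.println(escapeNonAscii(marshaled));
    }
}
```

Because this works on the serialized text, it would also rewrite characters
inside comments or CDATA sections, where character references are not
interpreted, so it is only a sketch of the desired result and not a general
solution.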

I found a bug in JAXR which may be related: getBytes() is called on a
String object, but if the String is encoded as UTF-8 and the default charset
is not UTF-8, an error will occur (because getBytes() uses the JVM’s default
charset).
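
The charset mismatch described above can be reproduced with the JDK alone. The
sketch below is not the JAXR code; CharsetDemo, hex and roundTrip are
hypothetical names, and it assumes a JDK recent enough to have
java.nio.charset.StandardCharsets. It encodes Æ with one charset and decodes
with another, which is effectively what happens when the no-argument
getBytes() picks a default charset that differs from the one the reader
expects.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    private CharsetDemo() {}

    /** Formats bytes as upper-case hex pairs separated by spaces. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes text with one charset and decodes the bytes with another. */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String ae = "\u00C6"; // the character AE ligature

        // The same character is two bytes in UTF-8 and one in ISO-8859-1.
        System.out.println(hex(ae.getBytes(StandardCharsets.UTF_8)));      // prints: C3 86
        System.out.println(hex(ae.getBytes(StandardCharsets.ISO_8859_1))); // prints: C6

        // Mismatched charsets turn one character into two.
        String garbled = roundTrip(ae, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // prints: 2

        // Naming the charset explicitly on both sides preserves the text.
        String intact = roundTrip(ae, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(ae)); // prints: true
    }
}
```

Passing the charset explicitly, as in getBytes(StandardCharsets.UTF_8), removes
the dependence on the JVM default.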

What do I need to do here? Is this a bug? If not, how do I correctly
marshal the data?

Thanks,

Matt