users@jaxb.java.net

possible JAXB bug with non-ASCII characters

From: Geis, Matt <Matt.Geis_at_schwab.com>
Date: Wed, 20 Aug 2003 14:00:52 -0700

Re my last email (included below), I have some more information. I tried
setting the following property on the Marshaller.

m.setProperty(Marshaller.JAXB_ENCODING, "DEFAULT");

This change DID produce the correct, escaped output. However, my input
document is UTF-8 encoded and declared as such, and the Marshaller’s default
output encoding is also UTF-8, yet the characters are not escaped unless I
specify the DEFAULT encoding. This is clearly not a workable solution, as I
want my output file to be UTF-8.
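
A serializer only has to use a character reference for a character that the
output encoding cannot represent. Æ is representable in both UTF-8 and
ISO-8859-1, so writing it raw is legal, and any XML parser reads a raw Æ in a
UTF-8 document and &#x00C6; as the same character. The sketch below is a
hypothetical model of that rule, not JAXB's code: EncodingAwareEscaper and
escapeText are invented names, and it handles character data only. It would
also account for the DEFAULT result if the marshaller treats an encoding name
it does not recognize as ASCII-only, but that is an assumption the post does
not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical model (not JAXB source): escape a character only when the
// target encoding cannot represent it. Character data only (attribute values
// would also need quote escaping); assumes well-formed UTF-16 input.
public class EncodingAwareEscaper {

    public static String escapeText(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (cp == '&') {
                sb.append("&amp;");   // & and < must always be escaped in text
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (encoder.canEncode(ch)) {
                sb.append(ch);        // representable in this encoding: write raw
            } else {
                sb.append(String.format("&#x%04X;", cp)); // e.g. &#x00C6;
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00C6 & co";
        System.out.println(escapeText(text, StandardCharsets.UTF_8));
        System.out.println(escapeText(text, StandardCharsets.ISO_8859_1));
        System.out.println(escapeText(text, StandardCharsets.US_ASCII));
    }
}
```

Under this model, UTF-8 and ISO-8859-1 both keep Æ as a raw character (only
the & is escaped), while US-ASCII forces &#x00C6;, which is the pattern
described above.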

Why doesn’t JAXB correctly escape the characters, and how can I get it to do
that? Is this a bug?

Matt

-----Original Message-----
From: Geis, Matt
Sent: Wednesday, August 20, 2003 12:35 PM
To: users_at_jaxb.dev.java.net
Subject: question about UTF-8 characters

Hi,

I’m running into a problem with JAXB. I have an XML document that contains
the character Æ. More precisely, the document contains the character
reference &#x00C6;, which resolves to Æ. When I unmarshal the document into a
JAXB object and call the getter for the corresponding property, it correctly
returns Æ.
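
This is how XML parsing works in general: the parser replaces &#x00C6; with
the single character U+00C6 and does not record that it was written as a
reference, so anything that builds objects from the parse (JAXB included)
only ever sees Æ. The sketch below uses the JDK's built-in DOM parser rather
than JAXB to show that a reference and the raw character parse to the same
string; CharRefDemo and rootText are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefDemo {

    // Parses a small UTF-8 document and returns the text of its root element.
    public static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String fromReference = rootText("<name>&#x00C6; &amp; co</name>");
        String fromRawChar = rootText(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u00C6 &amp; co</name>");
        // Both print true: the parser keeps the character, not its spelling.
        System.out.println(fromReference.equals("\u00C6 & co"));
        System.out.println(fromReference.equals(fromRawChar));
    }
}
```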

However, when I marshal the object back into XML, the character comes out
as “Æ” rather than as a character reference.

The ampersand, by contrast, is handled correctly: my XML document contains
‘&amp;’, the getter returns ‘&’, and the marshaled version contains ‘&amp;’.

I messed around and changed the output encoding to ISO-8859-1, and the
marshaled XML for the Unicode character was Æ. However, that’s not what I
want. What I want is for the output to be UTF-8 encoded and for it to contain
the text ‘&#x00C6;’.
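
If the goal is UTF-8 output in which every non-ASCII character appears as a
hexadecimal reference, one option is to post-process the marshaller's output.
The sketch below assumes you marshal through the Marshaller.marshal(Object,
Writer) overload and wraps that Writer in a filter that rewrites each
non-ASCII character as &#xHHHH;. AsciiEscapingWriter is a hypothetical
helper, not part of JAXB. Everything it emits is ASCII, so the result is also
valid UTF-8 and a UTF-8 encoding declaration stays correct. It is only safe
when non-ASCII characters occur in text and attribute values, because element
and attribute names, comments, and CDATA sections cannot use character
references.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper, not part of JAXB: passes ASCII through unchanged and
// rewrites every other character as a hexadecimal character reference.
// Only valid where references are allowed (text and attribute values).
public class AsciiEscapingWriter extends FilterWriter {

    private char pendingHigh; // high surrogate awaiting its low half; 0 if none

    public AsciiEscapingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        char ch = (char) c;
        if (pendingHigh != 0) {
            if (!Character.isLowSurrogate(ch)) {
                throw new IOException("unpaired high surrogate");
            }
            int codePoint = Character.toCodePoint(pendingHigh, ch);
            pendingHigh = 0;
            writeReference(codePoint); // one reference for the whole pair
        } else if (Character.isHighSurrogate(ch)) {
            pendingHigh = ch;          // wait for the second half
        } else if (Character.isLowSurrogate(ch)) {
            throw new IOException("unpaired low surrogate");
        } else if (ch < 0x80) {
            out.write(ch);             // ASCII, including markup already escaped upstream
        } else {
            writeReference(ch);        // e.g. U+00C6 becomes &#x00C6;
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(cbuf[i]);
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }

    @Override
    public void close() throws IOException {
        boolean dangling = pendingHigh != 0;
        super.close();
        if (dangling) {
            throw new IOException("stream ended with an unpaired high surrogate");
        }
    }

    private void writeReference(int codePoint) throws IOException {
        out.write(String.format("&#x%04X;", codePoint));
    }
}
```

In use, this would look something like marshaller.marshal(obj, new
AsciiEscapingWriter(new OutputStreamWriter(new FileOutputStream("out.xml"),
"UTF-8"))), where marshaller, obj, and out.xml are placeholders.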

I found a possibly related bug in JAXR, where getBytes() is called on a
String object: if the text is meant to be UTF-8 but the JVM’s default charset
is not UTF-8, an error occurs, because the no-argument getBytes() encodes
with the JVM’s default charset.
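
That pitfall is easy to reproduce. A Java String holds UTF-16 text, so a
charset only matters when converting to or from bytes, and the no-argument
String.getBytes() silently uses the JVM's default charset. The sketch below
contrasts explicit charsets with the default; DefaultCharsetPitfall and
roundTrip are illustrative names. Since JDK 18 the default charset is UTF-8
unless overridden, so this mismatch is less common on current JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetPitfall {

    // Encodes with one charset and decodes with another, as happens when the
    // writer and the reader of some bytes disagree about the encoding.
    public static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String ae = "\u00C6"; // Æ: a single UTF-16 char inside the String

        // Explicit charsets give the same bytes on every JVM.
        System.out.println(ae.getBytes(StandardCharsets.UTF_8).length);      // 2 (C3 86)
        System.out.println(ae.getBytes(StandardCharsets.ISO_8859_1).length); // 1 (C6)

        // The no-argument form depends on the JVM's default charset.
        System.out.println(Charset.defaultCharset() + ": " + ae.getBytes().length);

        // UTF-8 bytes read back as ISO-8859-1 become two characters, not Æ.
        System.out.println(roundTrip(ae, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).length()); // 2
        System.out.println(roundTrip(ae, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(ae));    // true
    }
}
```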

What do I need to do here? Is this a bug? If not, how do I correctly marshal
the data?

Thanks,

Matt