Re my last email (included below), I have some more information. I tried
setting the following property on the Marshaller:
m.setProperty(Marshaller.JAXB_ENCODING, "DEFAULT");
This change DID produce the correct, escaped output. However, my input
document is encoded in UTF-8 and declares that encoding, and the Marshaller's
default output encoding is also UTF-8. The characters are escaped only when I
specify DEFAULT as the encoding, which is clearly not a workable solution, as
I want my output file to be UTF-8.
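One plausible way to picture the behavior described above: a serializer must always escape markup characters such as '&', but it only needs to escape other characters when the output encoding cannot represent them. UTF-8 can represent every Unicode character, so under that rule nothing besides the markup characters gets escaped. The class EscapeModel and its method escapeFor are hypothetical, written for illustration only; this is not JAXB's code, and it says nothing about how JAXB interprets the non-standard encoding name "DEFAULT".

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical model of charset-aware escaping; NOT the JAXB implementation.
public class EscapeModel {

    // Always escapes the markup characters &, <, >; escapes any other
    // character only if the target charset cannot encode it.
    public static String escapeFor(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (!encoder.canEncode(ch)) {
                // Not representable in the output charset: emit a decimal
                // numeric character reference instead.
                out.append("&#").append(cp).append(';');
            } else {
                out.append(ch);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\u00C6 & co"; // U+00C6 (AE ligature), then " & co"
        // UTF-8 can encode U+00C6, so only the ampersand is escaped.
        System.out.println(escapeFor(value, StandardCharsets.UTF_8));
        // US-ASCII cannot, so U+00C6 becomes &#198; -> "&#198; &amp; co"
        System.out.println(escapeFor(value, StandardCharsets.US_ASCII));
    }
}
```

Under this model the ampersand is escaped in every encoding, while U+00C6 is escaped only for an encoding that cannot represent it, which matches the contrast in the messages between '&' and Æ.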
Why doesn’t JAXB correctly escape the characters, and how can I get it to do
that? Is this a bug?
Matt
-----Original Message-----
From: Geis, Matt
Sent: Wednesday, August 20, 2003 12:35 PM
To: users_at_jaxb.dev.java.net
Subject: question about UTF-8 characters
Hi,
I’m running into a problem with JAXB. I have an XML document which contains
the character Æ. More accurately, the document contains a character
reference that resolves to Æ. When I
unmarshal the document into a JAXB object, I can call the getter for the
given property, and it correctly returns Æ.
However, when I marshal the document back into XML, the character is written
unescaped, and the output no longer shows Æ.
The ampersand is handled correctly: my XML document has ‘&amp;’, the getter
method returns ‘&’, and the marshaled version has ‘&amp;’ again.
I messed around and changed the output encoding to ISO-8859-1, and the
marshaled XML then contained the literal character Æ. However, that’s not
what I want. What I want is for the output to be UTF-8 encoded, and for the
character to appear as an escaped character reference.
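Why the escaped form is attractive here can be shown at the byte level. The literal character Æ (U+00C6, decimal 198) is a single byte in ISO-8859-1 but two bytes in UTF-8, while its numeric character reference &#198; is pure ASCII and therefore produces identical bytes in both encodings. The class name EncodingBytes is illustrative only, and the printed byte values assume Java's signed byte type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: compares the bytes of a literal character and of its
// numeric character reference under two encodings.
public class EncodingBytes {
    public static void main(String[] args) {
        String ae = "\u00C6";        // U+00C6, the AE ligature (decimal 198)
        String escaped = "&#198;";   // numeric character reference for U+00C6

        // The literal character needs different bytes in each encoding.
        byte[] latin1 = ae.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = ae.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(latin1)); // [-58]       i.e. 0xC6
        System.out.println(Arrays.toString(utf8));   // [-61, -122] i.e. 0xC3 0x86

        // The escaped form is ASCII, so its bytes are the same in both encodings.
        boolean same = Arrays.equals(
                escaped.getBytes(StandardCharsets.ISO_8859_1),
                escaped.getBytes(StandardCharsets.UTF_8));
        System.out.println(same); // true
    }
}
```

Because a reference like &#198; survives any ASCII-compatible encoding unchanged, a UTF-8 file containing it is still valid UTF-8, which is exactly the combination asked for.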
I found a bug in JAXR, which may be related, where getBytes() is called on a
String without a charset argument. getBytes() uses the JVM’s default charset,
so if the data is meant to be UTF-8 and the default charset is not UTF-8, an
error will occur.
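The class of bug described above comes from encoding bytes with one charset and decoding them with another. This sketch uses a hypothetical helper, reencode, to encode with UTF-8 and decode with ISO-8859-1; each UTF-8 byte of Æ then turns into a separate character. Passing the charset explicitly to both getBytes and the String constructor avoids depending on the JVM default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows what a charset mismatch does to U+00C6.
public class CharsetMismatch {

    // Hypothetical helper: encode with one charset, decode with another.
    public static String reencode(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String ae = "\u00C6"; // U+00C6, the AE ligature

        // UTF-8 bytes 0xC3 0x86 read as ISO-8859-1 become U+00C3 and U+0086.
        String garbled = reencode(ae, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());               // 2
        System.out.println(garbled.equals("\u00C3\u0086")); // true

        // Naming the same charset on both sides round-trips correctly.
        String ok = reencode(ae, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(ae));                  // true

        // This is the charset that a bare getBytes() would silently use.
        System.out.println(Charset.defaultCharset());
    }
}
```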
What do I need to do here? Is this a bug? If not, how do I correctly
marshal the data?
Thanks,
Matt