users@jersey.java.net

Re: [Jersey] Jersey client not sending UTF-8 characters

From: Tatu Saloranta <tsaloranta_at_gmail.com>
Date: Mon, 18 Jan 2010 23:49:32 -0800

On Mon, Jan 18, 2010 at 11:29 AM, Guillaume Bilodeau
<guillaume.bilodeau_at_gmail.com> wrote:
> Hi guys,
>
> We're using Jersey 1.0.3 to implement a REST client. This client needs to
> post an XML envelope containing UTF-8 characters. When debugging within
> Eclipse, we can see that the entity that is being sent is correct (all
> accented characters are displayed properly). However, intercepting this
> request using TCPMon shows garbled accented characters. We've dug into
> the Jersey source code and it seems to be using an OutputStream that uses
> UTF-8, so we're a bit stumped here.
>
> Any ideas why we're observing this behaviour?

Beyond making sure you always use the versions of String.getBytes()
and new String() that take an explicit encoding, are you sure that
your debugging tools actually use the correct encoding? That is, what
do Eclipse and TCPMon use when converting byte arrays to displayable
Strings?
I have found that quite often it is the simple tools (often a unix
shell window or a browser) that display things incorrectly, and that
the actual data is correct. This is because they often have no
contextual knowledge of what encoding the content is supposed to be
in. XML documents are special, though, since they may specify their
encoding in the XML declaration, if the tool is able to check that.
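
To make the point concrete, here is a minimal sketch (not taken from
Jersey or from your code) that encodes a String with an explicit
charset and then decodes the same bytes two ways. The class and
method names (EncodingCheck, encodeUtf8, decode, hex) are made up for
illustration, and it assumes Java 7 or later for StandardCharsets; on
older JVMs the getBytes("UTF-8") and new String(bytes, "UTF-8")
overloads should do the same job.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class EncodingCheck {

    // Encode with an explicit charset instead of the platform default.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with an explicit charset; a viewer that guesses the
    // wrong charset is effectively calling this with the wrong one.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Render bytes as space-separated lowercase hex, so the check
    // does not depend on how the console itself decodes text.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE

        byte[] wire = encodeUtf8(text);
        System.out.println("bytes on the wire: " + hex(wire));

        // Decoding with the right charset gives back the original.
        String ok = decode(wire, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip matches: " + ok.equals(text));

        // Decoding the same bytes as ISO-8859-1 yields two characters
        // (U+00C3, U+00A9), the classic garbled look, although the
        // bytes themselves were correct.
        String garbled = decode(wire, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 length: " + garbled.length());
    }
}
```

If the hex dump of what actually goes over the wire shows c3 a9 for
that character, the data is fine and it is the viewer that is
decoding it with the wrong charset.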

-+ Tatu +-