users@jersey.java.net

Re: [Jersey] Jersey and Charsets

From: Charles Overbeck <coverbec_at_pacbell.net>
Date: Tue, 10 Aug 2010 11:37:04 -0700 (PDT)

Hi Tatu,

Thanks, that helps ease a lot of my concerns.

Although I did find this link, http://www.xml.com/pub/a/2004/07/21/dive.html,
which talks about RFC 3023, which states that if the HTTP content-type is
text/xml and there is no charset specified in the HTTP header, then the
character encoding is assumed to be... us-ascii! Even if the XML has the
encoding attribute explicitly set! I don't know how many clients there are where
this is actually an issue, but to be safe, I'm changing all my Jersey resources
to produce application/xml instead of text/xml.
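For illustration, here is a minimal Java sketch of the RFC 3023 rule as described above. An explicit charset parameter always wins. text/xml without one means us-ascii. application/xml without one defers to the XML document itself. The class and method names (Rfc3023Charset, effectiveCharset) are hypothetical. Only text/xml and application/xml are handled, not the other XML media types RFC 3023 covers, and the naive split on ";" ignores quoted parameter values.

```java
import java.util.Locale;

public class Rfc3023Charset {

    /**
     * Returns the charset a client should assume for an XML body under the
     * RFC 3023 rule described above. Returns null when the media type defers
     * to the XML document itself (BOM, XML declaration, or UTF-8 default).
     * Hypothetical helper; simplified to text/xml and application/xml only.
     */
    static String effectiveCharset(String contentType) {
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);

        // An explicit charset parameter in the Content-Type always wins.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes around the value.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }

        // text/xml with no charset: assume us-ascii,
        // even if the XML declaration says otherwise.
        if (mediaType.equals("text/xml")) {
            return "us-ascii";
        }

        // application/xml with no charset: leave it to the XML parser.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset("text/xml"));
        System.out.println(effectiveCharset("text/xml; charset=UTF-8"));
        System.out.println(effectiveCharset("application/xml"));
    }
}
```

Seen this way, switching to application/xml (or adding an explicit charset=UTF-8) removes the us-ascii trap.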

Charles





________________________________
From: Tatu Saloranta <tsaloranta_at_gmail.com>
To: users_at_jersey.dev.java.net
Sent: Tue, August 10, 2010 10:47:42 AM
Subject: Re: [Jersey] Jersey and Charsets

On Mon, Aug 9, 2010 at 5:46 PM, Charles Overbeck <coverbec_at_pacbell.net> wrote:
> Hello,
>
> I'm trying to understand Jersey and charsets. I'm using Jersey 1.1.5.1, with
> JAXB and mainly the default Jersey classes for generating XML and JSON
> responses, as well as my own MessageBodyWriter that generates PDFs.
>
> 1) In my tests, the XML and JSON responses are always UTF-8 encoded. That's
> actually the encoding I want. However, the charset is not set in the
> Content-Type in the HTTP header. For example, for JSON it reads
> "Content-Type: application/json". I think it should ideally be
> "Content-Type: application/json; charset=UTF-8". How will a client know the
> charset otherwise? Or is JSON always UTF-8? The XML responses don't have the

The JSON specification actually allows only 3 encodings --
UTF-8, UTF-16 and UTF-32 -- so a parser can auto-detect the encoding. Not
all parsers do (mostly because they expect a Reader, which has already
handled it), but Jackson, for example, does.
This doesn't mean that no code ever (tries to) use other
encodings (like "whatever my system uses" :) ), just that such uses are
not really compliant.
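Here is a minimal Java sketch of that kind of auto-detection. It uses the null-byte pattern from RFC 4627, section 3: the first two characters of a JSON text were always ASCII, so the zero bytes in the first four octets reveal the encoding. The class name is hypothetical, a byte order mark is not handled, and Jackson's actual detection code may well differ.

```java
import java.nio.charset.Charset;

public class JsonEncodingSniffer {

    /**
     * Guesses the encoding of a JSON byte stream from the pattern of zero
     * bytes in its first four octets (RFC 4627, section 3). Assumes the
     * first two characters are ASCII and that there is no BOM.
     */
    static String detect(byte[] b) {
        if (b.length < 4) {
            return "UTF-8"; // too short to tell; fall back to the default
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) return "UTF-32BE";  // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return "UTF-32LE";  // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        return "UTF-8";                                // xx xx xx xx
    }

    public static void main(String[] args) {
        String json = "{\"a\":1}";
        String[] names = {"UTF-8", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE"};
        for (String name : names) {
            // Encode the same text in each charset and check what we detect.
            byte[] bytes = json.getBytes(Charset.forName(name));
            System.out.println(name + " -> " + detect(bytes));
        }
    }
}
```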

> charset either, but I guess that doesn't really matter, as the encoding is
> in the XML prolog. Still, it seems like that would be nice too.

Right, the XML declaration has it; but even in its absence there is
auto-detection, and if all else fails the XML specification mandates that
the encoding must then be UTF-8.
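A simplified Java sketch of that detection order follows: byte order mark first, then the encoding named in the XML declaration, then the UTF-8 default. It loosely follows the non-normative Appendix F of the XML 1.0 specification. The class name is hypothetical, and it skips cases a real parser handles, such as BOM-less UTF-16 or EBCDIC documents.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {

    // Matches encoding="..." or encoding='...' inside a leading XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Simplified detection: BOM, then XML declaration, then UTF-8 default. */
    static String detect(byte[] b) {
        // 1. Byte order mark.
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) return "UTF-16BE";
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) return "UTF-16LE";

        // 2. XML declaration, read byte-for-byte as an ASCII-compatible string.
        String head = new String(b, 0, Math.min(b.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            return m.group(1);
        }

        // 3. Nothing else to go on: the XML specification requires UTF-8.
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] bare = "<a/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(latin1));
        System.out.println(detect(bare));
    }
}
```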

I agree that it would be good to be able to define things
explicitly; I just thought I'd mention how XML and JSON handling
works in the absence of such external declarations.
So it is mostly not just working by accident. :)

-+ Tatu +-
