users@jersey.java.net

Re: [Jersey] Jersey and Charsets

From: Tatu Saloranta <tsaloranta_at_gmail.com>
Date: Tue, 10 Aug 2010 10:47:42 -0700

On Mon, Aug 9, 2010 at 5:46 PM, Charles Overbeck <coverbec_at_pacbell.net> wrote:
> Hello,
>
> I'm trying to understand Jersey and charsets. I'm using Jersey 1.1.5.1, with
> JAXB and mainly the default Jersey classes for generating XML and JSON
> responses, as well as my own MessageBodyWriter that generates PDFs.
>
> 1) In my tests, the XML and JSON responses are always UTF-8 encoded. That's
> actually the encoding I want. However, the charset is not set in the
> Content-Type in the HTTP header. For example, for JSON it reads
> "Content-Type: application/json". I think it should ideally be
> "Content-Type: application/json; charset=UTF-8". How will a client know the
> charset otherwise? Or is JSON always UTF-8? The XML responses don't have the

The JSON specification actually allows only three encodings (UTF-8,
UTF-16 and UTF-32), so a parser can auto-detect the encoding. Not
all parsers do (mostly because they expect a Reader, which has already
handled the decoding), but Jackson, for example, does.
This doesn't mean that there is no code out there that uses (or tries
to use) other encodings (like "whatever my system uses" :) ), but
those are not really compliant use cases.
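
To illustrate the auto-detection: RFC 4627 (section 3) notes that a JSON
text starts with two ASCII characters, so the pattern of zero bytes in
the first four octets gives the encoding away. A rough sketch of that
rule follows. The class and method names are made up for this example,
it is not Jackson's actual code, and it ignores byte order marks.

```java
// Hypothetical helper, not part of Jersey or Jackson.
class JsonEncodingSniffer {

    /**
     * Guesses the encoding of a JSON text from the zero-byte pattern of
     * its first four octets (RFC 4627, section 3). Assumes no byte order
     * mark and that the text starts with two ASCII characters.
     */
    static String detect(byte[] b) {
        if (b.length >= 4) {
            boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
            if (z0 && z1 && z2 && !z3) return "UTF-32BE";   // 00 00 00 xx
            if (!z0 && z1 && z2 && z3) return "UTF-32LE";   // xx 00 00 00
            if (z0 && !z1 && z2 && !z3) return "UTF-16BE";  // 00 xx 00 xx
            if (!z0 && z1 && !z2 && z3) return "UTF-16LE";  // xx 00 xx 00
        }
        // Anything else, including very short input, is treated as UTF-8.
        return "UTF-8";
    }
}
```

Something like JsonEncodingSniffer.detect(bytes) would be called on the
raw payload before wrapping it in a Reader for the detected encoding.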

> charset either, but I guess that doesn't really matter, as the encoding is
> in the XML prolog. Still, it seems like that would be nice too.

Right, the XML declaration has it; but even in its absence there is
auto-detection, and if all else fails, the XML specification mandates
that the encoding must then be UTF-8.
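
As a rough illustration of that fallback chain (byte order mark, then
the byte pattern of "<?", then the encoding declaration, then UTF-8),
here is a simplified sketch. The names are invented for this example,
real parsers do considerably more, and UTF-32 and EBCDIC are
deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a simplified take on the detection steps above.
class XmlEncodingSniffer {

    // Matches encoding="..." inside an XML declaration at the start of input.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?\\sencoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._\\-]*)\\1");

    static String detect(byte[] b) {
        // 1. Byte order mark, if present.
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE";
        // 2. No BOM: "<?" encoded as UTF-16 has a recognizable zero pattern.
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        // 3. ASCII-compatible start: read the declaration byte-for-byte.
        String head = new String(b, 0, Math.min(b.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (m.find()) return m.group(2);
        // 4. Nothing declared anywhere: default to UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```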

I agree that it would be good to be able to define things explicitly;
I just thought I'd mention how XML and JSON handling work in the
absence of such external declarations.
So it is not just working by accident. :)

-+ Tatu +-