Re: JSR311: Charset defaults

From: Marc Hadley <Marc.Hadley_at_Sun.COM>
Date: Fri, 22 Feb 2008 12:08:41 -0500

On Feb 22, 2008, at 11:43 AM, Bill Burke wrote:

> Shouldn't we go by what the W3C spec says? I vote for that.
Well, the W3C spec says what the default charset is in an HTTP message;
it doesn't necessarily cover what the default charset is for our
@ProduceMime annotation, although we could certainly adopt that. A
problem with using ISO-8859-1 is that characters in the string that
fall outside that charset cannot be encoded. The Javadoc says the
behaviour of String.getBytes(charset) is undefined if the string
contains unsupported characters, so you could get a '?' in some
implementations, or an exception in others. Defaulting to UTF-8 would
help avoid that.
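
A minimal sketch of the difference, assuming a current JDK (the class
and method names here are made up for illustration). It uses the
getBytes(Charset) overload, which is documented to substitute the
charset's replacement bytes for unmappable characters; the
getBytes(String charsetName) overload is the one whose behaviour is
left unspecified.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDefaults {
    // True if encoding and then decoding with the same charset
    // gives back exactly the original string.
    static boolean roundTrips(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset).equals(text);
    }

    public static void main(String[] args) {
        String latin = "caf\u00e9"; // every character is in ISO-8859-1
        String euro = "5 \u20ac";   // the euro sign is not in ISO-8859-1

        System.out.println(roundTrips(latin, StandardCharsets.ISO_8859_1));
        System.out.println(roundTrips(euro, StandardCharsets.ISO_8859_1));
        System.out.println(roundTrips(euro, StandardCharsets.UTF_8));
    }
}
```

The second check fails because the euro sign is lost when encoding to
ISO-8859-1, whereas UTF-8 can represent any Java string that contains
no unpaired surrogates.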

> On a side note, do we need language in the spec that the default
> text Provider needs to handle
> @ProduceMime(text/*;charset={charset})
> and
> Accept-Charset headers?
This was issue 1:

We had a lengthy discussion back in May and decided that an API was
the best solution. An application can use Request.selectVariant to
pick a suitable charset and then specify that in a Response.
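
To make the negotiation step concrete, here is a rough plain-Java
sketch of picking a charset from an Accept-Charset header. It is not
the Request.selectVariant API and not how any implementation does it;
AcceptCharsetSketch, select and qualityOf are hypothetical names. It is
also simplified: it does not apply the HTTP rule that gives ISO-8859-1
an implicit q=1 when the header omits it, and a malformed q-value will
throw NumberFormatException.

```java
import java.util.Arrays;
import java.util.List;

class AcceptCharsetSketch {
    // Returns the supported charset with the highest q-value, or null
    // if the header accepts none of them. Ties go to the earlier entry
    // in the supported list.
    static String select(String acceptCharset, List<String> supported) {
        String best = null;
        double bestQ = 0.0;
        for (String candidate : supported) {
            double q = qualityOf(candidate, acceptCharset);
            if (q > bestQ) {
                bestQ = q;
                best = candidate;
            }
        }
        return best;
    }

    // q-value the header assigns to one charset: an exact
    // (case-insensitive) match wins, otherwise "*" applies, otherwise 0.
    static double qualityOf(String charset, String header) {
        double wildcard = -1.0;
        for (String part : header.split(",")) {
            String[] pieces = part.trim().split(";");
            String name = pieces[0].trim();
            double q = 1.0; // default when no q parameter is given
            for (int i = 1; i < pieces.length; i++) {
                String param = pieces[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2).trim());
                }
            }
            if (name.equalsIgnoreCase(charset)) {
                return q;
            }
            if (name.equals("*")) {
                wildcard = q;
            }
        }
        return wildcard >= 0 ? wildcard : 0.0;
    }

    public static void main(String[] args) {
        List<String> supported = Arrays.asList("UTF-8", "ISO-8859-1");
        System.out.println(select("iso-8859-5, utf-8;q=0.8", supported));
    }
}
```

An application would then put the chosen charset into the media type
of the Response it builds, which is the part the API leaves to the
developer.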

> Which leads me to another tangent:
> MessageWriters need access to the input request.

They can inject all the same things as a resource class, e.g.

@Context HttpHeaders headers;

works in both a MessageBodyReader and a MessageBodyWriter.


> Marc Hadley wrote:
>> Consider the following:
>> @GET
>> @ProduceMime("text/plain")
>> String get() {
>> ...
>> }
>> What charset should we use to serialize the return value?
>> We could assume that the developer intends the charset to be the
>> default ISO-8859-1 as specified in:
>> In which case we'd emit the content type as-is and just make sure
>> to use ISO-8859-1 for serialization.
>> Alternatively we could assume that the developer doesn't care what
>> charset is used and pick one for them like UTF-8. In that case we'd
>> add an explicit charset parameter to the media-type specified in
>> @ProduceMime (or via the Response).
>> A related question is what to do about other media types like
>> application/* where there is no common default. Should we always
>> use UTF-8 in that case unless there's an explicit charset parameter
>> specified?
>> Marc.
>> ---
>> Marc Hadley <marc.hadley at>
>> CTO Office, Sun Microsystems.
> --
> Bill Burke
> JBoss, a division of Red Hat

Marc Hadley <marc.hadley at>
CTO Office, Sun Microsystems.