jsr369-experts@servlet-spec.java.net

[jsr369-experts] Re: [servlet-spec users] UTF-8 Again

From: Greg Wilkins <gregw_at_webtide.com>
Date: Mon, 5 Sep 2016 15:47:34 +1000

Stuart,

+1 for RFC-3986

+0.5 for <default-encoding>, as I think it should perhaps apply only to
the response encoding. The server really only controls the content it
generates; it is the browsers and clients that control the encoding of
requests.

So even if they are both trending towards UTF-8, it may not be that they
both should be switched at the same time for the same application and same
browser population.

Maybe <default-request-encoding> and <default-response-encoding>?
Actually, we don't really need to say "default", as there will still be
another default that is used when neither of these elements is set.

So they could just be <request-encoding> and <response-encoding>, with
documentation stating that the encoding set by these is overridden by the
programmatic methods: setCharacterEncoding, setContentType and/or
setLocale.
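
To make the proposed precedence concrete, here is a minimal Java sketch of
how a container might resolve the effective encoding for a request or a
response: an encoding set in code wins, then the hypothetical
<request-encoding>/<response-encoding> value, then the spec default of
ISO-8859-1. EncodingResolver and resolve() are illustrative names only,
not Servlet API, and the sketch uses nothing beyond the JDK (Java 9+).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch: how a container might pick the effective encoding
// under the proposed <request-encoding>/<response-encoding> elements.
// EncodingResolver and resolve() are illustrative names, not Servlet API.
final class EncodingResolver {

    // Used when neither application code nor web.xml sets anything
    // (ISO-8859-1 is the default the spec mandates today).
    static final Charset SPEC_DEFAULT = StandardCharsets.ISO_8859_1;

    // programmatic: charset from setCharacterEncoding, the charset parameter
    //               of setContentType, or the mapping for setLocale, if any.
    // configured:   value of <request-encoding> or <response-encoding>, if any.
    static Charset resolve(Optional<Charset> programmatic, Optional<Charset> configured) {
        return programmatic.or(() -> configured).orElse(SPEC_DEFAULT);
    }

    public static void main(String[] args) {
        // Example descriptor: only <request-encoding>UTF-8</request-encoding> is set.
        Optional<Charset> requestEncoding = Optional.of(StandardCharsets.UTF_8);
        Optional<Charset> responseEncoding = Optional.empty();

        // Request, nothing set in code: the descriptor value applies.
        System.out.println(resolve(Optional.empty(), requestEncoding).name());  // UTF-8
        // Response, nothing set anywhere: fall back to the spec default.
        System.out.println(resolve(Optional.empty(), responseEncoding).name()); // ISO-8859-1
        // Response after response.setCharacterEncoding("UTF-16"): code overrides.
        System.out.println(resolve(Optional.of(StandardCharsets.UTF_16), responseEncoding).name()); // UTF-16
    }
}
```

Keeping the request and response settings independent is what allows an
application to move its responses to UTF-8 while still decoding whatever
its existing client population sends.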

cheers

On 5 September 2016 at 11:18, Stuart Douglas <sdouglas_at_redhat.com> wrote:

> Hello everyone,
>
> I know this was discussed before on the users list, but the discussion
> kind of died out without anything being decided.
>
> As I am sure everyone is aware, HTML5 changes the default encoding from
> ISO-8859-1 to UTF-8. Most modern web applications will be written to
> use UTF-8, and as time goes on ISO-8859-1 will become less and less
> relevant.
>
> At the moment there is no easy, standard way to use UTF-8. The only
> standard way is to do it programmatically, using the relevant methods
> on the request and response objects. Most containers offer non-standard
> ways of setting the default, but none of them is portable.
>
> I really think this is something we need to address in the spec.
>
> There are really two different parts to this issue, URL encoding and
> request/response encoding. I will talk about each of them separately.
>
> URL Encoding
>
> At the moment the spec does not really mention URL encoding at all, so
> it is not really clear what the default should be. I think we should
> state explicitly in the spec that the recommended default URL
> encoding is UTF-8, as per RFC-3986.
>
> The URL encoding is something that really needs to be determined
> container-wide, as the URL must be decoded before it is mapped to a
> webapp, so I don't think this is something that we can control on a
> per-app basis.
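
As an illustration of why the decoding charset has to be fixed before the
URL is mapped, here is a minimal Java sketch of RFC 3986 percent-decoding
of a path: the %XX escapes become raw octets, and only then is a charset
applied. PathDecoder and decodePath() are hypothetical names, and the
sketch assumes the raw path is already valid ASCII URI text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative percent-decoder for a URI path (RFC 3986, section 2.1).
// %XX escapes are collected as raw octets first; only then are the octets
// turned into characters with the chosen charset. Unlike
// java.net.URLDecoder, '+' is left as-is, since '+'-as-space is an HTML
// form-encoding rule, not a URI path rule. PathDecoder is a hypothetical name.
final class PathDecoder {

    static String decodePath(String rawPath, Charset charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        for (int i = 0; i < rawPath.length(); i++) {
            char c = rawPath.charAt(i);
            if (c == '%') {
                if (i + 2 >= rawPath.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int hi = Character.digit(rawPath.charAt(i + 1), 16);
                int lo = Character.digit(rawPath.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape at index " + i);
                }
                octets.write((hi << 4) | lo);
                i += 2; // skip the two hex digits
            } else if (c < 0x80) {
                octets.write(c); // a valid URI contains only ASCII outside escapes
            } else {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
        }
        return new String(octets.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "/caf%C3%A9";
        System.out.println(decodePath(raw, StandardCharsets.UTF_8));
        System.out.println(decodePath(raw, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding "/caf%C3%A9" with UTF-8 yields "/café", while ISO-8859-1 yields
"/cafÃ©", so the same request could map to different servlets depending on
which charset the container chose, before any webapp is involved.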
>
> Request/Response Encoding
>
> At the moment the spec explicitly states that these default to
> ISO-8859-1, which made sense at the time as this was the default
> character encoding for HTML4. HTML5 has changed this, however, and now
> defaults to UTF-8.
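
To show what the ISO-8859-1 default does to UTF-8 input, here is a minimal
Java sketch that decodes the same UTF-8 octets with both charsets, as would
happen when an HTML5 form posts UTF-8 to an application that never calls
request.setCharacterEncoding(). DefaultEncodingMismatch and decodeBody()
are hypothetical names standing in for the container's parameter decoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration of reading a UTF-8 request body with the ISO-8859-1 default.
// decodeBody() is a hypothetical stand-in for the container turning body
// octets into parameter values.
final class DefaultEncodingMismatch {

    static String decodeBody(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Octets a browser sends for the field value "Müller" encoded as UTF-8.
        byte[] body = "M\u00fcller".getBytes(StandardCharsets.UTF_8);

        String withSpecDefault = decodeBody(body, StandardCharsets.ISO_8859_1);
        String withUtf8 = decodeBody(body, StandardCharsets.UTF_8);

        // The u-umlaut is two octets (0xC3 0xBC) in UTF-8; ISO-8859-1 maps
        // each octet to its own character, so the string grows by one.
        System.out.println(withSpecDefault.length()); // 7
        System.out.println(withUtf8.length());        // 6
    }
}
```

With the ISO-8859-1 default the application sees "MÃ¼ller" instead of
"Müller".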
>
> To address this I think we need to allow the default to be controlled
> in web.xml via a <default-encoding> element. This element will only
> affect the request and response encoding, and will override any spec
> mandated default. Obviously, if the encoding is explicitly specified,
> the default will not be used.
>
> We could also look at changing the default to UTF-8, although this may
> break existing applications (they could be fixed by explicitly
> setting the old default, either in container-specific config or via
> the new web.xml element). Even though breaking compatibility may cause
> some short-term pain, I think it is probably worth it.
>
> Stuart
>

-- 
Greg Wilkins <gregw@webtide.com> CTO http://webtide.com