[servlet-spec users] [jsr369-experts] [SPEC-161] Encoding in Deployment Descriptor

From: Edward Burns <edward.burns_at_oracle.com>
Date: Wed, 7 Sep 2016 08:21:39 -0700

>>>>> On Wed, 7 Sep 2016 09:18:42 +1000, Stuart Douglas <sdouglas_at_redhat.com> said:

SD> Hmm, re-reading it this is not as clear cut as I thought it was. UTF-8
SD> gets mentioned in the following places:

SD> 2.5 Identifying Data

SD> When a new URI scheme defines a component that represents textual
SD> data consisting of characters from the Universal Character Set [UCS],
SD> the data should first be encoded as octets according to the UTF-8
SD> character encoding [STD63]; then only those octets that do not
SD> correspond to characters in the unreserved set should be percent-
SD> encoded.

SD> 3.2.2. Host

SD> Non-ASCII characters must first be encoded according to UTF-8
SD> [STD63], and then each octet of the corresponding UTF-8 sequence must
SD> be percent-encoded to be represented as URI characters. URI producing
SD> applications must not use percent-encoding in host unless it is used
SD> to represent a UTF-8 character sequence.

SD> So the 'host' part of the URI is definitely UTF-8, but it is not made
SD> super clear if this applies to the path component as well. I am pretty
SD> sure it does (and that seems to be the general consensus around the
SD> internet). I read section 2.5 as applying to all components, which
SD> includes the path.

If the URI RFC is not itself clear, then I think we should not say
anything about using UTF-8 as the default encoding in the request.
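
For reference, the procedure in the quoted section 2.5 text (encode as
UTF-8 octets, then percent-encode every octet outside the unreserved
set) can be sketched roughly as follows. This is only an illustration
of that one paragraph, not container behavior; the class and method
names are made up for the example, and the unreserved set is taken from
section 2.3 of the same RFC.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class PercentEncodeSketch {

    // Unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9')
            || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Step 1: encode the text as UTF-8 octets.
    // Step 2: percent-encode each octet that is not an unreserved character.
    static String percentEncode(String text) {
        StringBuilder out = new StringBuilder();
        for (byte raw : text.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the octet as unsigned
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%')
                   .append(Character.toUpperCase(Character.forDigit(b >> 4, 16)))
                   .append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is C3 A9 in UTF-8, so this prints caf%C3%A9
        System.out.println(percentEncode("caf\u00e9"));
    }
}
```

Note that this treats its input as the data of a single component, so a
"/" inside it comes out as %2F rather than being kept as a path
delimiter.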

>>>>> On Mon, 5 Sep 2016 15:47:34 +1000, Greg Wilkins <gregw_at_webtide.com> said:

GW> So they could just be <request-encoding> and <response-encoding>, with
GW> documentation that says that the encoding set by these is overridden by the
GW> programmatic methods: setCharacterEncoding, setContentType and/or
GW> setLocale.

I have filed SERVLET_SPEC-161 for this.
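
The precedence Greg describes (descriptor value as the default, any
programmatic call overriding it) can be modeled with a toy class like
the one below. This is not the servlet API: the class name is
hypothetical, the descriptor value is simply passed to the constructor,
and only setCharacterEncoding is modeled.

```java
// Toy model of the proposed precedence rule, for illustration only.
public class EncodingPrecedenceSketch {

    // Assumed to come from a <request-encoding> element; may be null if absent.
    private final String descriptorEncoding;

    // Set by application code, if it ever calls setCharacterEncoding.
    private String programmaticEncoding;

    EncodingPrecedenceSketch(String descriptorEncoding) {
        this.descriptorEncoding = descriptorEncoding;
    }

    void setCharacterEncoding(String encoding) {
        this.programmaticEncoding = encoding;
    }

    // The programmatic value wins; otherwise fall back to the descriptor default.
    String getCharacterEncoding() {
        return programmaticEncoding != null ? programmaticEncoding : descriptorEncoding;
    }

    public static void main(String[] args) {
        EncodingPrecedenceSketch request = new EncodingPrecedenceSketch("UTF-8");
        System.out.println(request.getCharacterEncoding()); // descriptor default
        request.setCharacterEncoding("ISO-8859-1");
        System.out.println(request.getCharacterEncoding()); // programmatic override
    }
}
```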

Ed

-- 
| edward.burns_at_oracle.com | office: +1 407 458 0017