On 07/09/2016 16:21, Edward Burns wrote:
>>>>>> On Wed, 7 Sep 2016 09:18:42 +1000, Stuart Douglas <sdouglas_at_redhat.com> said:
> 
> SD> Hmm, re-reading it this is not as clear cut as I thought it was. UTF-8
> SD> gets mentioned in the following places:
> 
> SD> 2.5 Identifying Data
> 
> SD>   When a new URI scheme defines a component that represents textual
> SD>    data consisting of characters from the Universal Character Set [UCS],
> SD>    the data should first be encoded as octets according to the UTF-8
> SD>    character encoding [STD63]; then only those octets that do not
> SD>    correspond to characters in the unreserved set should be percent-
> SD>    encoded.
> 
> SD> 3.2.2.  Host
> 
> SD>   Non-ASCII
> SD>    characters must first be encoded according to UTF-8 [STD63], and then
> SD>    each octet of the corresponding UTF-8 sequence must be percent-
> SD>    encoded to be represented as URI characters.  URI producing
> SD>    applications must not use percent-encoding in host unless it is used
> SD>    to represent a UTF-8 character sequence.
> 
> SD> So the 'host' part of the URI is definitely UTF-8, but it is not made
> SD> super clear if this applies to the path component as well. I am pretty
> SD> sure it does (and that seems to be the general consensus around the
> SD> internet). I read section 2.5 as applying to all components, which
> SD> includes the path.
> 
> If the URI RFC is not itself clear, then I think we should not say
> anything about using UTF-8 as the default encoding in the request.

I strongly disagree.

The web is moving (some might argue has moved) towards using UTF-8. We
should be moving with "the general consensus around the internet" and
using UTF-8 by default.

Tomcat has been using UTF-8 by default for URIs since early 2014 and I
don't recall a single issue being reported because of it.
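
As an illustration of the ordering that the quoted section 2.5 describes
(UTF-8 octets first, then percent-encoding of every octet outside the
unreserved set), here is a minimal sketch in plain Java. The class and
method names (PathSegmentCodec, encode, decode) are hypothetical and are
not part of the Servlet API or of Tomcat. The decode step only shows why
the choice of default charset matters; it assumes well-formed, ASCII-only
input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class PathSegmentCodec {
    // The "unreserved" set from RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Encode the text as UTF-8 octets first, then percent-encode every
    // octet that is not an unreserved character.
    static String encode(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (octet < 0x80 && UNRESERVED.indexOf((char) octet) >= 0) {
                out.append((char) octet);
            } else {
                out.append('%').append(String.format("%02X", octet));
            }
        }
        return out.toString();
    }

    // Reverse the percent-encoding, then interpret the octets in the given
    // charset. Assumes ASCII input; a malformed escape such as "%zz" makes
    // Integer.parseInt throw NumberFormatException.
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                octets.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                octets.write((byte) c);
            }
        }
        return new String(octets.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String encoded = encode("caf\u00e9");   // "café"
        System.out.println(encoded);
        // The same octets read back differently depending on the charset.
        System.out.println(decode(encoded, StandardCharsets.UTF_8));
        System.out.println(decode(encoded, StandardCharsets.ISO_8859_1));
    }
}
```

A container that decodes with ISO-8859-1 where the client encoded with
UTF-8 turns each multi-byte character into several unrelated ones, which
is the kind of mismatch a UTF-8 default is meant to avoid.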

Mark

> 
>>>>>>>> On Mon, 5 Sep 2016 15:47:34 +1000, Greg Wilkins <gregw_at_webtide.com> said:
>>>
> GW> So they could just be <request-encoding> and <response-encoding>, with
> GW> documentation that says that the encoding set by these is overridden by the
> GW> programmatic methods: setCharacterEncoding, setContentType and/or
> GW> setLocale.
> 
> I have filed SERVLET_SPEC-161 for this.
> 
> Ed
>