>>>>> On Mon, 5 Sep 2016 11:18:19 +1000, Stuart Douglas <sdouglas_at_redhat.com> said:
SD> URL Encoding
SD> At the moment the spec does not really mention URL encoding at all, so
SD> it is not really clear what the default should be. I think we should
SD> explicitly mention in the spec that the recommended default URL
SD> encoding should be UTF-8 as per RFC-3986.
I skimmed RFC-3986 but could not find a definitive statement that UTF-8
should be used. Did I miss it? It seems to favor US-ASCII:
    In local or regional contexts and with improving technology, users
    might benefit from being able to use a wider range of characters;
    such use is not defined by this specification. Percent-encoded
    octets (Section 2.1) may be used within a URI to represent characters
    outside the range of the US-ASCII coded character set if this
    representation is allowed by the scheme or by the protocol element in
    which the URI is referenced. Such a definition should specify the
    character encoding used to map those characters to octets prior to
    being percent-encoded for the URI.

    If a reserved character is found in a URI component and
    no delimiting role is known for that character, then it must be
    interpreted as representing the data octet corresponding to that
    character's encoding in US-ASCII.
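To show why the spec would need to name the encoding, here is a small sketch (it assumes Java 10+ for the URLEncoder.encode(String, Charset) overload; the class and method names are just for illustration). The same character produces different percent-encoded octets depending on the charset chosen:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Percent-encodes the same text under two different charsets.
// URLEncoder implements application/x-www-form-urlencoded (it also maps
// ' ' to '+'), but for this input the only escaped character is the
// e-acute (U+00E9).
final class PercentEncodingDemo {
    static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset); // Java 10+ overload
    }

    static String asUtf8(String text) {
        return encode(text, StandardCharsets.UTF_8);
    }

    static String asLatin1(String text) {
        return encode(text, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(asUtf8(word));   // caf%C3%A9 (two octets)
        System.out.println(asLatin1(word)); // caf%E9    (one octet)
    }
}
```

A server decoding with the wrong charset would misread either form, which is the ambiguity the RFC leaves to the scheme or protocol to resolve.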
SD> Request/Response Encoding
SD> At the moment the spec explicitly states that these default to
SD> ISO-8859-1, which made sense at the time as this was the default
SD> character encoding for HTML4. HTML5 has changed this, however, and now
SD> defaults to UTF-8.
SD> To address this I think we need to allow the default to be controlled
SD> in web.xml via a <default-encoding> element. This element will only
SD> affect the request and response encoding, and will override any spec
SD> mandated default. Obviously if the encoding is explicitly specified
SD> the default will not be used.
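For reference, the mismatch Stuart describes can be sketched in a few lines (the class and method names are hypothetical; this only simulates the byte-level effect, not a real container). A page served as UTF-8 submits UTF-8 bytes, and decoding them with the ISO-8859-1 default garbles every non-ASCII character:

```java
import java.nio.charset.StandardCharsets;

// Simulates a browser submitting UTF-8 (the HTML5 default) to a container
// that decodes the body with the spec's ISO-8859-1 default.
final class DefaultCharsetMismatch {
    static String decodeAsLatin1(String sentAsUtf8) {
        byte[] wire = sentAsUtf8.getBytes(StandardCharsets.UTF_8); // bytes on the wire
        return new String(wire, StandardCharsets.ISO_8859_1);      // container's view
    }

    public static void main(String[] args) {
        String received = decodeAsLatin1("caf\u00e9");
        // U+00E9 is 0xC3 0xA9 in UTF-8; ISO-8859-1 reads those bytes
        // as two characters, U+00C3 and U+00A9.
        System.out.println(received.equals("caf\u00c3\u00a9")); // true
    }
}
```

Pure ASCII input is unaffected, which is why applications often only notice the problem once non-ASCII data shows up.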
I think it should definitely be opt-in. Regarding the name, existing
precedents include "locale-encoding-mapping" and "encodingType", and JSP
has "page-encoding".
>>>>> On Mon, 5 Sep 2016 15:47:34 +1000, Greg Wilkins <gregw_at_webtide.com> said:
GW> So they could just be <request-encoding> and <response-encoding>, with
GW> documentation that says that the encoding set by these is overridden by the
GW> programmatic methods: setCharacterEncoding, setContentType and/or
GW> setLocale.
Yes, this is good.
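Assuming Greg's element names, the intended precedence could be sketched like this. The class, method, and parameter names are made up for illustration and are not Servlet API; the sketch only models the ordering, not how a container actually tracks these values:

```java
// Hypothetical sketch: none of these names are Servlet API. It models the
// precedence proposed in this thread for a single request or response.
final class EncodingPrecedence {
    // The spec-mandated default as it stands today.
    static final String SPEC_DEFAULT = "ISO-8859-1";

    // explicit:   encoding set explicitly (e.g. via setCharacterEncoding or
    //             setContentType), or null if nothing was set.
    // configured: value of the proposed request-encoding/response-encoding
    //             element in web.xml, or null if the application did not opt in.
    static String effectiveEncoding(String explicit, String configured) {
        if (explicit != null) {
            return explicit;   // explicit settings always win
        }
        if (configured != null) {
            return configured; // opt-in, per-application default
        }
        return SPEC_DEFAULT;   // no opt-in: behavior is unchanged
    }

    public static void main(String[] args) {
        System.out.println(effectiveEncoding(null, null));        // ISO-8859-1
        System.out.println(effectiveEncoding(null, "UTF-8"));     // UTF-8
        System.out.println(effectiveEncoding("UTF-16", "UTF-8")); // UTF-16
    }
}
```

Because the last branch keeps today's default, applications that do not add the new elements would see no change.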
>>>>> On Mon, 5 Sep 2016 09:01:00 +0100, Mark Thomas <markt_at_apache.org> said:
MT> +1.
Yes, I agree with Greg here.
SD> We could also look at changing the default to UTF-8, although this may
SD> break existing applications (although they can be fixed by explicitly
I don't think it's worth the risk. I'll say no to that one.
So are we good with this? If so, I'll file a JIRA.
Ed
--
| edward.burns_at_oracle.com | office: +1 407 458 0017