jsr369-experts@servlet-spec.java.net

[jsr369-experts] Re: [146-URIEncoding] DISCUSSION

From: Mark Thomas <markt_at_apache.org>
Date: Fri, 31 Mar 2017 15:07:40 +0100

On 29/03/17 09:22, Edward Burns wrote:

<snip/>

> Regardless of the answers to the above questions, I suggest we simplify
> this and do not support configuring different encodings for the
> different parts of the request: path, query string, and body. I propose
> we implement this suggestion as follows.
>
> * Modify request-character-encoding element in
> javaee8/src/web-app_4_0.xsds to be:
>
> <xsd:element name="request-character-encoding" type="javaee:string">
> <xsd:annotation>
> <xsd:documentation>
>
> When specified, this element provides a default request
> character encoding of the web application. This request
> character encoding value pertains to all aspects of reading
> octets from the request, including but not limited to, the
> URI path, the query string, and any request body content.
>
> </xsd:documentation>
> </xsd:annotation>
> </xsd:element>
>
> * Add the "This request character encoding value pertains to all..."
> statement to ServletRequest.getCharacterEncoding(), at the end of the
> first javadoc paragraph.
>
> * In Frame section 3.12 Request data encoding, after the sentence "is
> available on the ServletRequest interface," add the "This request
> character encoding value pertains to all..." statement.
>
> ACTION: Please reply by close of business Tuesday 4 April 2017.

-1. This is logically flawed.

You cannot have per-web-application definitions of URI encoding. It has
to be container-wide.

Web applications are identified by matching the decoded URI to the
longest matching context path.

The request URI must be decoded before this match occurs.

The encoding used for that decoding must be selected before the
decoding takes place.

Therefore, the encoding to use has to be selected before the correct web
application has been identified.
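
The ordering problem above can be illustrated with a rough sketch. This
is not how any particular container implements mapping; the class and
method names (ContextMatchSketch, decodePath, matchContext) and the
context paths are hypothetical, and the URLDecoder overload used needs
Java 10 or later. The point is only that the charset is an input to the
decode step, and the decode step runs before a context is known.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ContextMatchSketch {

    // Hypothetical helper: percent-decode a raw request path. The charset
    // has to be supplied here, before any web application is identified.
    static String decodePath(String rawPath, Charset uriCharset) {
        // URLDecoder turns '+' into a space (form semantics), which is
        // wrong for paths, so a literal '+' is protected first.
        return URLDecoder.decode(rawPath.replace("+", "%2B"), uriCharset);
    }

    // Hypothetical helper: longest matching context path, with "" standing
    // in for the root context when nothing else matches.
    static String matchContext(String decodedPath, List<String> contextPaths) {
        String best = "";
        for (String ctx : contextPaths) {
            boolean matches = decodedPath.equals(ctx)
                    || decodedPath.startsWith(ctx + "/");
            if (matches && ctx.length() > best.length()) {
                best = ctx;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute; the escape avoids source-encoding issues.
        List<String> contexts = Arrays.asList("/caf\u00e9", "/shop");
        String raw = "/caf%C3%A9/menu";

        String asUtf8 = decodePath(raw, StandardCharsets.UTF_8);
        String asLatin1 = decodePath(raw, StandardCharsets.ISO_8859_1);

        // The same octets select different contexts depending on the
        // charset chosen before matching.
        System.out.println("UTF-8 match is root: "
                + matchContext(asUtf8, contexts).isEmpty());
        System.out.println("ISO-8859-1 match is root: "
                + matchContext(asLatin1, contexts).isEmpty());
    }
}
```

If the charset came from the matched application's deployment
descriptor, decodePath could not be called until matchContext had
returned, and matchContext needs the output of decodePath.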

Additionally, my expectation is that nearly all applications will want
to use UTF-8 for the URI but may have a range of preferences for default
request/response encoding.

My counter proposal is:

- leave request-character-encoding as currently drafted

- define the default URI encoding as UTF-8

- make it clear that containers may provide container-specific mechanisms
to change the default URI encoding or to provide more complex schemes,
such as use of a different encoding for the query string.
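
As a rough sketch of what the counter proposal could look like at the
container level: the class below is hypothetical (UriEncodingConfigSketch
is not a Servlet API or any container's actual configuration), and the
fallback of the query string charset to the URI charset is an assumption
made for illustration. It needs Java 10 or later for the URLDecoder
overload.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UriEncodingConfigSketch {

    final Charset uriCharset;
    final Charset queryCharset;

    // Both arguments are optional container-wide overrides; null means
    // "not configured". Nothing here is per web application.
    UriEncodingConfigSketch(Charset uriOverride, Charset queryOverride) {
        // Default URI encoding is UTF-8 unless the container overrides it.
        this.uriCharset = uriOverride != null
                ? uriOverride : StandardCharsets.UTF_8;
        // Assumption: the query string follows the URI charset unless the
        // container configures a separate one.
        this.queryCharset = queryOverride != null
                ? queryOverride : this.uriCharset;
    }

    String decodePath(String rawPath) {
        // Protect a literal '+' because URLDecoder maps it to a space.
        return URLDecoder.decode(rawPath.replace("+", "%2B"), uriCharset);
    }

    String decodeQueryValue(String rawValue) {
        // In a query string '+' does mean a space, so no protection here.
        return URLDecoder.decode(rawValue, queryCharset);
    }

    public static void main(String[] args) {
        // No overrides: path and query string both use UTF-8.
        UriEncodingConfigSketch defaults =
                new UriEncodingConfigSketch(null, null);
        System.out.println(defaults.uriCharset + " / " + defaults.queryCharset);

        // Container-specific scheme: UTF-8 paths, ISO-8859-1 query strings.
        UriEncodingConfigSketch legacy =
                new UriEncodingConfigSketch(null, StandardCharsets.ISO_8859_1);
        System.out.println(legacy.uriCharset + " / " + legacy.queryCharset);
    }
}
```

The request-character-encoding element would stay as currently drafted
and would not feed into either of these settings.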

Mark