[jsr369-experts] Re: PR Feedback: 173-RequestDataEncoding

From: Edward Burns <edward.burns_at_oracle.com>
Date: Fri, 28 Apr 2017 13:44:44 -0700

>>>>> On Wed, 26 Apr 2017 08:52:13 +1000, Stuart Douglas <sdouglas_at_redhat.com> said:

SD> If we do want to clarify anything here (which I am not convinced is
SD> necessary) IMHO we should state exactly what this affects, namely:
SD> - The reader returned from getReader() will decode into this charset
SD> - Request parameters from a post body will be decoded into this
SD> charset after they have been parsed from the request

>>>>> On Wed, 26 Apr 2017 11:08:19 +0100, Mark Thomas <markt_at_apache.org> said:

MT> +1

MT> The problem is that without the character encoding the server is left to
MT> guess which encoding was used to convert the non US-ASCII characters
MT> into %nn values.

I know I said this was fine, but after re-reading the HTML5 section
Mr. Reschke quotes
<https://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm>,
I no longer agree that the server is left to guess. The
application/x-www-form-urlencoded encoding algorithm as a whole, and
items 4.5, 4.5.2 and 5 in particular, states very clearly that
everything will be in US-ASCII, including the %nn escapes. If that
algorithm is used correctly to produce the bytes sent to the server,
those bytes will contain no non-US-ASCII characters.
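To make the point about the bytes on the wire concrete, here is a
minimal sketch. It assumes java.net.URLEncoder as a stand-in for a
browser's form encoder (it is not the HTML5 algorithm itself), and the
class and method names are hypothetical. It checks that the encoded
form contains only US-ASCII characters, and it prints the escapes for
two charsets, since the %nn values themselves depend on the charset the
encoder chose.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; URLEncoder stands in for a browser here.
public class FormEncodingDemo {

    // Percent-encodes a value using the named charset for non-ASCII characters.
    public static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True if every character in the string is in the US-ASCII range.
    public static boolean isUsAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "café"
        String utf8 = encode(value, "UTF-8");
        String latin1 = encode(value, "ISO-8859-1");
        System.out.println(utf8 + " ascii-only=" + isUsAscii(utf8));
        System.out.println(latin1 + " ascii-only=" + isUsAscii(latin1));
    }
}
```

Both encoded strings pass the US-ASCII check; only the escape
sequences differ between the two charsets.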

MT> How about something along these lines:

MT> "Currently, many browsers do not send a char encoding

[...]

MT> null from the getCharacterEncoding method."

>>>>> On Wed, 26 Apr 2017 14:17:11 -0700, Edward Burns <edward.burns_at_oracle.com> said:

EB> This is fine with me.

Yes, I know I said it was fine, but I have changed my position. I am no
longer fine with it.

>>>>> On Thu, 27 Apr 2017 07:46:45 +1000, Stuart Douglas <sdouglas_at_redhat.com> said:

SD> I think this needs to be clarified. Does it return null:

SD> 1) If the encoding defaults to ISO-8859-1 because nothing was specified
SD> or
SD> 2) If the client did not send a character encoding

My interpretation of the existing text is 2).

SD> "the failure of the client to send a character encoding, the container
SD> returns null" implies that this is option 2), however I don't think
SD> this is explicitly made clear, as the "in this case" appears to be
SD> referring to the previous sentence which talks about defaulting to
SD> ISO-8859-1.

Given my reconsideration of Mark's proposal, I'm going to take another
stab at the text, based on my initial attempt from Tuesday and
incorporating part of Mark's suggestion.

PROPOSAL:

Modify the "very misleading" text to be the following:

Spec3.12> "Currently, many browsers do not send a char encoding
Spec3.12> qualifier with the Content-Type header, leaving open the
Spec3.12> determination of the character encoding for reading HTTP
Spec3.12> requests.

In the absence of a char encoding qualifier, if the Content-Type is
application/x-www-form-urlencoded, the default encoding the container
uses to create the request reader and parse POST data must be US-ASCII.
For any other Content-Type, if no encoding has been specified by the
client request, by the web application, or by container vendor specific
configuration (for all web applications in the container), the
Spec3.12> default encoding of a request the container uses to create the
Spec3.12> request reader and parse POST data must be ISO-8859-1.

However, in order to indicate to the developer the absence of a char
encoding qualifier, the container must return null from the
getCharacterEncoding method."
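For discussion, one possible reading of the proposed text can be
sketched as plain Java, without the Servlet API. Everything here is
hypothetical: the class, the method names, and the configuredDefault
parameter (standing in for web application or container configuration)
are illustration only. The sketch also assumes that the
application/x-www-form-urlencoded rule takes precedence over any
configured default, which the proposed text does not state explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch of the proposal; not container code.
public class RequestEncodingSketch {

    // Returns the charset qualifier of a Content-Type value, or null if absent.
    public static String charsetQualifier(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String v = p.substring("charset=".length()).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1); // strip quotes
                }
                return v.isEmpty() ? null : v;
            }
        }
        return null;
    }

    // Returns the media type without parameters, lower-cased.
    public static String mediaType(String contentType) {
        if (contentType == null) {
            return "";
        }
        return contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    }

    // Encoding used to create the reader and parse POST data.
    // Assumption: the form-urlencoded rule is checked before configuredDefault.
    public static Charset effectiveEncoding(String contentType, Charset configuredDefault) {
        String qualifier = charsetQualifier(contentType);
        if (qualifier != null) {
            return Charset.forName(qualifier); // throws on unknown names
        }
        if (mediaType(contentType).equals("application/x-www-form-urlencoded")) {
            return StandardCharsets.US_ASCII;
        }
        if (configuredDefault != null) {
            return configuredDefault;
        }
        return StandardCharsets.ISO_8859_1;
    }

    // What getCharacterEncoding would report under the proposal:
    // null whenever the client sent no char encoding qualifier.
    public static String reportedCharacterEncoding(String contentType) {
        return charsetQualifier(contentType);
    }

    public static void main(String[] args) {
        String[] samples = {
            "application/x-www-form-urlencoded",
            "text/plain",
            "text/plain; charset=UTF-8"
        };
        for (String ct : samples) {
            System.out.println(ct + " -> effective=" + effectiveEncoding(ct, null)
                    + ", reported=" + reportedCharacterEncoding(ct));
        }
    }
}
```

Note that the reported value and the effective encoding deliberately
diverge in this sketch: the reader is created with a default, while the
developer still sees null.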

------------

ACTION: Please respond by start of business PDT Wednesday 3 May 2017.
In the absence of a response, we will go with the above proposal.

Thanks,

Ed

-- 
| edward.burns_at_oracle.com | office: +1 407 458 0017