>>>>> On Wed, 26 Apr 2017 08:52:13 +1000, Stuart Douglas <sdouglas_at_redhat.com> said:
SD> If we do want to clarify anything here (which I am not convinced is
SD> necessary) IMHO we should state exactly what this affects, namely:
SD> - The reader returned from getReader() will decode into this charset
SD> - Request parameters from a post body will be decoded into this
SD> charset after they have been parsed from the request
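A minimal sketch of the second point: the POST body is split into name/value pairs first, and only then is each piece percent-decoded with the request charset. The class and method names are hypothetical, and java.net.URLDecoder stands in for whatever decoder a real container uses.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Servlet API.
class FormBodyDecoder {

    // Parse an application/x-www-form-urlencoded body, then percent-decode
    // each name and value using the given charset.
    public static Map<String, List<String>> parse(String body, Charset charset) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            // Splitting on '&' and '=' happens before decoding, so an encoded
            // %26 or %3D inside a value is never mistaken for a separator.
            String name = URLDecoder.decode(rawName, charset);
            String value = URLDecoder.decode(rawValue, charset);
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        String body = "name=caf%C3%A9&q=a%26b";
        System.out.println(parse(body, StandardCharsets.UTF_8));
        System.out.println(parse(body, StandardCharsets.ISO_8859_1));
    }
}
```

The same body yields "café" when decoded as UTF-8 and "cafÃ©" when decoded as ISO-8859-1; only the decoding step depends on the charset, not the parsing step.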
>>>>> On Wed, 26 Apr 2017 11:08:19 +0100, Mark Thomas <markt_at_apache.org> said:
MT> +1
MT> The problem is that without the character encoding the server is left to
MT> guess which encoding was used to convert the non US-ASCII characters
MT> into %nn values.
I know I said this was fine, but after re-reading the HTML5 section
Mr. Reschke quotes
<https://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm>
I don't agree with your saying that the server is left to guess. The
whole application/x-www-form-urlencoded encoding algorithm, with items
4.5, 4.5.2 and 5 in particular, very clearly states that everything will
be in US-ASCII, including the %nn. There will be no non-US-ASCII
characters if that algorithm is correctly used to produce the bytes sent
to the server.
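A quick illustration of the point about the wire format, assuming java.net.URLEncoder as a stand-in for the HTML5 algorithm (it is close but not identical, e.g. in which punctuation it leaves unescaped). The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; URLEncoder approximates the HTML5 serializer.
class FormEncodingIsAscii {

    // True if every char in s is in the US-ASCII range (0x00-0x7F).
    public static boolean isUsAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    public static void main(String[] args) {
        String utf8 = URLEncoder.encode("caf\u00e9 \u20ac", StandardCharsets.UTF_8);
        String latin1 = URLEncoder.encode("caf\u00e9", StandardCharsets.ISO_8859_1);
        System.out.println(utf8 + " ascii=" + isUsAscii(utf8));
        System.out.println(latin1 + " ascii=" + isUsAscii(latin1));
    }
}
```

Either way the encoded form is pure US-ASCII ("caf%C3%A9+%E2%82%AC" and "caf%E9"), though the %nn sequences themselves differ with the charset used to produce them.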
MT> How about something along these lines:
MT> "Currently, many browsers do not send a char encoding
[...]
MT> null from the getCharacterEncoding method."
>>>>> On Wed, 26 Apr 2017 14:17:11 -0700, Edward Burns <edward.burns_at_oracle.com> said:
EB> This is fine with me.
Yes, I know I said it was fine, but I have changed my position. I am no
longer fine with it.
>>>>> On Thu, 27 Apr 2017 07:46:45 +1000, Stuart Douglas <sdouglas_at_redhat.com> said:
SD> I think this needs to be clarified. Does it return null:
SD> 1) If the encoding defaults to ISO-8859-1 because nothing was specified
SD> or
SD> 2) If the client did not send a character encoding
My interpretation of the existing text is 2).
SD> "the failure of the client to send a character encoding, the container
SD> returns null" implies that this is option 2), however I don't think
SD> this is explicitly made clear, as the "in this case" appears to be
SD> referring to the previous sentence which talks about defaulting to
SD> ISO-8859-1.
Given my reconsideration of Mark's proposal, I'm going to take another
stab at the text, based on my initial attempt from Tuesday and trying to
incorporate something from Mark's text.
PROPOSAL:
Modify the "very misleading" text to be the following:
Spec3.12> "Currently, many browsers do not send a char encoding
Spec3.12> qualifier with the Content-Type header, leaving open the
Spec3.12> determination of the character encoding for reading HTTP
Spec3.12> requests.
In the absence of a char encoding qualifier, if the Content-Type is
application/x-www-form-urlencoded, the default encoding the container
uses to create the request reader and parse POST data must be US-ASCII.
For any other Content-Type, if none has been specified by the client
request, web application or container vendor specific configuration (for
all web applications in the container), the
Spec3.12> default encoding of a request the container uses to create the
Spec3.12> request reader and parse POST data must be ISO-8859-1.
However, in order to indicate to the developer the absence of a char
encoding qualifier, the container must return null from the
getCharacterEncoding method."
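To make the proposed rules concrete, here is a sketch of how a container might apply them. It follows the wording literally: without a qualifier, application/x-www-form-urlencoded always gets US-ASCII, and web-application or container configuration is consulted only for other Content-Types. getCharacterEncoding() follows interpretation 2), returning null whenever the client sent no qualifier. The class, the method names and the "configured" parameter are all hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical model of the proposed rules; not a real container API.
class ProposedRequestEncoding {

    // Extract the charset parameter (the "char encoding qualifier") from a
    // Content-Type header value, or return null if there is none.
    public static String charsetQualifier(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // What getCharacterEncoding() would report: null if no qualifier was sent.
    public static String reportedEncoding(String contentType) {
        return charsetQualifier(contentType);
    }

    // Charset used to create the request reader and parse POST data.
    // "configured" stands in for web application or container vendor
    // specific configuration (null if none). Charset.forName throws for
    // unsupported names; error handling is omitted here.
    public static Charset effectiveCharset(String contentType, Charset configured) {
        String qualifier = charsetQualifier(contentType);
        if (qualifier != null) {
            return Charset.forName(qualifier);
        }
        String mediaType = contentType == null
                ? "" : contentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.equals("application/x-www-form-urlencoded")) {
            return StandardCharsets.US_ASCII;
        }
        return configured != null ? configured : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        String[] headers = {
            "application/x-www-form-urlencoded",
            "application/x-www-form-urlencoded; charset=UTF-8",
            "text/plain"
        };
        for (String h : headers) {
            System.out.println(h + " -> reported=" + reportedEncoding(h)
                    + ", decoding=" + effectiveCharset(h, null));
        }
    }
}
```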
------------
ACTION: Please respond by start of business PDT Wednesday 3 May 2017.
In the absence of a response, we will go with the above proposal.
Thanks,
Ed
--
| edward.burns_at_oracle.com | office: +1 407 458 0017