Hello Volunteers,
Julian Reschke, one of the authors of RFC 7231, filed two JIRAs today
against the Public Review.  One of them was trivial and I fixed it.
The other one I'd like to run by you before fixing.
>>>>> On Tue, 25 Apr 2017 15:25:32 +0000 (UTC), "reschke (JIRA)" <jira-no-reply_at_java.net> said:
JR> URL: https://java.net/jira/browse/SERVLET_SPEC-173
He quotes some text from 3.12 Request data encoding:
Spec3.12> "Currently, many browsers do not send a char encoding
Spec3.12> qualifier with the Content-Type header, leaving open the
Spec3.12> determination of the character encoding for reading HTTP
Spec3.12> requests.  The default encoding of a request the container
Spec3.12> uses to create the request reader and parse POST data must be
Spec3.12> ISO-8859-1 if none has been specified by the client request,
Spec3.12> web application or container vendor specific configuration
Spec3.12> (for all web applications in the container). However, in order
Spec3.12> to indicate to the developer, in this case, the failure of the
Spec3.12> client to send a character encoding, the container returns
Spec3.12> null from the getCharacterEncoding method."
JR> That is very misleading.
JR> From an HTTP payload point of view, the actual character encoding
JR> for "application/x-www-form-urlencoded", as defined in
JR> <https://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm>
JR> is *always* US-ASCII. Period.
Indeed, step 5 of the encoding algorithm is:
HTML5> 5. Encode result as US-ASCII and return the resulting byte stream.
JR> The octet representation of non-US-ASCII characters is *always*
JR> percent-encoded - this means that whatever the HTTP payload header
JR> fields describes is totally irrelevant for this content type (as
JR> long as it is an USASCII-compatible encoding).
JR> It may not be possible to change the ISO-8859-1 default, but note
JR> that the HTTP spec never ever said that this actually is the default
JR> (I believe earlier versions of the servlet spec pretended that this
JR> was the case).
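To illustrate the point about percent-encoding, here is a minimal Java
sketch. It uses the standard java.net.URLEncoder and URLDecoder classes;
the class and method names (FormEncodingDemo, encode, decode, isAscii)
are made up for this example and are not from the spec or the JIRA.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: shows that a form-urlencoded payload is pure
// US-ASCII on the wire, whatever charset was used to percent-encode it.
final class FormEncodingDemo {

    // Percent-encode a value using the named charset for non-ASCII characters.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Percent-decode a value, interpreting the decoded octets in the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True if every character is in the US-ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e

        String asUtf8 = encode(value, "UTF-8");        // caf%C3%A9
        String asLatin1 = encode(value, "ISO-8859-1"); // caf%E9

        // Both payloads contain only US-ASCII characters.
        System.out.println(asUtf8 + " ascii=" + isAscii(asUtf8));
        System.out.println(asLatin1 + " ascii=" + isAscii(asLatin1));

        // The charset only matters when interpreting the decoded octets.
        System.out.println(value.equals(decode(asUtf8, "UTF-8")));      // true
        System.out.println(value.equals(decode(asUtf8, "ISO-8859-1"))); // false
    }
}
```

In this sketch the bytes on the wire are US-ASCII in both cases. The
charset used for percent-decoding is a separate question, which is where
a wrong default does its damage.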
Though it's not exactly clear what he wants us to do, I propose the
following.  
PROPOSAL: 
Modify the "very misleading" text to be the following:
Spec3.12> "Currently, many browsers do not send a char encoding
Spec3.12> qualifier with the Content-Type header, leaving open the
Spec3.12> determination of the character encoding for reading HTTP
Spec3.12> requests.  
In this case, if the Content-Type is application/x-www-form-urlencoded,
the default encoding the container uses to create the request reader and
parse POST data must be US-ASCII.  For any other Content-Type, if none
has been specified by the client request, web application or container
vendor specific configuration (for all web applications in the
container), the
Spec3.12> default encoding of a request the container uses to create the
Spec3.12> request reader and parse POST data must be ISO-8859-1.
Spec3.12> However, in order to indicate to the developer, in this
Spec3.12> case, the failure of the client to send a character encoding,
Spec3.12> the container returns null from the getCharacterEncoding
Spec3.12> method."
------------
So basically the operative change is to explicitly call out the
Content-Type of application/x-www-form-urlencoded and say that US-ASCII
must be used to create the request reader and parse the POST data.
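As a rough illustration only, the selection rule in the proposed text
could be sketched in Java as follows. The class RequestCharsetSketch and
the method defaultCharset are hypothetical names, not Servlet API; the
Content-Type parsing is deliberately simplified; and the ordering (an
explicit client charset first, then the form-urlencoded rule, then any
web application or container configuration, then ISO-8859-1) is one
literal reading of the proposal, which a container could reasonably
interpret differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch of the proposed default-encoding rule; not Servlet API.
final class RequestCharsetSketch {

    static final String FORM_TYPE = "application/x-www-form-urlencoded";

    // contentType: raw Content-Type header value, or null if absent.
    // configured:  web application or container default, or null if none.
    static Charset defaultCharset(String contentType, Charset configured) {
        String[] parts = (contentType == null) ? new String[0] : contentType.split(";");

        // 1. A charset parameter sent by the client wins (simplified parsing).
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).replace("\"", "").trim();
                return Charset.forName(name);
            }
        }

        // 2. No client charset: form-urlencoded payloads default to US-ASCII.
        String mediaType = (parts.length == 0) ? "" : parts[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.equals(FORM_TYPE)) {
            return StandardCharsets.US_ASCII;
        }

        // 3. Any other type: configured default if present, else ISO-8859-1.
        return (configured != null) ? configured : StandardCharsets.ISO_8859_1;
    }
}
```

Independently of which default the container picks, getCharacterEncoding
would still return null when the client sent no charset, as the last
sentence of the spec text says.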
ACTION: Please let me know your thoughts on this by start of business
PDT Friday 28 April 2017.  In the absence of a response, I'll change the
text of 3.12 as proposed above.
Thanks,
Ed
-- 
| edward.burns_at_oracle.com | office: +1 407 458 0017