I am resuming the discussion from September regarding adding
request/response encoding in web.xml. [1] [2]
Since the default encoding for HTML5 is UTF-8, it would be good to
- provide a way to configure the request/response encoding in a web application.
- have the ability to configure the encoding for the servlet container in a container-specific way.
I propose the following changes:
- add <request-encoding> to web.xml schema
A sample usage is as follows:
<request-encoding>UTF-8</request-encoding>
- add <response-encoding> to web.xml schema
- update javadoc for ServletRequest#getCharacterEncoding() as follows:
old:
This method returns null if the request does not specify a character encoding
new:
This method returns null if no request character encoding has been
specified. The following methods for specifying the
request character encoding are consulted, in decreasing order of
priority: per request, per web app (using deployment descriptor), and
per container (using vendor specific configuration).
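To illustrate the lookup order in the new javadoc text, here is a minimal sketch in plain Java. It is only a model of the proposed behavior: the class RequestEncodingLookup and its resolve method are hypothetical names, not part of the Servlet API, and each argument stands for the encoding configured at that level (null if none).

```java
// Hypothetical model of the proposed lookup order; not part of the Servlet API.
class RequestEncodingLookup {

    // Each argument is the encoding configured at that level, or null if none.
    static String resolve(String perRequest, String perWebApp, String perContainer) {
        if (perRequest != null) {
            return perRequest;   // specified for this particular request
        }
        if (perWebApp != null) {
            return perWebApp;    // <request-encoding> in web.xml
        }
        return perContainer;     // vendor specific configuration; null if also unset
    }

    public static void main(String[] args) {
        // Only the deployment descriptor and the container specify an encoding.
        System.out.println(resolve(null, "UTF-8", "ISO-8859-1"));
        // A per request value takes priority over the deployment descriptor.
        System.out.println(resolve("Shift_JIS", "UTF-8", null));
        // Nothing specified at any level: getCharacterEncoding() would return null.
        System.out.println(resolve(null, null, null));
    }
}
```

The null result in the last case corresponds to the "returns null if no request character encoding has been specified" sentence above.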
- update javadoc for ServletResponse
old:
The charset for the MIME body response can be specified explicitly using the
setCharacterEncoding(java.lang.String) and setContentType(java.lang.String) methods,
or implicitly using the setLocale(java.util.Locale) method.
Explicit specifications take precedence over implicit specifications.
If no charset is specified, ISO-8859-1 will be used.
new:
The charset for the MIME body of the response can be specified
using any of the following techniques: per request, per web-app (using
deployment descriptor), and per container (using vendor specific configuration).
If more than one of the preceding techniques has been employed, the priority is
the order listed.
For per request, the charset for the response can be specified explicitly using the
setCharacterEncoding(java.lang.String) and setContentType(java.lang.String) methods,
or implicitly using the setLocale(java.util.Locale) method.
Explicit specifications take precedence over implicit specifications.
If no charset is explicitly specified, ISO-8859-1 will be used.
- update javadoc for ServletResponse#getCharacterEncoding() as follows:
old:
The character encoding may have been specified explicitly using
the setCharacterEncoding(java.lang.String) or setContentType(java.lang.String) methods,
or implicitly using the setLocale(java.util.Locale) method.
Explicit specifications take precedence over implicit specifications.
new:
The following methods for specifying the response character encoding are
consulted, in decreasing order of priority: per request, per web-app (using
deployment descriptor), and per container (using vendor specific configuration).
The first one of these methods that yields a result is returned.
Per-request, the charset for the response can be specified explicitly using the
setCharacterEncoding(java.lang.String) and setContentType(java.lang.String) methods,
or implicitly using the setLocale(java.util.Locale) method.
Explicit specifications take precedence over implicit specifications.
- update javadoc for ServletResponse#setCharacterEncoding() as follows:
old:
If the character encoding has already been set by setContentType(java.lang.String) or
setLocale(java.util.Locale), this method overrides it.
new:
If the response character encoding has already been set by the
deployment descriptor, or using the setContentType() or setLocale()
methods, the value set in this method overrides any of those values.
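The response side of the proposal can be sketched the same way. The following is a plain Java model, assuming my reading of the text above is right: explicit per request settings come first, then the locale-implied encoding, then the deployment descriptor, then the container, and finally ISO-8859-1. The class ResponseEncodingModel and the setLocaleEncoding method are hypothetical stand-ins (the latter for the encoding that setLocale() would map to), not Servlet API.

```java
// Hypothetical model of the proposed response encoding rules; not Servlet API.
class ResponseEncodingModel {
    private final String webAppEncoding;     // <response-encoding> in web.xml, or null
    private final String containerEncoding;  // vendor specific configuration, or null
    private String explicitEncoding;         // set via setCharacterEncoding()/setContentType()
    private String localeEncoding;           // implied by setLocale()

    ResponseEncodingModel(String webAppEncoding, String containerEncoding) {
        this.webAppEncoding = webAppEncoding;
        this.containerEncoding = containerEncoding;
    }

    // Explicit per request setting; overrides every other source.
    void setCharacterEncoding(String encoding) {
        this.explicitEncoding = encoding;
    }

    // Stand-in for the encoding that setLocale() would select implicitly.
    void setLocaleEncoding(String encoding) {
        this.localeEncoding = encoding;
    }

    // Returns the first source that yields a result, in decreasing priority.
    String getCharacterEncoding() {
        if (explicitEncoding != null) return explicitEncoding;
        if (localeEncoding != null) return localeEncoding;
        if (webAppEncoding != null) return webAppEncoding;
        if (containerEncoding != null) return containerEncoding;
        return "ISO-8859-1"; // default when nothing has been specified
    }

    public static void main(String[] args) {
        ResponseEncodingModel response = new ResponseEncodingModel("UTF-8", null);
        System.out.println(response.getCharacterEncoding()); // from web.xml
        response.setLocaleEncoding("Shift_JIS");
        System.out.println(response.getCharacterEncoding()); // implicit per request
        response.setCharacterEncoding("UTF-16");
        System.out.println(response.getCharacterEncoding()); // explicit per request
    }
}
```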
- update 3.11 of spec
old:
The default encoding of a request the container uses to create the request reader and
parse POST data must be “ISO-8859-1” if none has been specified by the client request.
However, in order to indicate to the developer, in this case, the failure of the client
to send a character encoding, the container returns null from the getCharacterEncoding method.
If the client hasn’t set character encoding and the request data is encoded with a
different encoding than the default as described above, breakage can occur.
To remedy this situation, a new method setCharacterEncoding(String enc) has been added
to the ServletRequest interface. Developers can override the character encoding supplied by
the container by calling this method. It must be called prior to parsing any post data or
reading any input from the request. Calling this method once data has been read will not
affect the encoding.
new:
The default encoding of a request the container uses to create the request reader and
parse POST data must be “ISO-8859-1” if none has been specified by the client request,
deployment descriptor or per container using vendor specific configuration.
However, in order to indicate to the developer, in this case, the failure of the client
to send a character encoding, the container returns null from the getCharacterEncoding method.
If the client hasn’t set character encoding and the request data is encoded with a
different encoding than the default as described above, breakage can occur.
To remedy this situation, the <request-encoding> element is available in the web.xml and
the setCharacterEncoding(String enc) method is available on the ServletRequest interface.
Developers can override the character encoding supplied by
the container by adding the element or calling the method. The method must be called prior to
parsing any POST data or reading any input from the request. Calling this method once data has
been read will not affect the encoding.
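As a small illustration of the "breakage" mentioned in this section, the sketch below uses only java.nio.charset and no servlet classes. It assumes a client that sends a UTF-8 encoded form body without declaring a charset; the class RequestDecodingDemo and the sample body are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of decoding a request body with the wrong default encoding.
class RequestDecodingDemo {

    // Decodes raw request bytes with the given charset, as a container would
    // when creating the request reader or parsing POST data.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // "name=caf\u00e9" as sent by a client that encodes the body in UTF-8.
        byte[] body = "name=caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoded with the ISO-8859-1 default: the two UTF-8 bytes of the
        // accented letter become two separate characters.
        String broken = decode(body, StandardCharsets.ISO_8859_1);

        // Decoded with UTF-8, as <request-encoding>UTF-8</request-encoding>
        // or setCharacterEncoding("UTF-8") would arrange.
        String correct = decode(body, StandardCharsets.UTF_8);

        System.out.println(broken.length());   // one character longer than the original
        System.out.println(correct.length());
        System.out.println(correct.equals("name=caf\u00e9"));
    }
}
```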
- update 5.5 of spec
old:
If the element does not exist or does not provide a mapping, setLocale uses a container dependent mapping.
new:
The <response-encoding> element can be used to explicitly set the
encoding for all responses.
<response-encoding>UTF-8</response-encoding>
If neither element exists, or no mapping is provided, setLocale uses a container dependent mapping.
Please let me know your comments.
Shing Wai Chan
[1] https://java.net/jira/browse/SERVLET_SPEC-161
[2] https://java.net/projects/servlet-spec/lists/jsr369-experts/archive/2016-09/message/26