jsr369-experts@servlet-spec.java.net

[jsr369-experts] Allow encoding to be set per web-app and per container

From: Shing Wai Chan <shing.wai.chan_at_oracle.com>
Date: Fri, 17 Feb 2017 12:42:00 -0800

I am resuming the discussion from September regarding adding
request/response encoding in web.xml. [1] [2]

Since the default encoding for HTML5 is UTF-8, it would be good to
- provide a way to configure the request/response encoding in a web application.
- allow the encoding to be configured for the servlet container in a container-specific way.

I propose the following changes:
- add <request-encoding> to web.xml schema
  A sample usage is as follows:
    <request-encoding>UTF-8</request-encoding>
    
- add <response-encoding> to web.xml schema

- update javadoc for ServletRequest#getCharacterEncoding() as follows:
    old:
        This method returns null if the request does not specify a character encoding
    new:
        This method returns null if no request character encoding has been
        specified. The following methods for specifying the
        request character encoding are consulted, in decreasing order of
        priority: per request, per web app (using deployment descriptor), and
        per container (using vendor specific configuration).
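
To make the proposed lookup order concrete, here is a minimal sketch in plain Java. It is not part of the proposal or of the Servlet API: the class RequestEncodingResolver and its resolve method are hypothetical names, and the three arguments stand in for whatever the container has recorded at each level (null meaning "not specified").

```java
/** Hypothetical helper for illustration only; not part of the Servlet API. */
final class RequestEncodingResolver {
    private RequestEncodingResolver() {}

    /**
     * Returns the first non-null candidate in decreasing order of priority:
     * per request, per web app, per container. Returns null when no level
     * specifies an encoding, as in the proposed getCharacterEncoding() text.
     */
    static String resolve(String perRequest, String perWebApp, String perContainer) {
        for (String candidate : new String[] {perRequest, perWebApp, perContainer}) {
            if (candidate != null) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The request did not specify a charset, the web app asked for UTF-8.
        System.out.println(resolve(null, "UTF-8", "ISO-8859-1")); // prints UTF-8
        // Nothing specified anywhere: the caller sees null.
        System.out.println(resolve(null, null, null)); // prints null
    }
}
```

Note that a null result here says nothing about which encoding the container then uses to build the reader; that default is covered separately by section 3.11.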


- update javadoc for ServletResponse
    old:
        The charset for the MIME body response can be specified explicitly using the
        setCharacterEncoding(java.lang.String) and setContentType(java.lang.String) methods,
        or implicitly using the setLocale(java.util.Locale) method.
        Explicit specifications take precedence over implicit specifications.
        If no charset is specified, ISO-8859-1 will be used.
    new:
        The charset for the MIME body of the response can be specified
        using any of the following techniques: per request, per web-app (using
        deployment descriptor), and per container (using vendor specific configuration).
        If multiple of the preceding techniques have been employed, the priority is
        the order listed.
        For per request, the charset for the response can be specified explicitly using the
        setCharacterEncoding(java.lang.String) and setContentType(java.lang.String) methods,
        or implicitly using the setLocale(java.util.Locale) method.
        Explicit specifications take precedence over implicit specifications.
        If no charset is explicitly specified, ISO-8859-1 will be used.

    #getCharacterEncoding() as follows:
        old:
            The character encoding may have been specified explicitly using
            the setCharacterEncoding(java.lang.String) or setContentType(java.lang.String) methods,
            or implicitly using the setLocale(java.util.Locale) method.
            Explicit specifications take precedence over implicit specifications.
        new:
            The following methods for specifying the response character encoding are
            consulted, in decreasing order of priority: per request, per web-app (using
            deployment descriptor), and per container (using vendor specific configuration).
            The first one of these methods that yields a result is returned.
            Per-request, the charset for the response can be specified explicitly using the
            setCharacterEncoding(java.lang.String) and setContentType(java.lang.String) methods,
            or implicitly using the setLocale(java.util.Locale) method.
            Explicit specifications take precedence over implicit specifications.

    #setCharacterEncoding() as follows:
        old:
            If the character encoding has already been set by setContentType(java.lang.String) or
            setLocale(java.util.Locale), this method overrides it.
        new:
            If the response character encoding has already been set by the
            deployment descriptor, or using the setContentType() or setLocale()
            methods, the value set in this method overrides any of those values.
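
One possible reading of the proposed ServletResponse wording, sketched in plain Java. The class ResponseEncodingResolver and its parameters are hypothetical, and placing the ISO-8859-1 fallback after the container level is an assumption on my part; the proposed text does not spell out exactly where the fallback applies.

```java
/** Hypothetical helper for illustration only; not part of the Servlet API. */
final class ResponseEncodingResolver {
    /** Fallback named in the ServletResponse javadoc. */
    static final String FALLBACK = "ISO-8859-1";

    private ResponseEncodingResolver() {}

    /**
     * explicit:     value from setCharacterEncoding or setContentType (per request)
     * fromLocale:   value implied by setLocale (per request, implicit)
     * perWebApp:    value from the deployment descriptor
     * perContainer: value from vendor specific configuration
     * A null argument means that level did not specify a charset.
     */
    static String resolve(String explicit, String fromLocale,
                          String perWebApp, String perContainer) {
        if (explicit != null) return explicit;       // explicit beats implicit
        if (fromLocale != null) return fromLocale;   // still per request
        if (perWebApp != null) return perWebApp;
        if (perContainer != null) return perContainer;
        return FALLBACK;                             // assumption: applied last
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, "UTF-8", null));     // prints UTF-8
        System.out.println(resolve("UTF-16", null, "UTF-8", null)); // prints UTF-16
        System.out.println(resolve(null, null, null, null));        // prints ISO-8859-1
    }
}
```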


- update section 3.11 of the spec
    old:
        The default encoding of a request the container uses to create the request reader and
        parse POST data must be “ISO-8859-1” if none has been specified by the client request.
        However, in order to indicate to the developer, in this case, the failure of the client
        to send a character encoding, the container returns null from the getCharacterEncoding method.

        If the client hasn’t set character encoding and the request data is encoded with a
        different encoding than the default as described above, breakage can occur.
        To remedy this situation, a new method setCharacterEncoding(String enc) has been added
        to the ServletRequest interface. Developers can override the character encoding supplied by
        the container by calling this method. It must be called prior to parsing any post data or
        reading any input from the request. Calling this method once data has been read will not
        affect the encoding.
    new:
        The default encoding of a request the container uses to create the request reader and
        parse POST data must be “ISO-8859-1” if none has been specified by the client request,
        deployment descriptor or per container using vendor specific configuration.
        However, in order to indicate to the developer, in this case, the failure of the client
        to send a character encoding, the container returns null from the getCharacterEncoding method.

        If the client hasn’t set character encoding and the request data is encoded with a
        different encoding than the default as described above, breakage can occur.
        To remedy this situation, the <request-encoding> element is available in the web.xml and
        the setCharacterEncoding(String enc) method is available on the ServletRequest interface.
        Developers can override the character encoding supplied by the container by adding
        the element or calling the method. The method must be called prior to parsing any
        POST data or reading any input from the request. Calling this method once data has
        been read will not affect the encoding.
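
The "breakage" mentioned in section 3.11 can be shown with the JDK alone. The sketch below is illustrative only: the class DefaultEncodingBreakage and the method decodeAs are hypothetical names, and the byte array stands in for a request body that a client sent as UTF-8 without declaring a charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo for illustration only; does not use the Servlet API. */
final class DefaultEncodingBreakage {
    private DefaultEncodingBreakage() {}

    /** Decodes raw body bytes with the given charset, as a request reader would. */
    static String decodeAs(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String sent = "caf\u00e9"; // "cafe" with an acute accent on the e
        byte[] body = sent.getBytes(StandardCharsets.UTF_8); // 5 bytes: the accented e takes 2

        // Container default when nothing is specified: ISO-8859-1.
        String withDefault = decodeAs(body, StandardCharsets.ISO_8859_1);
        // What <request-encoding>UTF-8</request-encoding> or
        // setCharacterEncoding("UTF-8") would give instead.
        String withUtf8 = decodeAs(body, StandardCharsets.UTF_8);

        System.out.println(withDefault.equals(sent)); // prints false
        System.out.println(withDefault.length());     // prints 5
        System.out.println(withUtf8.equals(sent));    // prints true
    }
}
```

With the ISO-8859-1 default, each of the two UTF-8 bytes of the accented character is decoded as a separate character, so the application sees five characters instead of four.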

- update section 5.5 of the spec
    old:
        If the element does not exist or does not provide a mapping, setLocale uses a container dependent mapping.

    new:
        The <response-encoding> element can be used to explicitly set the
        encoding for all responses.
            <response-encoding>UTF-8</response-encoding>
        If neither element exists or provides a mapping, setLocale uses a container dependent mapping.
    
Please let me know your comments.

Shing Wai Chan

[1] https://java.net/jira/browse/SERVLET_SPEC-161
[2] https://java.net/projects/servlet-spec/lists/jsr369-experts/archive/2016-09/message/26