users@servlet-spec.java.net

[servlet-spec users] Re: Easy UTF-8

From: Mark Thomas <markt_at_apache.org>
Date: Mon, 31 Aug 2015 12:01:52 +0100

On 30/08/2015 20:19, Yannick Majoros wrote:
> Hi,
>
> Uh, it's always been quite easy. Why do you think it isn't?
>
> You're citing Tomcat, which isn't Java EE btw.

No, Tomcat isn't a full Java EE implementation, but Tomcat implements the
Servlet specification and this is the Servlet EG. Pointing out (using
one of the many available Servlet implementations) that changing the
default character encoding requires container-specific configuration and
asking for the specification to provide something doesn't seem unreasonable.

The OP could have made the same point with Glassfish, WebSphere,
WebLogic etc.

> For Servlet, it's up to you. As long as you don't rely on defaults, you
> should be fine. JSPs, if you still use them, have it quite clear too.

And that is the point. If you want the default to be something other
than ISO-8859-1 then it has to be changed in multiple places and you
almost certainly need to use container-specific configuration as well.
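
To illustrate why the default matters for request and response bodies,
here is a minimal sketch using only the JDK (no Servlet API). The class
and method names are hypothetical. It only mimics what happens when
UTF-8 bytes are decoded with an ISO-8859-1 default, and is not how any
particular container is implemented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyDecodingSketch {

    // Hypothetical helper: decodes raw body bytes with whatever charset
    // the receiving side assumes (for example, a container default).
    static String decodeBody(byte[] body, Charset assumedCharset) {
        return new String(body, assumedCharset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e; the client
        // sends it as UTF-8, so the accented letter occupies two bytes.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decodeBody(body, StandardCharsets.UTF_8);
        String asLatin1 = decodeBody(body, StandardCharsets.ISO_8859_1);

        // Decoded as UTF-8 the text round-trips: 4 characters.
        System.out.println(asUtf8.length());   // prints 4
        // Decoded as ISO-8859-1 each byte becomes one character, so the
        // accented letter turns into two characters (U+00C3, U+00A9).
        System.out.println(asLatin1.length()); // prints 5
    }
}
```

In a real web application the equivalent decision is made by the
container or by application code that sets the encoding explicitly,
which is exactly the part that is not portable today.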

> Every time I've seen someone struggle with this, he used a framework that
> made dumb assumptions (Struts anyone? That's not Java EE btw). Or the
> developer himself was confused, relied on defaults or converted multiple
> times...

That is a little unfair. While I have also seen those sorts of errors,
there are also issues (covered in the Tomcat FAQ linked below) with
non-spec-compliant browser behaviour that contribute to the problem.

> I'm curious, what do you want an "encoding" element in web.xml to do?

That is a fair question. There are multiple things that you might want
to change.

1. URI decoding
You can't define this per web application since the URI needs to be
decoded before it is mapped to the web application. Therefore this has
to be a container-wide setting, which means this pretty much has to use
container-specific configuration.
What we could do is make UTF-8 rather than ISO-8859-1 the default.

2. Response bodies
A web.xml setting could be used to change from the current ISO-8859-1
default to a default of UTF-8.

3. Request bodies
A web.xml setting (the same as 2?) could be used to change from the
current ISO-8859-1 default to a default of UTF-8.
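
For item 1, the effect of the decoding charset can be sketched with the
JDK alone. This is only an approximation: java.net.URLDecoder implements
form decoding (it also turns '+' into a space), not the URI path
decoding a container performs, and the class and helper names below are
hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingSketch {

    // Hypothetical helper: percent-decodes a string using the named
    // charset, wrapping the checked exception for convenience.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %C3%A9 is the UTF-8 percent-encoding of U+00E9 (e with acute).
        String path = "/caf%C3%A9";

        // UTF-8 decoding yields one character for the two escaped bytes.
        System.out.println(decode(path, "UTF-8").length());      // prints 5
        // ISO-8859-1 decoding yields one character per escaped byte.
        System.out.println(decode(path, "ISO-8859-1").length()); // prints 6
    }
}
```

The two results differ before any web application code runs, which is
why this setting cannot be scoped to a single web application.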

Any changes in defaults would need to be reflected in the JSP specification.

Mark


> On 8/30/2015 2:18 PM, Philippe Marschall wrote:
>>
>> Hi
>>
>> UTF-8 is the most popular encoding on the web [1], [2], [3]. However
>> configuring a Java EE web application to use UTF-8 has historically
>> not been easy or doable in a portable manner [4]. Are there any plans
>> to change this, for example by adding an <encoding> element to web.xml?
>>
>> [1] http://w3techs.com/technologies/overview/character_encoding/all
>> [2] http://googleblog.blogspot.ch/2010/01/unicode-nearing-50-of-web.html
>> [3] http://www.w3.org/QA/2008/05/utf8-web-growth#c139948
>> [4] http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
>>
>> Cheers
>> Philippe
>