users@servlet-spec.java.net

[servlet-spec users] Re: Easy UTF-8

From: Yannick Majoros <yannick.majoros_at_gmail.com>
Date: Mon, 31 Aug 2015 12:23:30 +0000

Hi Mark,

Maybe there is some misunderstanding here.

There is a big difference between what you're saying, and what the OP is
asking.

The OP is talking about "using UTF-8".

I'm saying this is quite easy, and shouldn't be defined in web.xml. You
could even accept multiple encodings in an application, with content
negotiation, per resource.
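
A minimal sketch of per-request decoding, assuming the application picks the
charset from the Content-Type header of each request instead of relying on a
container default. The class and helper names (ExplicitCharset, charsetOf,
decodeBody) are hypothetical, and the UTF-8 fallback is an application choice,
not something the specification mandates. In a real servlet the equivalent
explicit calls would be request.setCharacterEncoding(...) and
response.setCharacterEncoding(...).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ExplicitCharset {

    // Hypothetical helper: read the charset parameter of a Content-Type
    // header value. Falls back to UTF-8 when no charset is declared
    // (an assumption made by this sketch, not a container default).
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "")
                                   .trim();
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decode a request body with the charset chosen for this one request.
    public static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        String a = decodeBody(utf8, "text/plain; charset=UTF-8");
        String b = decodeBody(latin1, "text/plain; charset=ISO-8859-1");

        // Both requests yield the same text although the bytes differ.
        System.out.println(a.equals(b));
    }
}
```

The point of the sketch is only that the encoding is decided per request or
per resource in application code, so two clients sending different encodings
can both be handled correctly.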

You're using the word "default" in every single paragraph of your answer. I
therefore understand that you're talking about defaults, which are
container-specific and can surely be improved.

Personally, I insist that you shouldn't rely on defaults anyway, so I don't
really care.

As far as I can tell from a very quick check, the URL part of the input
already has to be UTF-8 (
http://tools.ietf.org/html/rfc3987#section-6.4 ), so encoding defaults
shouldn't have any influence on that.
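
A small sketch of why the charset used for percent-decoding matters, using
only the JDK. Note the assumptions: java.net.URLDecoder implements HTML form
decoding (it also turns "+" into a space), so it only approximates what a
container does when it decodes a request URI, and the class name
PercentDecoding is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecoding {

    // Decode percent-escapes using an explicitly named charset.
    public static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %C3%A9 is the UTF-8 byte sequence for U+00E9 (e with acute accent).
        String escaped = "caf%C3%A9";

        String asUtf8 = decode(escaped, "UTF-8");
        String asLatin1 = decode(escaped, "ISO-8859-1");

        // Decoded as UTF-8, the two bytes form one character.
        System.out.println(asUtf8.length());   // prints 4
        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println(asLatin1.length()); // prints 5
    }
}
```

If the client escaped the URL as UTF-8, only the UTF-8 decoding recovers the
original text; the ISO-8859-1 decoding silently produces two wrong characters
in place of one.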

I'm still puzzled about what this would solve, besides changing a default
that shouldn't be relied upon in most cases.

Cheers,

Yannick

On Mon, 31 Aug 2015 at 13:02, Mark Thomas <markt_at_apache.org> wrote:

> On 30/08/2015 20:19, Yannick Majoros wrote:
> > Hi,
> >
> > Uh, it's always been quite easy. Why do you think it isn't?
> >
> > You're citing Tomcat, which isn't Java EE btw.
>
> No, Tomcat isn't a full Java EE implementation but Tomcat implements the
> Servlet specification and this is the Servlet EG. Pointing out (using
> one of the many available Servlet implementations) that changing the
> default character encoding requires container specific configuration and
> asking for the specification to provide something doesn't seem
> unreasonable.
>
> The OP could have made the same point with Glassfish, WebSphere,
> WebLogic etc.
>
> > For Servlet, it's up to you. As long as you don't rely on defaults, you
> > should be fine. JSPs, if you still use them, make it quite clear too.
>
> And that is the point. If you want the default to be something other
> than ISO-8859-1 then it has to be changed in multiple places and you
> almost certainly need to use container specific configuration as well.
>
> > Every time I've seen someone struggle with this, he used a framework that
> > made dumb assumptions (Struts anyone? That's not Java EE btw). Or the
> > developer himself was confused, relied on defaults or converted multiple
> > times...
>
> That is a little unfair. While I have also seen those sorts of errors
> there are also issues (covered in the Tomcat FAQ linked below) with
> non-spec compliant browser behaviour that contribute to the problem.
>
> > I'm curious, what do you want an "encoding" element in web.xml to do?
>
> That is a fair question. There are multiple things that you might want
> to change.
>
> 1. URI decoding
> You can't define this per web application since the URI needs to be decoded
> before it is mapped to the web application. Therefore this has to be a
> container-wide setting, which means this pretty much has to use
> container-specific configuration.
> What we could do is make UTF-8 rather than ISO-8859-1 the default.
>
> 2. Response bodies
> A web.xml setting could be used to change from the current ISO-8859-1
> default to a default of UTF-8.
>
> 3. Request bodies
> A web.xml setting (the same as 2?) could be used to change from the
> current ISO-8859-1 default to a default of UTF-8.
>
> Any changes in defaults would need to be reflected in the JSP
> specification.
>
> Mark
>
>
> > On 8/30/2015 2:18 PM, Philippe Marschall wrote:
> >>
> >> Hi
> >>
> >> UTF-8 is the most popular encoding on the web [1], [2], [3]. However
> >> configuring a Java EE web application to use UTF-8 has historically
> >> not been easy or doable in a portable manner [4]. Are there any plans
> >> to change this, for example by adding a <encoding> element to web.xml?
> >>
> >> [1] http://w3techs.com/technologies/overview/character_encoding/all
> >> [2] http://googleblog.blogspot.ch/2010/01/unicode-nearing-50-of-web.html
> >> [3] http://www.w3.org/QA/2008/05/utf8-web-growth#c139948
> >> [4] http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
> >>
> >> Cheers
> >> Philippe
> >
>
> --
Yannick Majoros