On 31/08/2015 13:23, Yannick Majoros wrote:
> Hi Mark,
>
> Maybe there is some misunderstanding here.
>
> There is a big difference between what you're saying, and what the OP is
> asking.
I disagree. I think you and I have interpreted the OP's request differently.
> The OP is talking about "using UTF-8".
>
> I'm saying this is quite easy, and shouldn't be defined in web.xml. You
> could even accept multiple encodings in an application, with content
> negotiation, per resource.
>
> You're using the word "default" in every single paragraph of your
> answer. I therefore understand that you're talking about defaults, which
> are container-specific and can surely be improved.
No, these are not container specific. The defaults are mandated by the
Servlet spec and are currently ISO-8859-1. There are container specific
mechanisms for changing these defaults.
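To make the cost of the ISO-8859-1 default concrete, here is a minimal standalone sketch using only the JDK (not the Servlet API or any container's code; the class and method names are hypothetical). It decodes the bytes a browser would send for a UTF-8 form field once with ISO-8859-1 and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: a UTF-8 request body decoded with the Servlet default
// (ISO-8859-1) versus UTF-8. Class and method names are hypothetical.
class RequestBodyDefault {

    // Turn raw body bytes into text, roughly what a container does before
    // the application sees request parameters.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Bytes a browser sends for "cafe" with an e-acute, from a UTF-8 page.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // e-acute is two bytes in UTF-8 (0xC3 0xA9); ISO-8859-1 reads each
        // byte as its own character, producing mojibake.
        String asLatin1 = decode(body, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(body, StandardCharsets.UTF_8);

        System.out.println(asLatin1.equals("caf\u00c3\u00a9")); // true
        System.out.println(asUtf8.equals("caf\u00e9"));         // true
    }
}
```

Nothing is lost at this stage, since every byte maps to some ISO-8859-1 character, which is why applications sometimes "fix" this by converting a second time.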
> Personally, I insist that you shouldn't rely on defaults anyway, so I
> don't really care.
I think it is perfectly reasonable for an application to depend on
specification defined defaults.
I do think, as a minimum, we should change the specification to use
UTF-8 by default rather than ISO-8859-1.
> As far as I can tell from a very quick check, the URL part of the
> input already has to be UTF-8
> (http://tools.ietf.org/html/rfc3987#section-6.4), so encoding defaults
> shouldn't have any influence on that.
I don't believe that that specification applies to Servlet containers.
I'd be more than happy if it did since I'm in favour of UTF-8 by default.
Note that from Tomcat 8, Tomcat does use UTF-8 by default unless the
'strict adherence to the servlet spec' option is enabled in which case
it uses ISO-8859-1.
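The URI case can be illustrated the same way. This is a sketch, not Tomcat's decoder: it uses `java.net.URLDecoder`, which performs form decoding (it also maps '+' to a space), so it only approximates how a container decodes a request path. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Sketch: one percent-encoded path decoded with each candidate default.
// URLDecoder does form decoding, so this only approximates a container's
// URI decoder. Class and method names are hypothetical.
class UriDecodingDefault {

    static String decodePath(String rawPath, String charsetName) {
        try {
            return URLDecoder.decode(rawPath, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %C3%A9 is the UTF-8 percent-encoding of U+00E9 (e-acute).
        String raw = "/caf%C3%A9";
        System.out.println(decodePath(raw, "ISO-8859-1").equals("/caf\u00c3\u00a9")); // true
        System.out.println(decodePath(raw, "UTF-8").equals("/caf\u00e9"));           // true
    }
}
```

Because this decoding happens before the request is mapped to a web application, the charset choice here is container-wide rather than something a single web.xml could control.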
> Still puzzled by what this should solve, besides a default that
> shouldn't be relied upon in most cases.
We appear to disagree on whether or not an application depending on a
specification defined default is a reasonable thing to do.
My view is that changing the default from ISO-8859-1 to UTF-8 throughout
the Servlet spec would be a beneficial change for users and should be a
compatible change for any application relying on a default of ISO-8859-1.
I can see some merit in providing specification defined options for
changing the defaults but I don't view that as important or useful as
simply changing the current defaults to UTF-8.
Cheers,
Mark
>
> Cheers,
>
> Yannick
>
> On Mon, 31 Aug 2015 at 13:02, Mark Thomas <markt_at_apache.org> wrote:
>
> On 30/08/2015 20:19, Yannick Majoros wrote:
> > Hi,
> >
> > Uh, it's always been quite easy. Why do you think it isn't?
> >
> > You're citing Tomcat, which isn't Java EE btw.
>
> No, Tomcat isn't a full Java EE implementation but Tomcat implements the
> Servlet specification and this is the Servlet EG. Pointing out (using
> one of the many available Servlet implementations) that changing the
> default character encoding requires container specific configuration and
> asking for the specification to provide something doesn't seem
> unreasonable.
>
> The OP could have made the same point with Glassfish, WebSphere,
> WebLogic etc.
>
> > For Servlet, it's up to you. As long as you don't rely on defaults, you
> > should be fine. JSPs, if you still use them, have it quite clear too.
>
> And that is the point. If you want the default to be something other
> than ISO-8859-1 then it has to be changed in multiple places and you
> almost certainly need to use container specific configuration as well.
>
> > Every time I've seen someone struggle with this, he used a framework that
> > made dumb assumptions (Struts anyone? That's not Java EE btw). Or the
> > developer himself was confused, relied on defaults or converted multiple
> > times...
>
> That is a little unfair. While I have also seen those sorts of errors
> there are also issues (covered in the Tomcat FAQ linked below) with
> non-spec compliant browser behaviour that contribute to the problem.
>
> > I'm curious, what do you want an "encoding" element in web.xml to do?
>
> That is a fair question. There are multiple things that you might want
> to change.
>
> 1. URI decoding
> You can't define this per web application since the URI needs to be decoded
> before it is mapped to the web application. Therefore this has to be a
> container wide setting which means this pretty much has to use container
> specific configuration.
> What we could do is make UTF-8 rather than ISO-8859-1 the default.
>
> 2. Response bodies
> A web.xml setting could be used to change from the current ISO-8859-1
> default to a default of UTF-8.
>
> 3. Request bodies
> A web.xml setting (the same as 2?) could be used to change from the
> current ISO-8859-1 default to a default of UTF-8.
>
> Any changes in defaults would need to be reflected in the JSP
> specification.
>
> Mark
>
>
> > On 8/30/2015 2:18 PM, Philippe Marschall wrote:
> >>
> >> Hi
> >>
> >> UTF-8 is the most popular encoding on the web [1], [2], [3]. However
> >> configuring a Java EE web application to use UTF-8 has historically
> >> not been easy or doable in a portable manner [4]. Are there any plans
> >> to change this, for example by adding an <encoding> element to web.xml?
> >>
> >> [1] http://w3techs.com/technologies/overview/character_encoding/all
> >> [2] http://googleblog.blogspot.ch/2010/01/unicode-nearing-50-of-web.html
> >> [3] http://www.w3.org/QA/2008/05/utf8-web-growth#c139948
> >> [4] http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
> >>
> >> Cheers
> >> Philippe
> >
>
> --
> Yannick Majoros