users@servlet-spec.java.net

[servlet-spec users] Re: Easy UTF-8

From: Yannick Majoros <yannick.majoros_at_gmail.com>
Date: Thu, 10 Sep 2015 11:08:24 +0000

Hi,

1. You can accept both (and that's mostly automatic and
configuration-less), or just accept UTF-8 and throw BadRequestException
when the browser sends something else. I still can't see the point.
2. That's a browser problem, which you can solve with a configurable
default, and a standard specification for that can indeed help. But that's
just a default for when a browser fails to state the encoding clearly (which
modern browser still fails to do so nowadays?). Stating that "configuring a
Java EE web application to
use UTF-8 has historically not been easy or doable in a portable manner"
just isn't true.
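A minimal sketch of points 1 and 2, using only the JDK rather than the Servlet API. The class and method names (EncodingSketch, decodeUriComponent, bodyCharset) are hypothetical and not part of any container. "Accepting both" for a URI is modelled here as strict UTF-8 decoding of the percent-decoded bytes with an ISO-8859-1 fallback. The configurable default for bodies is modelled as using the charset parameter of Content-Type when the browser sends one, and a caller-supplied default otherwise.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class for illustration only; no container ships this.
final class EncodingSketch {

    // Percent-decodes a URI component into raw bytes. Unlike java.net.URLDecoder,
    // '+' is kept literal, because '+' only means space in form-encoded data.
    static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            // Unescaped characters are assumed to be ASCII, as in a well-formed URI.
            out.write(c);
        }
        return out.toByteArray();
    }

    // "Accept both": try strict UTF-8 first, fall back to ISO-8859-1
    // when the bytes are not valid UTF-8.
    static String decodeUriComponent(String s) {
        byte[] bytes = percentDecode(s);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // ISO-8859-1 maps every byte to a character, so this cannot fail.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    // Body charset: the Content-Type charset parameter if present,
    // otherwise the configurable default.
    static Charset bodyCharset(String contentType, Charset defaultCharset) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1); // strip quotes
                    }
                    // Throws IllegalArgumentException for unknown names; a caller
                    // could map that to a 400 Bad Request.
                    return Charset.forName(name);
                }
            }
        }
        return defaultCharset;
    }

    public static void main(String[] args) {
        // Both calls return "café": the first via UTF-8, the second via the fallback.
        System.out.println(decodeUriComponent("caf%C3%A9"));
        System.out.println(decodeUriComponent("caf%E9"));
        System.out.println(bodyCharset("text/plain; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(bodyCharset("application/x-www-form-urlencoded", StandardCharsets.UTF_8));
    }
}
```

The fallback is only a heuristic: ISO-8859-1 input whose bytes happen to form valid UTF-8 (for example C3 A9, which is "Ã©" in ISO-8859-1) will be read as UTF-8.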

I do agree with your 4th point; that would be cool.

Cheers,

Yannick

On Sun, 6 Sep 2015 at 19:25, Philippe Marschall <kustos_at_gmx.net> wrote:

>
>
> On 31.08.2015 13:01, Mark Thomas wrote:
> > On 30/08/2015 20:19, Yannick Majoros wrote:
> >> Hi,
> >>
> >> Uh, it's always been quite easy. Why do you think it isn't?
> >>
> >> You're citing Tomcat, which isn't Java EE btw.
> >
> > No, Tomcat isn't a full Java EE implementation, but Tomcat implements the
> > Servlet specification and this is the Servlet EG. Pointing out (using
> > one of the many available Servlet implementations) that changing the
> > default character encoding requires container-specific configuration and
> > asking for the specification to provide something doesn't seem
> > unreasonable.
> >
> > The OP could have made the same point with Glassfish, WebSphere,
> > WebLogic etc.
>
> I chose Tomcat because it has a nice wiki page that summarizes the issue
> and links to the relevant specs. Tomcat also serves as the servlet
> implementation in several Java EE implementations.
>
> >> For Servlet, it's up to you. As long as you don't rely on defaults, you
> >> should be fine. JSPs, if you still use them, make it quite clear too.
> >
> > And that is the point. If you want the default to be something other
> > than ISO-8859-1 then it has to be changed in multiple places and you
> > almost certainly need to use container specific configuration as well.
> >
> >> Every time I've seen someone struggle with this, he used a framework that
> >> made dumb assumptions (Struts anyone? That's not Java EE btw). Or the
> >> developer himself was confused, relied on defaults or converted multiple
> >> times...
> >
> > That is a little unfair. While I have also seen those sorts of errors
> > there are also issues (covered in the Tomcat FAQ linked below) with
> > non-spec compliant browser behaviour that contribute to the problem.
> >
> >> I'm curious, what do you want an "encoding" element in web.xml to do?
> >
> > That is a fair question. There are multiple things that you might want
> > to change.
> >
> > 1. URI decoding
> > You can't define this per web application since the URI needs to be decoded
> > before it is mapped to the web application. Therefore this has to be a
> > container wide setting which means this pretty much has to use container
> > specific configuration.
> > What we could do is make UTF-8 rather than ISO-8859-1 the default.
> >
> > 2. Response bodies
> > A web.xml setting could be used to change from the current ISO-8859-1
> > default to a default of UTF-8.
> >
> > 3. Request bodies
> > A web.xml setting (the same as 2?) could be used to change from the
> > current ISO-8859-1 default to a default of UTF-8.
>
>
> At minimum 1, because that currently requires container-specific
> configuration. I don't think just having UTF-8 as the default is enough as
> long as browsers use ISO-8859-1 for ISO-8859-1 web pages.
>
> Ideally also 2. Adding a filter to web apps to fix 2, because browsers
> don't send the encoding, is doable and portable, just a little bit annoying.
>
> Personally I can live without 3 however a central place to configure
> everything would be nice.
>
> 4. Make it clear in the spec what the default is, so that implementors
> agree on what the default is. Ideally cover this in the TCK.
>
> Cheers
> Philippe
>
> --
Yannick Majoros