users@jersey.java.net

[Jersey] Re: @QueryParam and character encoding

From: Veit Guna <veit.guna_at_gmx.de>
Date: Tue, 01 Nov 2011 09:19:12 +0100

Hi Jakub.

Thanks for looking into this.

Don't get me wrong: IMHO Jersey's default behavior (using UTF-8) is
the right way to do it. But during my tests I wondered why testing my
REST services with Firefox or soapUI didn't work: special characters
always got mangled. Then I found out that many tools, including the
two mentioned, still use ISO-8859-1.
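
To illustrate the mangling, here is a minimal sketch using only the JDK's
URLEncoder/URLDecoder (not Jersey's UriComponent). The class name and the
sample value "Müller" are made up for the example; it only assumes that the
client percent-encodes with one charset while the server decodes with another.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Jersey or Tomcat.
class QueryCharsetMismatch {

    // Percent-encodes a value the way a client using the given charset would.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    // Percent-decodes a value the way a server assuming the given charset would.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String original = "M\u00fcller"; // "Müller"

        String latin1 = encode(original, "ISO-8859-1"); // single byte for "ü"
        String utf8 = encode(original, "UTF-8");        // two bytes for "ü"

        System.out.println("ISO-8859-1 client sends: " + latin1);
        System.out.println("UTF-8 client sends:      " + utf8);

        // Matching charsets round-trip cleanly.
        System.out.println(original.equals(decode(utf8, "UTF-8")));
        System.out.println(original.equals(decode(latin1, "ISO-8859-1")));

        // Mismatched charsets do not give back the original value.
        System.out.println(original.equals(decode(latin1, "UTF-8")));
        System.out.println(original.equals(decode(utf8, "ISO-8859-1")));
    }
}
```

The last two checks show the problem: a value sent by an ISO-8859-1 client
does not survive UTF-8 decoding, and vice versa.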

I like Tomcat's idea of allowing (e.g. via configuration) "legacy
systems" (built before 2005) to specify the encoding they use via a
request header. If there is no such header, simply fall back to some
reasonable (configurable) default.
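
A rough sketch of what I mean, using plain JDK classes only. The class and
method names are hypothetical (this is neither Jersey nor Tomcat API), and it
assumes the charset would be taken from the charset parameter of the
Content-Type header, with a configurable fallback when the header or the
parameter is missing. Jakub's objection that Content-Type really describes
the entity body still applies to this approach.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical helper, only to illustrate the idea.
class QueryCharsetResolver {

    // Returns the charset parameter of a Content-Type header value,
    // or the fallback if the header or the parameter is absent.
    static String resolveCharset(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    // Decodes a raw (still percent-encoded) query parameter value using
    // the charset resolved from the header, or the fallback.
    static String decodeParam(String raw, String contentType, String fallback) {
        String charset = resolveCharset(contentType, fallback);
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }
}
```

With this, decodeParam("M%FCller", "text/plain; charset=ISO-8859-1", "UTF-8")
would decode as ISO-8859-1, while a request without the header would be
decoded with the UTF-8 default.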

I could also imagine that one would want to override Jersey's default
encoding entirely, e.g. with ISO-8859-1, to be compatible with the
servlet spec (like Tomcat).

Another idea would be the ability to disable the "Jersey decoding"
altogether and simply use the container's decoded values. In that case
one has the container's options (like Tomcat's) for request/query
param decoding.

What do you think?

Regards,
Veit





On 31.10.2011 18:07, Jakub Podlesak wrote:
> Hi all,
>
> On 25.10.2011 23:29, grave_at_gmx.de wrote:
>> I dug deeper and found out that this is a common problem.
>>
>> Per the servlet spec, ISO-8859-1 is the default encoding for query
>> parameters.
>> That is what Tomcat uses by default. Newer specs (see
>> http://en.wikipedia.org/wiki/Percent-encoding, "Current standard")
>> recommend using UTF-8 as the encoding.
>
> That one links to http://tools.ietf.org/html/rfc3986#section-2.5
> and it indeed says UTF-8 should generally be used.
>
>> Tomcat allows via configuration to let the content-type of the request
>> decide what format
> The content-type header could tell you the entity body character encoding,
> but does not have anything to do with the query parameter encoding.
>
>> the client sends the query parameters in (and mainly the body of
>> course). That also seems to be true for GET requests (see
>> http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q2).
>>
>> I was trying to set these config params on my tomcat, but that didn't
>> work out in combination with jersey. So I took a look at the jersey
>> code and the query param decoding. It is using UTF-8 hardcoded for
>> query param decoding. Also it doesn't seem to take the original, tomcat
>> decoded, params into account. So it's clear why setting the tomcat
>> config params didn't work out for me.
>
> Jersey's com.sun.jersey.api.uri.UriComponent uses UTF-8 encoding by default
> to align with the above RFC 3986 doc.
>
>> I injected the HttpServletRequest via @Context to my service and
>> printed the original query params to the console. And voila, these were
>> decoded "correctly" as set in the tomcat configuration.
>>
>> So my question is, is there something planned to allow this behavior
>> via jersey?
>> (Let the client specify the encoding of the query params via
>> content-type header of the request).
>
> And what if there is no request body? You would need to generate an
> artificial content-type header just to be able to parse the query
> parameters correctly. I do not think this is the right thing to do.
>
> I need to understand the issue better to come up with a solution.
> Are you saying the client generates proper UTF-8 encoded query parameters,
> which the Jersey server cannot interpret correctly when sitting on top of Tomcat?
>
> Could you please confirm?
>
> Thanks,
>
> ~Jakub
>
>