users@jersey.java.net

[Jersey] Re: @QueryParam and character encoding

From: Jakub Podlesak <jakub.podlesak_at_oracle.com>
Date: Mon, 31 Oct 2011 18:07:38 +0100

Hi all,

On 25.10.2011 23:29, grave_at_gmx.de wrote:
> I dug deeper and found out that this is a common problem.
>
> Per servlet spec, iso-8859-1 is the default encoding for query
> parameters.
> That is what tomcat uses per default. Newer specs (see
> http://en.wikipedia.org/wiki/Percent-encoding, "Current standard")
> recommend to use UTF-8 as encoding.

The Wikipedia article links to http://tools.ietf.org/html/rfc3986#section-2.5,
which indeed says UTF-8 should generally be used.

> Tomcat allows via configuration to let the content-type of the request
> decide what format
The Content-Type header can tell you the character encoding of the entity
body, but it has nothing to do with the encoding of the query parameters.

> the client sends the query parameters in (and mainly the body of
> course). That also seems to be true for GET requests (see
> http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q2).
>
> I was trying to set these config params on my tomcat, but that didn't
> work out in combination with jersey. So I took a look at the jersey
> code and the query param decoding. It is using UTF-8 hardcoded for
> query param decoding. Also it doesn't seem to take the original, tomcat
> decoded, params into account. So it's clear why setting the tomcat
> config params didn't work out for me.

Jersey's com.sun.jersey.api.uri.UriComponent uses UTF-8 encoding by default
to align with RFC 3986, linked above.
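
To illustrate why the decoding charset matters, here is a minimal sketch
using only java.net.URLDecoder from the JDK. It is not Jersey's actual code;
the class name QueryDecodeDemo and the helper decode() are made up for this
example. It decodes the same percent-encoded bytes once as UTF-8 and once as
ISO-8859-1:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Jersey or Tomcat.
public class QueryDecodeDemo {

    // Decodes a percent-encoded query value using the given charset name.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "%C3%BC" is the UTF-8 percent-encoding of U+00FC (u with diaeresis).
        String raw = "%C3%BC";

        String asUtf8 = decode(raw, "UTF-8");        // one char: U+00FC
        String asLatin1 = decode(raw, "ISO-8859-1"); // two chars: U+00C3 U+00BC

        System.out.println("UTF-8 length:      " + asUtf8.length());
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```

The same bytes give one character under UTF-8 and two under ISO-8859-1, so
the two sides only agree when they assume the same charset.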

> I injected the HttpServletRequest via @Context to my service and
> printed the original query params to the console. And voila, these were
> decoded "correctly" as set in the tomcat configuration.
>
> So my question is, is there something planned to allow this behavior
> via jersey?
> (Let the client specify the encoding of the query params via
> content-type header of the request).

And what if there is no request body? You would need to generate an
artificial Content-Type header just to be able to parse the query
parameters correctly. I do not think this is the right thing to do.

I need to understand the issue better to come up with a solution.
Are you saying the client generates properly UTF-8 encoded query parameters,
which the Jersey server cannot interpret correctly when running on top of
Tomcat?

Could you please confirm?
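
To make the question concrete, one way to check what the client actually
sends is to compare the raw query string on the wire with what the two
charsets would produce. A small sketch, assuming plain java.net.URLEncoder
(the class name QueryEncodeDemo and the helper encode() are made up for this
example):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Jersey or Tomcat.
public class QueryEncodeDemo {

    // Percent-encodes a query value using the given charset name.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u00fc"; // U+00FC, u with diaeresis

        // A client encoding with UTF-8 sends two percent-encoded bytes.
        System.out.println(encode(value, "UTF-8"));      // %C3%BC
        // A client encoding with ISO-8859-1 sends a single byte.
        System.out.println(encode(value, "ISO-8859-1")); // %FC
    }
}
```

If the raw request shows %C3%BC for that character, the client is sending
UTF-8; if it shows %FC, the client is sending ISO-8859-1.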

Thanks,

~Jakub