users@jersey.java.net

FormDataMultiPart UTF-8 issue

From: geoffrey hendrey <geoff_at_nextdb.net>
Date: Tue, 19 Jan 2010 23:07:57 -0800

I am using FormDataMultiPart to post a String. The string consists of the 3
hiragana characters that spell the word てすと (te-su-to).

However, when I call
FormDataMultiPart.get("theWord").getValue().toString().length(), the result
is 9, whereas I expect 3 (because there are three characters).

The UTF-8 byte sequence for these 3 characters is 9 bytes. The observed
behavior (9) is explainable if Jersey is marshalling each of the bytes into
a character, instead of properly interpreting the 3-byte UTF-8 sequences as
characters.
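
To make the arithmetic concrete, here is a minimal standalone sketch that
reproduces the 9-versus-3 count outside of Jersey. It assumes the bytes are
being decoded with a single-byte charset such as ISO-8859-1; that is only a
guess at what happens on the receiving side, not something confirmed from the
Jersey source. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Jersey.
class Utf8LengthDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode as UTF-8, then decode with the given charset, and return
    // the length (in chars) of the resulting String.
    static int decodedLength(String s, Charset decodeWith) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith).length();
    }

    // Undo a mistaken ISO-8859-1 decode of UTF-8 bytes. This only works
    // under the assumption that ISO-8859-1 was the charset used.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "\u3066\u3059\u3068"; // the three hiragana te-su-to

        System.out.println(utf8ByteCount(word));                              // 9
        System.out.println(decodedLength(word, StandardCharsets.ISO_8859_1)); // 9
        System.out.println(decodedLength(word, StandardCharsets.UTF_8));      // 3
    }
}
```

If that assumption holds, repair(...) applied to the 9-character value gives
back the original 3 characters, but that is a workaround rather than a proper
fix.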

Does anyone know how to properly receive UTF-8 characters from a
FormDataMultiPart in Jersey?

-geoff

-- 
http://nextdb.net - RESTful Relational Database
http://www.nextdb.net/wiki/en/REST