users@jersey.java.net

more info on UTF-8 FormDataBodyPart issue

From: geoffrey hendrey <geoff.hendrey_at_gmail.com>
Date: Thu, 21 Jan 2010 21:46:02 -0800

Hi Paul,

I got to the bottom of this by trying to unmarshal the string in three
different ways. As I already mentioned, the first way was simply to call
FormDataBodyPart.getValue().toString(), which produced the improperly
decoded String.
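For illustration only: one common way to end up with an improperly decoded
String is to decode UTF-8 bytes with a single-byte charset. The sketch below
assumes ISO-8859-1 as the wrong charset; that is an assumption made for the
demo, not a confirmed description of what getValue() does internally. The
class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Decode the same bytes with a caller-chosen charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Three Japanese hiragana characters, written as escapes so the
        // source file encoding does not matter.
        String original = "\u3066\u3059\u3068";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // ASSUMPTION: the wrong charset is ISO-8859-1 (one char per byte).
        String wrong = decodeAs(utf8, StandardCharsets.ISO_8859_1);
        String right = decodeAs(utf8, StandardCharsets.UTF_8);

        System.out.println(wrong.length()); // one char per byte
        System.out.println(right.length()); // one char per character
        System.out.println(right.equals(original));
    }
}
```

If the wrong decode really is a single-byte one, the damage is reversible by
re-encoding with the same charset and decoding as UTF-8, but reading the part
correctly in the first place is the cleaner fix.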

Then I tried two other ways, both of which correctly unmarshalled the bytes
from the POST. As supporting information, here is the curl command line I
was testing with, and an excerpt from the curl trace showing that the
correct bytes were posted.

curl --trace traced -F line=てすと \
  http://localhost:8080/nextdb/rest/geoff/testchars/lines/rowid/PK/1

The 9 bytes e3 81 a6 e3 81 99 e3 81 a8 (offsets 0x5b through 0x63 in the
trace below) are the three Japanese characters, encoded as UTF-8.

=> Send data, 148 bytes (0x94)
0000: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ----------------
0010: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 63 61 --------------ca
0020: 33 32 31 65 31 30 39 66 36 37 0d 0a 43 6f 6e 74 321e109f67..Cont
0030: 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a ent-Disposition:
0040: 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d 65 form-data; name
0050: 3d 22 6c 69 6e 65 22 0d 0a 0d 0a e3 81 a6 e3 81 ="line".........
0060: 99 e3 81 a8 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ......----------
0070: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ----------------
0080: 2d 2d 2d 2d 63 61 33 32 31 65 31 30 39 66 36 37 ----ca321e109f67
0090: 2d 2d 0d 0a
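
As a quick sanity check on the trace, the nine payload bytes can be decoded
as UTF-8 in isolation. This is a minimal sketch; the class and method names
are hypothetical, and the byte values are copied from the trace above.

```java
import java.nio.charset.StandardCharsets;

public class TraceBytesCheck {
    // The nine payload bytes from the trace, following the blank line
    // that ends the part headers.
    static final byte[] PAYLOAD = {
        (byte) 0xe3, (byte) 0x81, (byte) 0xa6,
        (byte) 0xe3, (byte) 0x81, (byte) 0x99,
        (byte) 0xe3, (byte) 0x81, (byte) 0xa8
    };

    // Decode with an explicit charset instead of the platform default.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decodeUtf8(PAYLOAD);
        System.out.println(s.length());                      // prints 3
        System.out.println(s.equals("\u3066\u3059\u3068"));  // prints true
    }
}
```

Each character takes three bytes in UTF-8, which accounts for the nine bytes
on the wire.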

Both of the following methods unmarshal the bytes into the correct String:

Method 1) use an InputStream to get the raw bytes, then decode them as UTF-8

    InputStream is = theFormDataBodyPart.getValueAs(InputStream.class);
    try {
        // Read the raw bytes of the part (Util.readInputStream is a helper
        // that is not shown here).
        byte[] bytes = Util.readInputStream(is, 1024 * 1024, 1024 * 1024 * 1024);
        log.debug("this many bytes: " + bytes.length);
        for (byte b : bytes) {
            // Mask to 0..255 so the byte is logged as unsigned hex.
            log.debug(Integer.toHexString(0x00FF & b));
        }
        // Decode explicitly as UTF-8.
        String s = new String(bytes, "UTF-8");
        log.debug(s);
        return s;
    } catch (IOException ex) {
        throw new RuntimeException(ex.getMessage(), ex);
    }
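
Since Util.readInputStream is not shown in this message, here is a
self-contained sketch of the same idea using only the JDK. The readAll
method is a hypothetical stand-in for that helper (it ignores the two size
arguments, whose meaning is not given here), and a ByteArrayInputStream
stands in for the stream that getValueAs(InputStream.class) would return.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamToString {
    // Hypothetical stand-in for Util.readInputStream: reads until EOF
    // with no size limit.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Read the whole stream and decode it explicitly as UTF-8.
    static String readUtf8(InputStream in) {
        try {
            return new String(readAll(in), StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Simulated body part content: the UTF-8 bytes of three hiragana.
        byte[] payload = "\u3066\u3059\u3068".getBytes(StandardCharsets.UTF_8);
        String s = readUtf8(new ByteArrayInputStream(payload));
        System.out.println(s.equals("\u3066\u3059\u3068"));
    }
}
```

The important part is that the charset is named explicitly at the point of
decoding, so the result does not depend on the platform default.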

Method 2) use theFormDataBodyPart.getValueAs(String.class)

Cheers,
geoff



-- 
http://nextdb.net - RESTful Relational Database
http://www.nextdb.net/wiki/en/REST