Hi Paul,
I got to the bottom of this by trying to unmarshal the string in three
different ways. As I already mentioned, the first way was simply to call
FormDataBodyPart.getValue().toString(), which produced the improperly
decoded String.
Then I tried two other ways, both of which correctly unmarshalled the bytes
from the POST. As supporting information, here is the curl command I was
testing with, followed by an excerpt from the curl trace showing the proper
bytes being posted.
    curl --trace traced -F line=てすと \
        http://localhost:8080/nextdb/rest/geoff/testchars/lines/rowid/PK/1
The 9 bytes at offsets 0x5b-0x63 (e3 81 a6 e3 81 99 e3 81 a8) are the three
Japanese characters, encoded as UTF-8.
=> Send data, 148 bytes (0x94)
0000: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ----------------
0010: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 63 61 --------------ca
0020: 33 32 31 65 31 30 39 66 36 37 0d 0a 43 6f 6e 74 321e109f67..Cont
0030: 65 6e 74 2d 44 69 73 70 6f 73 69 74 69 6f 6e 3a ent-Disposition:
0040: 20 66 6f 72 6d 2d 64 61 74 61 3b 20 6e 61 6d 65 form-data; name
0050: 3d 22 6c 69 6e 65 22 0d 0a 0d 0a e3 81 a6 e3 81 ="line".........
0060: 99 e3 81 a8 0d 0a 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ......----------
0070: 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d 2d ----------------
0080: 2d 2d 2d 2d 63 61 33 32 31 65 31 30 39 66 36 37 ----ca321e109f67
0090: 2d 2d 0d 0a
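
As a quick sanity check on the trace, here is a minimal sketch that decodes
those nine bytes. The class and method names are made up for illustration,
and ISO-8859-1 is only my assumption for "some wrong single-byte charset";
I have not confirmed which charset the bad code path actually used.

```java
import java.nio.charset.StandardCharsets;

public class Utf8TraceCheck {
    // The nine payload bytes copied from offsets 0x5b-0x63 of the curl trace.
    static final byte[] PAYLOAD = {
        (byte) 0xe3, (byte) 0x81, (byte) 0xa6,
        (byte) 0xe3, (byte) 0x81, (byte) 0x99,
        (byte) 0xe3, (byte) 0x81, (byte) 0xa8
    };

    // Correct decoding: three bytes per character, so three characters.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Assumed wrong decoding: every byte becomes its own character.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8(PAYLOAD).length());   // prints 3
        System.out.println(decodeLatin1(PAYLOAD).length()); // prints 9
    }
}
```

Decoded as UTF-8 the bytes give back U+3066 U+3059 U+3068, which is the
string passed to curl. So the request itself is fine and the damage happens
on the server side when the bytes are turned into a String.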
Both of the following methods unmarshal the String correctly:

Method 1) use an InputStream to get the raw bytes
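    // Read the part body as raw bytes instead of a pre-decoded String.
    InputStream is = theFormDataBodyPart.getValueAs(InputStream.class);
    try {
        byte[] bytes = Util.readInputStream(is, 1024 * 1024,
                1024 * 1024 * 1024);
        log.debug("this many bytes: " + bytes.length);
        // Dump each byte in hex to compare against the curl trace.
        for (byte b : bytes) {
            log.debug(Integer.toHexString(0x00FF & b));
        }
        // Decode explicitly as UTF-8 rather than relying on a default charset.
        String s = new String(bytes, "UTF-8");
        log.debug(s);
        return s;
    } catch (IOException ex) {
        throw new RuntimeException(ex.getMessage(), ex);
    }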
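
Util.readInputStream is our own helper, so in case anyone wants to try the
same thing without it, here is a rough stdlib-only sketch. readAll and
readUtf8 are hypothetical names, and the single maxBytes limit is my
simplification; it does not mirror the parameters of our helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamDecode {
    // Hypothetical stand-in for Util.readInputStream: drain the stream,
    // refusing to buffer more than maxBytes.
    static byte[] readAll(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new IOException("body exceeds " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Decode the raw bytes explicitly as UTF-8.
    static String readUtf8(InputStream in, int maxBytes) throws IOException {
        return new String(readAll(in, maxBytes), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Simulate the posted field value with an in-memory stream.
        byte[] posted = "\u3066\u3059\u3068".getBytes(StandardCharsets.UTF_8);
        String s = readUtf8(new ByteArrayInputStream(posted), 1024 * 1024);
        System.out.println(s.equals("\u3066\u3059\u3068")); // prints true
    }
}
```

In a resource method the stream would come from
theFormDataBodyPart.getValueAs(InputStream.class) as in the snippet above;
the ByteArrayInputStream is only there so the sketch runs on its own.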
Method 2) use theFormDataBodyPart.getValueAs(String.class)
Cheers,
geoff
--
http://nextdb.net - RESTful Relational Database
http://www.nextdb.net/wiki/en/REST