users@jersey.java.net

[Jersey] Re: multipart, file upload and broken filenames with umlauts

From: Jakub Podlesak <jakub.podlesak_at_oracle.com>
Date: Tue, 18 Oct 2011 09:59:07 +0200

Hi Mathias,

Confirmed, I can reproduce the issue with Firefox.
I am still not sure whether this is an issue on the Jersey side or the Firefox side, though,
as I do not see any charset information coming from Firefox
in the request. I need some more time to figure out
the right way to fix or work around this.

~Jakub

On 11.10.2011 18:05, Mathias Fricke wrote:
> hi,
> my users use a form similar to this one to upload text and a file;
> note the ISO-8859-1 encoding!
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN"
> "http://www.w3.org/TR/html4/strict.dtd">
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
> <title>Insert title here</title>
> </head>
> <body>
>
> <form action="http://localhost/Servlet/setData" method="post"
> enctype="multipart/form-data" accept-charset="ISO-8859-1" >
> <table>
> <tr><td>aID</td><td><input type="text" name="aID" value="130"
> /></td></tr>
> <tr><td>data</td><td><input type="file" name="data"/></td></tr>
> <tr><td><input type="submit" /> </td></tr>
> </table>
> </form>
>
> </body>
> </html>
>
> on the server the resource processing the request looks like below (i
> stripped it down heavily)
>
> @Path("/setData")
> public class SetAssetResource {
> @POST
> @Consumes("multipart/form-data")
> public Response setAsset(final FormDataMultiPart mimeMultipartData) {
> try {
> final FormDataBodyPart aIdField =
> mimeMultipartData.getField("aID");
> String aID =
> encode(aIdField.getValueAs(byte[].class));//string ok, even w/ umlauts
> ...
> final FormDataBodyPart data =
> mimeMultipartData.getField("data");
> String fileName = null;
> InputStream inputStream = null;
> File file = null;
> if (data != null) {
>
> final FormDataContentDisposition cdp =
> data.getFormDataContentDisposition();
> fileName = cdp.getFileName();//already broken
> fileName = encode(fileName.getBytes(ENCODING)); // still broken
> ...
> inputStream = data.getValueAs(InputStream.class);
> ...
> final FileOutputStream fileOutputStream = new
> FileOutputStream(file);
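> final byte[] buffer = new byte[8192];
> int bytesRead;
> // write only the bytes actually read, not the whole buffer
> while ((bytesRead = inputStream.read(buffer)) != -1) {
> fileOutputStream.write(buffer, 0, bytesRead);
> }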
> fileOutputStream.flush();
> fileOutputStream.close();
> inputStream.close();
> }
> ...
>
> return Response.status(status).build();
> } catch (final IllegalStateException ex) {
> return Response.status(Status.INTERNAL_SERVER_ERROR).build();
> }
> }
>
> private String encode(final byte[] fieldValue) {
> return new String(new String(fieldValue, ENCODING).getBytes(),
> UTF8);
> }
>
> private final static String UTF8 = "UTF-8";
> private final static String ENCODING = "ISO-8859-1";
> }
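
A side note on the `encode` helper quoted above: it calls `getBytes()` without a charset, so its result depends on the platform default encoding. The sketch below shows a single explicit decode instead. It assumes the browser really sent the field bytes in the form's charset (ISO-8859-1 here); the class and method names are hypothetical and not part of Jersey.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FieldDecodeDemo {
    // Hypothetical helper: decode raw field bytes with an explicit charset,
    // so the result does not depend on the JVM's default encoding.
    static String decodeField(byte[] fieldValue, Charset formCharset) {
        return new String(fieldValue, formCharset);
    }

    public static void main(String[] args) {
        // "für" as ISO-8859-1 bytes (assumed to be what the browser sent)
        byte[] raw = {'f', (byte) 0xFC, 'r'};
        String value = decodeField(raw, StandardCharsets.ISO_8859_1);
        // compare against the expected string written with a Unicode escape
        System.out.println(value.equals("f\u00fcr"));
    }
}
```

On a JVM whose default charset is UTF-8, this should give the same string as the double conversion in `encode`, just without the round trip.
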
>
> the problems start when umlauts are used in either the text fields or the
> file name.
> for the textfields i found a solution (or rather workaround, imo) by
> using getValueAs(byte[].class) and converting that byte[]
>
> new String(new String(fieldValue, ENCODING).getBytes(),
> UTF8);//fieldValue is the byte[]
>
> but this will not work for the fileName of the attached file. here,
> the string is already broken when it comes into my realm.
> checking the string reveals that all umlauts already are broken and
> converted to the same symbol (usually displayed as either a diamond
> shaped thingumabob with a question mark or just a question mark) even
> on byte level (all umlaut chars have the same byte value).
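
The symptom described above (every umlaut turning into the same symbol, even at byte level) is what one would expect if ISO-8859-1 bytes were decoded as UTF-8 somewhere upstream. That is only an assumption about the cause, not something confirmed in this thread. A minimal sketch of the effect, with hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;

class UmlautLossDemo {
    // Decode bytes as UTF-8; byte sequences that are not valid UTF-8
    // are replaced with U+FFFD (the "diamond with a question mark").
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "äöü" encoded as ISO-8859-1: three distinct bytes
        byte[] latin1 = {(byte) 0xE4, (byte) 0xF6, (byte) 0xFC};

        String broken = decodeAsUtf8(latin1);
        for (char c : broken.toCharArray()) {
            // each umlaut has collapsed into the same replacement character
            System.out.printf("U+%04X%n", (int) c);
        }

        // Decoding the same bytes with the matching charset keeps them apart.
        String ok = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(ok.equals("\u00e4\u00f6\u00fc"));
    }
}
```

Once the replacement has happened, the original bytes are gone, which would explain why re-encoding the file name afterwards cannot repair it.
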
>
> i tried a lot of stuff and searched the web, but there's no solution
> to be found.
>
> simple question: how do i get the file name to be displayed correctly
> in the above setup?
>
>
> please also reply to my address, since i cannot subscribe to this list
> (on java.net there's just some kind of error).
>
> thanks in advance
>
>
>