UTF-8 text is garbled when form is posted as multipart/form-data


I’m uploading a file to the server. The file upload HTML form has 2 fields:

  1. File name – An HTML text box where the user can enter a name in any language.
  2. File upload – An HTML ‘file’ input where the user can select a file from disk to upload.

When the form is submitted, the file contents are received properly. However, when the file name (point 1 above) is read on the server, it is garbled. ASCII characters are displayed properly, but names containing non-ASCII characters (for example German or French) come out wrong.

In the servlet method, the request’s character encoding is set to UTF-8. I even tried adding a filter, as suggested in the question “How can I make this code to submit a UTF-8 form textarea with jQuery/Ajax work?”, but it doesn’t seem to help. Only the file name is garbled.

The MySQL table where the file name is stored supports UTF-8. I tested it with random non-English characters & they are stored and displayed properly.

Using Fiddler, I monitored the request & all the POST data is passed correctly. I’m trying to identify how/where the data could get garbled. Any help will be greatly appreciated.

Answer

I had the same problem using Apache commons-fileupload. I did not find out what causes the problem, especially since I set the UTF-8 encoding in the following places:

  1. HTML meta tag
  2. Form accept-charset attribute
  3. Tomcat filter on every request that sets the “UTF-8” encoding

My solution was to explicitly convert the Strings from ISO-8859-1 (or whatever the default encoding of your platform is) to UTF-8:

new String(s.getBytes("ISO-8859-1"), "UTF-8");

Hope that helps.

Edit: Starting with Java 7, you can also use the following, which avoids the checked UnsupportedEncodingException:

new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
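
To see why this conversion works, here is a minimal, self-contained sketch. It assumes the garbling happens because the UTF-8 bytes of the form field were decoded as ISO-8859-1 somewhere on the server; I have not confirmed that this is what commons-fileupload actually does. The class and method names (EncodingRepair, misdecode, repair) are made up for illustration, and the sample file name is written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRepair {

    // Simulates the assumed bug: the UTF-8 bytes sent by the browser
    // are decoded with the wrong charset (ISO-8859-1).
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the wrong decoding: ISO-8859-1 maps every byte to exactly
    // one char, so the original bytes can be recovered and decoded as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical file name with a German umlaut and sharp s.
        String original = "Gr\u00fc\u00dfe.txt";
        String garbled = misdecode(original);   // what the servlet sees
        String repaired = repair(garbled);      // after the workaround
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

Note that the repair step is only safe on strings that really were mis-decoded as ISO-8859-1. Applying it to a name that was already decoded correctly will corrupt any non-ASCII characters in it, so it is best applied only to the affected form field.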


Source: stackoverflow