Skip to content
Advertisement

Opening CSV with UTF-8 BOM via Excel

I create csv file with data by the means of java. And I faced the following well-known issue: the letters in Portuguese were displayed by the wrong way in Excel (when opening by double click).

I solved this by UTF-16LE+BOM, but excel started to recognize tabs as columns separators instead of commas.

So I looked up for another solution and saw many posts, in which people say that just adding UTF-8 BOM and writing file in UTF-8 will do the job for Excel 2007 and later. I tried the simpliest sample on my work computer and it failed. But when I tried this at my home computer it worked like a charm.

Both computers have the same versions of java installed and operating system Windows 7. I am confused. Can anyone tell what can cause such a strange behaviour?

You can see my simpliest sample below:

String filename = "D:/check/test_with_bom.csv";
        FileOutputStream fos = new FileOutputStream(filename);
        byte[] bom = new byte[] { (byte)0xEF, (byte)0xBB, (byte)0xBF }; 
        fos.write(bom);
        OutputStreamWriter osw = new OutputStreamWriter(fos , "UTF-8");
        PrintWriter printWriter = new PrintWriter(osw);

        printWriter.print("Hello,Olá,ão,ção");
        printWriter.close();

Advertisement

Answer

You should be aware that Excel does not “open” csv files. It converts them to an Excel file on the fly, using defaults. These defaults can be different depending on your regional settings. Because of that, it’s never a good idea to let Excel open csv files using the defaults, since you’ll never know for sure what you end up with.

A safer method is to use the ‘import from text’ method, and explicitly specify the delimiter, encoding, etc… Yet, be aware that ‘save as csv’ in Excel is an even worse idea, since it does not allow you to specify the encoding, delimiter, or any other detail. Access does.

On American Windows versions of Excel, the default column separator is a comma. On European Windows versions the comma is reserved for the Decimal Symbol and to avoid conflicts, a semicolon is used by default as column separator.

If you -really- -really- -have- to use CSV, you can consider adding the “sep=,” indicator at the top of your csv file. yet, be aware that this will probably cause problems in other applications.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement