
Encoding Problem while saving a txt file in utf-8

The following line

Files.write(Paths.get("test.txt"), Arrays.asList("ü"), StandardCharsets.UTF_8);

should write a ü to test.txt, encoded in UTF-8. At least that is what I expect it to do. But if I open the file in a text editor, the editor shows

ü

and the editor states that it is reading the file as UTF-8. I even tried two editors, and both show the same unexpected result. A hex editor shows

c3 83 c2 bc 0d 0a

The last two bytes are the carriage return and line feed, that's fine, but the first bytes should have been just c3 bc, since that is the UTF-8 encoding of ü (according to https://www.utf8-zeichentabelle.de/).
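The expected two-byte encoding can be verified in Java itself. A minimal sketch (using the Unicode escape `\u00FC` for ü so the result does not depend on the source-file encoding):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    public static void main(String[] args) {
        // "\u00FC" is ü; its UTF-8 encoding should be exactly c3 bc
        for (byte b : "\u00FC".getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02x ", b);
        }
        System.out.println(); // prints: c3 bc
    }
}
```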

The Java source file is encoded in UTF-8, confirmed by two editors.

What am I missing? Why is the ü not encoded as UTF-8 even though I explicitly passed the charset to Files.write()?


Answer

Try the Unicode escape "\u00FC" instead of the literal "ü". If that suddenly works, it means that the editor uses a different encoding (UTF-8) than the javac compiler (Cp1252). By the way: StandardCharsets.UTF_8 is already the default for this Files.write overload.

The Java source was saved by the editor as UTF-8, so the ü became two bytes with the high bit set (c3 bc). The javac compiler read the file with encoding Cp1252 (probably) and turned those two bytes into two separate chars (Ã and ¼), which, re-encoded as UTF-8, add up to four bytes (c3 83 c2 bc).
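This double encoding can be reproduced directly: decode the UTF-8 bytes of ü as windows-1252 (Cp1252), then re-encode the resulting two characters as UTF-8. A minimal sketch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    public static void main(String[] args) {
        // The UTF-8 bytes of ü: c3 bc
        byte[] utf8 = "\u00FC".getBytes(StandardCharsets.UTF_8);

        // What javac sees when it misreads those bytes as Cp1252: "Ã¼"
        String misread = new String(utf8, Charset.forName("windows-1252"));

        // Writing the two misread chars back out as UTF-8 yields four bytes
        byte[] doubled = misread.getBytes(StandardCharsets.UTF_8);

        StringBuilder hex = new StringBuilder();
        for (byte b : doubled) hex.append(String.format("%02x ", b));
        System.out.println(hex.toString().trim()); // c3 83 c2 bc
    }
}
```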

So the compiler encoding has to be set explicitly, in this case also for the test sources.
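One way to do that, assuming plain javac on the command line (build tools have equivalent settings, e.g. Maven's `project.build.sourceEncoding` property):

```shell
# Tell javac explicitly that the source files are UTF-8
javac -encoding UTF-8 Test.java
```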

User contributions licensed under: CC BY-SA