The following line
Files.write(Paths.get("test.txt"), Arrays.asList("ü"), StandardCharsets.UTF_8);
should write a ü in test.txt, encoded in UTF-8. At least, that is what I expect it to do. But if I open the file in a text editor, the editor shows
ü
and the editor states that it is reading the file as UTF-8. I even tried two editors, and both show the same unexpected result. A hex editor shows
c3 83 c2 bc 0d 0a
The last two bytes are carriage return and line feed, which is fine, but the first four bytes should have been just c3 bc, since that is the UTF-8 encoding of ü (according to https://www.utf8-zeichentabelle.de/).
The Java file itself is encoded in UTF-8, confirmed by two editors.
What am I missing? Why is the ü not encoded in UTF-8 even though I explicitly passed the charset to Files.write()?
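For reference, here is a minimal sketch of what the bytes of ü should look like. The \u00FC escape spells out the same character using only ASCII, so the result is independent of whatever encoding the editor saved the source file in (class and variable names are illustrative):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    public static void main(String[] args) {
        // \u00FC is 'ü'; the escape avoids any dependence on source-file encoding
        byte[] bytes = "\u00FC".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "c3 bc"
    }
}
```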
Answer
Try the ASCII escape "\u00FC" instead of "ü". If that suddenly works, it means that the editor uses a different encoding (UTF-8) than the javac compiler assumes (Cp1252). By the way, StandardCharsets.UTF_8 is already the default for this Files.write overload.
The Java source was saved by the editor as UTF-8, so the ü became two bytes with the high bit set. The javac compiler (probably) compiled with encoding Cp1252 and turned those two bytes into two chars, which, re-encoded as UTF-8, summed up to four bytes.
So the compiler encoding has to be set explicitly, in this case also for the test sources, e.g. with javac -encoding UTF-8 or the corresponding build-tool setting.
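The double-encoding chain described above can be reproduced directly, without involving the compiler at all. This sketch decodes the UTF-8 bytes of ü as windows-1252 (the canonical charset name for Cp1252) and re-encodes the result as UTF-8, yielding exactly the four bytes from the hex dump in the question:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // Step 1: the editor saved 'ü' as UTF-8 -> c3 bc
        byte[] onDisk = "\u00FC".getBytes(StandardCharsets.UTF_8);
        // Step 2: javac decoded those bytes as Cp1252 -> the two chars 'Ã' and '¼'
        String misread = new String(onDisk, Charset.forName("windows-1252"));
        // Step 3: Files.write re-encoded the two chars as UTF-8 -> four bytes
        byte[] doubled = misread.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : doubled) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // prints "c3 83 c2 bc"
    }
}
```

This matches the observed file contents: each of the two UTF-8 bytes of ü was treated as its own Cp1252 character and then expanded to two bytes again.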