
Tag: utf-8

Encoding Problem while saving a txt file in utf-8

The following line should write a ü to test.txt encoded in UTF-8; at least, that is what I expect it to do. But if I open the file in a text editor, the editor shows something other than ü, even though it states that it reads the file as UTF-8. I even tried two editors, and both show the same unexpected result. A
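The original code line is not shown, so the cause here is unknown. Two frequent culprits are relying on the platform default charset (for example `FileWriter` before Java 18, when UTF-8 became the default) and a mismatch between the encoding of the `.java` source and the one `javac` assumes. The sketch below checks what actually lands on disk when ü is written with an explicit UTF-8 charset. It assumes a temp file rather than the original test.txt. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUmlaut {
    // Writes U+00FC (ü) to a temp file as UTF-8 and returns the raw bytes on disk.
    // Using a Unicode escape instead of a literal ü keeps the result independent
    // of the encoding javac assumes for the source file.
    static byte[] writeUmlautToTempFile() {
        try {
            Path file = Files.createTempFile("test", ".txt");
            try {
                Files.write(file, "\u00FC".getBytes(StandardCharsets.UTF_8));
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (byte b : writeUmlautToTempFile()) {
            System.out.printf("%02X ", b & 0xFF); // prints C3 BC
        }
        System.out.println();
    }
}
```

If the file on disk contains `C3 BC` but an editor still shows something else, the problem is on the reading side. If it contains a single byte (`FC` in ISO-8859-1) or four bytes, the writing side is at fault.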

Java: Reading from getResourceAsStream gets too many bytes

I’m trying to read a binary file using getResourceAsStream. The problem is that I get too many bytes back. According to ls, the file is 56374 bytes long, but when I read it in my code I consistently get 85194 bytes. I get the same result with similar code: If I run the code without the resource, everything is fine, I
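The excerpt does not show the reading code. One common way a byte count grows is that binary data gets decoded as text and then encoded again. Each byte that is not valid UTF-8 becomes U+FFFD, and that character takes three bytes when re-encoded. The sketch below contrasts a plain byte copy with that round trip. It uses in-memory sample bytes. The resource name in the comment is hypothetical. On Java 9+, `InputStream.readAllBytes()` can replace the manual loop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ResourceBytes {
    // Copies a stream byte-for-byte. No charset is involved, so the count
    // matches the size of the underlying data.
    static byte[] readBinary(InputStream in) {
        try (InputStream s = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Anti-pattern: decoding binary data as UTF-8 text and encoding it again.
    // Invalid bytes become U+FFFD, which re-encodes as three bytes.
    static byte[] roundTripThroughString(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0x89, 'P', 'N', 'G', (byte) 0xFF, 0x00};
        byte[] exact = readBinary(new ByteArrayInputStream(binary));
        byte[] inflated = roundTripThroughString(binary);
        System.out.println(exact.length + " vs " + inflated.length); // 6 vs 10
        // With a real resource (hypothetical name):
        // readBinary(ResourceBytes.class.getResourceAsStream("/data.bin"));
    }
}
```

If the byte copy already returns the wrong size, the resource itself differs from the file on disk. For example, a build step that filters resources can rewrite binary files, so compare it with the copy in the build output directory.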

Different results reading file with Files.newBufferedReader() and constructing readers directly

It seems that Files.newBufferedReader() is stricter about UTF-8 than the naive alternative. If I create a file containing a single byte 128 (not a valid UTF-8 character), it is happily read if I construct a BufferedReader on an InputStreamReader wrapping the result of Files.newInputStream(), but Files.newBufferedReader() throws an exception. This code has this result: Is this documented?
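The `Files.newBufferedReader` Javadoc states that its reader throws `IOException` when it reads a malformed or unmappable byte sequence. An `InputStreamReader` built from a `Charset` replaces such input with U+FFFD, at least in the OpenJDK implementation. The sketch below reproduces both behaviours with the single byte 128. It also shows that a plain `InputStreamReader` becomes just as strict when given an explicit `CharsetDecoder` set to `REPORT`. The temp file and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DecoderStrictness {
    // Creates a temporary file containing exactly the given bytes.
    static Path tempFileWith(byte[] data) {
        try {
            Path file = Files.createTempFile("decoder", ".txt");
            file.toFile().deleteOnExit();
            return Files.write(file, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader built from a Charset: malformed bytes are replaced with U+FFFD.
    static String readLenient(Path file) {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Files.newBufferedReader: malformed bytes cause an exception on read.
    static boolean strictRejects(Path file) {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            r.readLine();
            return false;
        } catch (CharacterCodingException e) {
            return true; // MalformedInputException extends CharacterCodingException
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same strictness with a plain InputStreamReader, via an explicit decoder.
    static boolean reportingDecoderRejects(Path file) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), decoder))) {
            r.readLine();
            return false;
        } catch (CharacterCodingException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = tempFileWith(new byte[] {(byte) 128});
        System.out.println((int) readLenient(file).charAt(0)); // 65533
        System.out.println(strictRejects(file));               // true
        System.out.println(reportingDecoderRejects(file));     // true
    }
}
```

Conversely, passing a decoder configured with `CodingErrorAction.REPLACE` makes the reader lenient again, if replacement characters are acceptable for the data at hand.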

Opening CSV with UTF-8 BOM via Excel

I create a CSV file with data using Java, and I ran into a well-known issue: Portuguese letters were displayed incorrectly in Excel when the file was opened by double-clicking. I solved this with UTF-16LE plus a BOM, but then Excel started to treat tabs as column separators instead of commas. So I looked for another solution and
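A common approach is to keep UTF-8 and write a UTF-8 byte order mark (bytes `EF BB BF`, the encoding of U+FEFF) before the first row. Whether Excel then splits on commas may also depend on the regional list-separator setting, so this is a sketch rather than a guaranteed fix. The sketch does no CSV quoting or escaping, uses CRLF line endings by assumption, and uses hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class CsvWithBom {
    private static final char BOM = '\uFEFF';

    // Writes rows as comma-separated UTF-8 text, preceded by a byte order mark.
    // No quoting is done: fields must not contain commas, quotes or line breaks.
    static void writeCsv(OutputStream out, List<List<String>> rows) {
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(BOM); // encoded by the UTF-8 writer as EF BB BF
            for (List<String> row : rows) {
                w.write(String.join(",", row));
                w.write("\r\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeCsv(out, List.of(
                List.of("nome", "cidade"),
                List.of("Jo\u00E3o", "S\u00E3o Paulo")));
        byte[] bytes = out.toByteArray();
        System.out.printf("%02X %02X %02X%n",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // EF BB BF
    }
}
```

For a real file, pass `Files.newOutputStream(path)` instead of the in-memory stream.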

Byte order mark screws up file reading in Java

I’m trying to read CSV files using Java. Some of the files may have a byte order mark at the beginning, but not all. When present, the byte order mark gets read along with the rest of the first line, which causes problems with string comparisons. Is there an easy way to skip the byte order mark when it is present?
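Java's UTF-8 decoder does not strip a leading BOM. It comes through as the char U+FEFF at the start of the first line. One way to skip it is to peek at the first char with `mark`/`reset` and consume it only if it is U+FEFF. The sketch assumes UTF-8 input and uses hypothetical names. Apache Commons IO's `BOMInputStream` is an existing library alternative.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BomSkippingReader {
    // Opens a UTF-8 reader and drops a leading byte order mark (char 0xFEFF) if present.
    static BufferedReader openSkippingBom(InputStream in) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        r.mark(1);
        if (r.read() != 0xFEFF) {
            r.reset(); // no BOM (or empty input): put the first char back
        }
        return r;
    }

    // Reads all lines, with any leading BOM already removed.
    static List<String> readLines(InputStream in) {
        try (BufferedReader r = openSkippingBom(in)) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'i', 'd', ',', 'x'};
        byte[] withoutBom = {'i', 'd', ',', 'x'};
        List<String> a = readLines(new ByteArrayInputStream(withBom));
        List<String> b = readLines(new ByteArrayInputStream(withoutBom));
        System.out.println(a.get(0).equals("id,x") + " " + b.get(0).equals("id,x")); // true true
    }
}
```

Peeking at the decoded char, rather than the raw bytes, keeps the check to one place and works the same whether or not the BOM is present.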