The following line should write a ü to test.txt, encoded in UTF-8. At least, that is what I expect it to do. But if I open the file in a text editor, the editor shows … and states that it is reading the file as UTF-8. I tried two editors, and both show the same unexpected result. …
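The line of code in question is missing above, so the cause cannot be confirmed from this snippet. A minimal sketch, assuming the text was written with the platform default charset (as FileWriter does before Java 18) or that javac read the .java source with a different encoding than the editor saved it in: write the bytes with an explicit UTF-8 charset and spell ü as a Unicode escape so the literal does not depend on the source file's encoding. The class name WriteUmlaut and the method writeUtf8 are illustrative, not from the original.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class WriteUmlaut {

    // Encodes the text with an explicit UTF-8 charset instead of relying on
    // the platform default charset.
    static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("test.txt");
        // The Unicode escape for U+00FC keeps the literal correct even if
        // javac decodes this source file with an unexpected encoding.
        writeUtf8(file, "\u00FC");
        byte[] bytes = Files.readAllBytes(file);
        for (byte b : bytes) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // C3 BC
    }
}
```

If the file holds exactly the two bytes C3 BC, it is correct UTF-8, and any remaining problem lies in how the editor decodes it.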
Tag: utf-8
Java: Reading from getResourceAsStream gets too many bytes
I’m trying to read a binary file using getResourceAsStream. The problem is that I get too many bytes back. According to ls, the file is 56374 bytes long, but when I read it in my code I consistently get 85194 bytes. I get the same result with similar code: … If I run the code without the resource, everything is fine. I …
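One way a binary file can grow when read is that its bytes pass through a Reader or a String: invalid UTF-8 sequences decode to U+FFFD, which re-encodes as the three bytes EF BF BD. Whether that is what happens here cannot be told from the snippet, since the code is elided. The sketch below copies the resource byte for byte with no charset involved, and includes a helper that reproduces the inflation for comparison; the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ResourceBytes {

    // Copies every byte from the stream unchanged: no Reader, no String, no charset.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Reads a classpath resource as raw bytes. The name is resolved relative to
    // the owner's package unless it starts with '/'.
    static byte[] readResource(Class<?> owner, String name) throws IOException {
        try (InputStream in = owner.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("resource not found: " + name);
            }
            return readAll(in);
        }
    }

    // Demonstrates the inflation: each invalid UTF-8 byte becomes U+FFFD,
    // which is encoded back as three bytes.
    static byte[] roundTripThroughString(byte[] binary) {
        return new String(binary, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }
}
```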
How can I combine “\u” and the UTF-8 code for console output? (Java)
I would like to combine a “\u” with a String that contains a hex code so that I can print a Unicode character to the console. I’ve tried something like this, but the console only prints …
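Java translates \u escapes when it compiles the source, so a backslash, a "u" and hex digits joined at run time stay just those literal characters. A minimal sketch that instead parses the hex code as a number and converts the resulting code point to a String; fromHex is a hypothetical helper name.

```java
final class HexToChar {

    // Parses a hex code point such as "00FC" or "1F600" and returns it as a String.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        // toChars returns a surrogate pair for code points above U+FFFF
        // and throws IllegalArgumentException for invalid code points.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(fromHex("00FC"));
    }
}
```

Whether the console then displays ü depends on the console's own encoding.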
Java: how to undo conversion from UTF-8 to ISO-8859-1 [closed]
My UTF-8 strings have been converted to ISO-8859-1 strings in the following way: characters 0 to 127 (hex 0x7F) have been left intact (0-9, a-z, A-Z, etc.). Characters 128 and above have been converted …
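Assuming the damage came from decoding UTF-8 bytes as ISO-8859-1 (each byte became one char from 0 to 255, which fits characters 0 to 127 being left intact), the original bytes can be recovered by re-encoding the string as ISO-8859-1 and decoding those bytes as UTF-8. A sketch with repair as an illustrative name; if the bytes were actually decoded as windows-1252, that charset would be needed for the re-encoding, and the repair only works if no characters were replaced along the way.

```java
import java.nio.charset.StandardCharsets;

final class UndoLatin1 {

    // Assumes every char in the mangled string is one original byte (0-255).
    static String repair(String mangled) {
        byte[] originalBytes = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the damage: UTF-8 bytes of U+00FC decoded as ISO-8859-1.
        String mangled = new String("\u00FC".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(mangled.length());                 // 2
        System.out.println(repair(mangled).equals("\u00FC")); // true
    }
}
```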
Different results reading file with Files.newBufferedReader() and constructing readers directly
It seems that Files.newBufferedReader() is stricter about UTF-8 than the naive alternative. If I create a file containing the single byte 128 (0x80, which is not a valid UTF-8 character on its own), it is read without complaint if I construct a BufferedReader on an InputStreamReader over the result of Files.newInputStream(), but Files.newBufferedReader() throws an exception. This code has this result: … Is this documented?
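The Javadoc for Files.newBufferedReader states that the returned Reader's methods throw an IOException when a malformed or unmappable byte sequence is read, whereas an InputStreamReader built from a Charset replaces malformed input with U+FFFD. The sketch below reproduces both behaviors on a file containing the single byte 0x80; the class name is illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StrictVsLenient {

    // InputStreamReader(InputStream, Charset) substitutes U+FFFD for malformed input.
    static String readLenient(Path file) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            return r.readLine();
        }
    }

    // Files.newBufferedReader reports malformed input by throwing
    // MalformedInputException (a CharacterCodingException) on read.
    static String readStrict(Path file) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return r.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("byte128", ".txt");
        Files.write(file, new byte[] {(byte) 0x80});
        System.out.println((int) readLenient(file).charAt(0)); // 65533, i.e. U+FFFD
        try {
            readStrict(file);
        } catch (CharacterCodingException e) {
            System.out.println("strict reader threw " + e);
        }
    }
}
```

To get the strict behavior from an InputStreamReader, or the lenient one while still reading via Files.newInputStream, pass a CharsetDecoder configured with CodingErrorAction.REPORT or CodingErrorAction.REPLACE to the InputStreamReader(InputStream, CharsetDecoder) constructor.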
Opening CSV with UTF-8 BOM via Excel
I create a CSV file with data using Java, and I ran into the following well-known issue: Portuguese letters were displayed incorrectly in Excel (when the file was opened by double-clicking). I solved this with UTF-16LE plus a BOM, but Excel then started treating tabs as column separators instead of commas. So I looked for another solution and …
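The approach in the title can be sketched as follows: write the CSV in UTF-8 and emit U+FEFF first, which the UTF-8 encoder writes as the byte order mark EF BB BF. This assumes the comma-separated rows are already built as strings; writeCsv is an illustrative name, and whether a particular Excel version honors the BOM is not something this sketch can guarantee.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class CsvWithBom {

    // Writes a UTF-8 BOM followed by the rows, each terminated by CRLF.
    static void writeCsv(Path file, List<String> rows) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write('\uFEFF'); // encoded as EF BB BF by the UTF-8 writer
            for (String row : rows) {
                w.write(row);
                w.write("\r\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        writeCsv(Path.of("out.csv"),
                Arrays.asList("cidade,pa\u00EDs", "S\u00E3o Paulo,Brasil"));
    }
}
```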
OrientDB having trouble with Unicode, Turkish, and enums
I am using a library that has an enum type with constants like these: … While debugging in Eclipse, I got an error: … Because I am on a Turkish system, there is a problem with the i to İ conversion, but since this is an enum constant, even though I set every attribute to UTF-8, nothing I tried got it to recognize that STRING is what …
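The elided enum and error make the exact failure uncertain, but a common cause on a Turkish default locale is a locale-sensitive toUpperCase(): "string" becomes "STRİNG" (capital dotted I, U+0130), so Enum.valueOf finds no constant named STRING. Character encoding settings do not affect this, because it is case mapping, not decoding. A sketch with a hypothetical FieldType enum standing in for the library's constants, and the assumption that the lookup goes through toUpperCase:

```java
import java.util.Locale;

final class TurkishEnumLookup {

    // Hypothetical stand-in for the library's enum; the real constants are elided.
    enum FieldType { STRING, INTEGER }

    // Upper-cases with the given locale before the lookup. With a Turkish
    // locale, 'i' maps to U+0130, so "string" no longer matches STRING.
    static FieldType lookup(String name, Locale locale) {
        return FieldType.valueOf(name.toUpperCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println("string".toUpperCase(turkish).equals("STRING")); // false
        System.out.println(lookup("string", Locale.ROOT));                  // STRING
        // lookup("string", turkish) would throw IllegalArgumentException.
    }
}
```

If the conversion happens inside the library, where toUpperCase(Locale.ROOT) cannot be added, starting the JVM with -Duser.language=en (or calling Locale.setDefault before the library is used) is a commonly used workaround.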
Byte order mark screws up file reading in Java
I’m trying to read CSV files using Java. Some of the files may have a byte order mark at the beginning, but not all. When present, the byte order mark gets read along with the rest of the first line, which causes problems with string comparisons. Is there an easy way to skip the byte order mark when it is present?
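Java's UTF-8 decoder does not strip a leading BOM; it appears as the character U+FEFF at the start of the first line. A minimal sketch that peeks at the first character and rewinds when it is not a BOM; open is an illustrative name, and the input is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class BomSkippingReader {

    private static final int BOM = 0xFEFF;

    // Returns a reader positioned after the BOM if one is present,
    // or at the very first character otherwise.
    static BufferedReader open(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);
        if (reader.read() != BOM) {
            reader.reset(); // no BOM (or empty input): put the first char back
        }
        return reader;
    }
}
```

Apache Commons IO's BOMInputStream is an existing alternative if that library is already a dependency.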