Tag: utf-8

Byte order mark screws up file reading in Java

I’m trying to read CSV files using Java. Some of the files may have a byte order mark at the beginning, but not all. When present, the byte order mark gets read along with the rest of the first line, which causes problems with string comparisons. Is there an easy way to skip the byte order mark when it is present?
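
One possible approach, sketched here rather than taken from the original thread: Java's UTF-8 decoder passes a leading BOM through as the character U+FEFF, so the reader can peek at the first character and discard it only when it is that character. The class name `BomSkipper` and the helpers `openSkippingBom` and `firstLine` are hypothetical names chosen for illustration, and the sketch assumes the files are UTF-8 (UTF-16 or UTF-32 BOMs are not handled).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; assumes the input is UTF-8 encoded.
final class BomSkipper {

    // Returns a reader positioned after a leading U+FEFF, if one is present.
    static BufferedReader openSkippingBom(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);                  // remember the start of the stream
        if (reader.read() != '\uFEFF') { // first char is not a BOM (or stream is empty)
            reader.reset();              // rewind so nothing is lost
        }
        return reader;
    }

    // Convenience wrapper for the demo: first line of an in-memory "file".
    static String firstLine(byte[] data) {
        try (BufferedReader reader = openSkippingBom(new ByteArrayInputStream(data))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "id,name\n1,Ann\n".getBytes(StandardCharsets.UTF_8);

        // Same content, prefixed with the UTF-8 BOM bytes EF BB BF.
        byte[] withBom = new byte[plain.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(plain, 0, withBom, 3, plain.length);

        System.out.println(firstLine(plain));   // id,name
        System.out.println(firstLine(withBom)); // id,name
    }
}
```

With a file on disk, the same helper could be fed a `FileInputStream`. Checking the decoded character rather than the raw bytes keeps the sketch short, at the cost of only covering the UTF-8 case.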

Java UTF-8 strange behaviour

java utf-8

I am trying to decode some UTF-8 strings in Java. These strings contain some combining Unicode characters, such as CC 88 (combining diaeresis). The character sequence seems OK, according to http://www.fileformat.info/info/unicode/char/0308/index.htm, but the output after conversion to String is invalid. Any ideas?

Output: {{69cc88}} >i?

Answer

The console which you’re outputting to (e.g. Windows) may not support Unicode, and