Java: how to undo conversion from UTF-8 to ISO-8859-1 [closed]




My UTF-8 strings have been converted to ISO-8859-1 strings in the following way:

  • Characters 0 to 127 (hex 0x7F) have been left intact (0-9,a-z,A-Z, etc).
  • Characters 128 and above have been converted to two ISO-8859-1 characters: é becomes Ã©, Ͷ becomes Í¶, etc.
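For illustration, the corruption described above can be reproduced by encoding a string as UTF-8 and then decoding the resulting bytes as ISO-8859-1. This is a minimal sketch; the class name MojibakeDemo and the helper garble are hypothetical names chosen for this example, and the sketch assumes the original conversion really was a plain ISO-8859-1 decode.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: encode as UTF-8, then wrongly decode each byte
    // as one ISO-8859-1 character.
    static String garble(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the bytes C3 A9 in UTF-8,
        // which ISO-8859-1 reads as the two characters U+00C3 and U+00A9.
        String garbled = garble("\u00E9");
        System.out.println(garbled.length());               // prints 2
        System.out.println(garbled.equals("\u00C3\u00A9")); // prints true

        // ASCII characters are single bytes in both encodings, so they survive.
        System.out.println(garble("abc").equals("abc"));    // prints true
    }
}
```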

Is there a way to undo this conversion, so that Ã© becomes é for example?

Answer

Suppose we have a string in which each non-ASCII character appears as two ISO-8859-1 characters, such as Ã©.

To recover the original characters, encode the string back to bytes using ISO-8859-1, then decode those bytes as UTF-8 with the String(byte[], Charset) constructor. The class java.nio.charset.StandardCharsets provides constants for the standard Charset objects.

String accentE = 
        new String(
            "Ã©".getBytes(StandardCharsets.ISO_8859_1),  // recover the original bytes
            StandardCharsets.UTF_8                        // decode those bytes as UTF-8
        );

which is é
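As a self-contained sketch of the same idea, the conversion can be wrapped in a small method. The class name MojibakeRepair and the method repair are hypothetical names for this example, and it assumes the whole input was garbled in exactly the way the question describes.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    // Hypothetical helper: turn each ISO-8859-1 character back into its byte,
    // then decode the resulting byte sequence as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caf" followed by U+00C3 U+00A9, the garbled form of U+00E9.
        String garbled = "caf\u00C3\u00A9";
        String fixed = repair(garbled);

        System.out.println(fixed.equals("caf\u00E9")); // prints true
        System.out.println(fixed.length());            // prints 4
    }
}
```

Two caveats apply. getBytes replaces any character that ISO-8859-1 cannot represent (anything above U+00FF) with a replacement byte, and the UTF-8 decoder substitutes U+FFFD for byte sequences that are not valid UTF-8. So running repair on a string that was never garbled, such as a plain é, damages it rather than leaving it intact.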

See this code run live at IdeOne.com.



Source: stackoverflow