My UTF-8 strings have been converted to ISO-8859-1 strings in the following way:
- Characters 0 to 127 (hex 0x7F) have been left intact (0-9,a-z,A-Z, etc).
- Characters 128 and above have been converted to two ISO-8859-1 characters:
  é becomes Ã©, Ͷ becomes Í¶, etc.
Is there a way to undo this conversion, so that Ã© becomes é again, for example?
Answer
Suppose we have a string in which each non-ASCII character appears as two ISO-8859-1 characters, such as Ã©.
To recover the original characters, get the string's bytes in ISO-8859-1, then decode those bytes as UTF-8 using the String constructor that takes a byte array and a Charset object. The class java.nio.charset.StandardCharsets provides constants for the standard Charset objects.
String accentE =
    new String(
        "Ã©".getBytes(StandardCharsets.ISO_8859_1),  // recover the original UTF-8 bytes 0xC3 0xA9
        StandardCharsets.UTF_8                        // decode those bytes as UTF-8
    );
which is é
See this code run live at IdeOne.com.
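As a minimal sketch of the full round trip, the class below first reproduces the damage described in the question and then undoes it. The class name MojibakeRepair and the methods garble and repair are hypothetical names chosen for illustration; only String.getBytes(Charset), the String(byte[], Charset) constructor and StandardCharsets come from the JDK. The string literals use Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {

    // Hypothetical helper: reproduces the damage by decoding UTF-8 bytes as ISO-8859-1.
    static String garble(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: undoes the damage by recovering the bytes, then decoding as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";       // "café"
        String garbled = garble(original);   // "cafÃ©": the é is now two characters
        String repaired = repair(garbled);

        System.out.println(garbled.length());          // prints 5
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

This assumes the wrong decoding really was ISO-8859-1, so every character in the garbled string is in the range U+0000 to U+00FF. If the damage was done with a different charset such as windows-1252, or if the string contains characters outside that range, getBytes substitutes a replacement byte for them and the repair is no longer lossless.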