My UTF-8 strings have been converted to ISO-8859-1 strings in the following way:
- Characters 0 to 127 (hex 0x7F) have been left intact (0-9, a-z, A-Z, etc.).
- Characters 128 and above have been converted to two or more ISO-8859-1 characters, one per UTF-8 byte (two for U+0080 to U+07FF): é becomes Ã©, Ͷ becomes Í¶, etc.
Is there a way to undo this conversion, so that, for example, Ã© becomes é again?
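The damage described above can be reproduced with a short Java sketch. The question does not say how the strings were converted; this assumes the UTF-8 bytes were decoded as ISO-8859-1, one character per byte. The class and method names (MojibakeDemo, corrupt) are hypothetical, and Unicode escapes are used so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical reproduction of the damage (assumption): the UTF-8 bytes
    // of a string are decoded as ISO-8859-1, giving one character per byte.
    static String corrupt(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // é is C3 A9 in UTF-8, which ISO-8859-1 reads as Ã and ©
        System.out.println(corrupt("\u00E9").equals("\u00C3\u00A9")); // true
        // Ͷ is CD B6 in UTF-8, which ISO-8859-1 reads as Í and ¶
        System.out.println(corrupt("\u0376").equals("\u00CD\u00B6")); // true
        // ASCII characters are single UTF-8 bytes and pass through unchanged
        System.out.println(corrupt("abc").equals("abc"));             // true
    }
}
```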
Answer
Suppose we have a string containing such doubled ISO-8859-1 characters, for example Ã©.
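To convert them back to the proper characters, first encode the string with ISO-8859-1. Because ISO-8859-1 maps each character from 0 to 255 to exactly one byte, this recovers the original UTF-8 bytes. Then decode those bytes as UTF-8 with the String constructor that takes a byte array and a Charset object. The class java.nio.charset.StandardCharsets provides constants for the standard Charset objects.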
import java.nio.charset.StandardCharsets;
String accentE = new String("Ã©".getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
The result is é.
See this code run live at IdeOne.com.
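Here is a self-contained version of the same approach, wrapped in a helper. The names MojibakeRepair and repair are hypothetical, and the sketch assumes every character in the damaged string lies in the ISO-8859-1 range, U+0000 to U+00FF.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Undoes the damage described in the question, assuming (hypothetically)
    // that every character of the input is in the range U+0000..U+00FF.
    static String repair(String damaged) {
        // Step 1: ISO-8859-1 maps each char 0..255 to exactly one byte,
        // so this recovers the original UTF-8 byte sequence.
        byte[] utf8Bytes = damaged.getBytes(StandardCharsets.ISO_8859_1);
        // Step 2: decode those bytes as UTF-8, as they should have been originally.
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ã©" -> "é"
        System.out.println(repair("\u00C3\u00A9").equals("\u00E9")); // true
        // "Í¶" -> "Ͷ"
        System.out.println(repair("\u00CD\u00B6").equals("\u0376")); // true
        // ASCII text is unaffected
        System.out.println(repair("abc").equals("abc"));             // true
    }
}
```

Two caveats follow from how these JDK methods treat bad input. getBytes(Charset) silently replaces characters outside ISO-8859-1 with '?', and new String(byte[], Charset) replaces malformed UTF-8 with U+FFFD. So calling repair on text that was never damaged (a plain é, for instance, becomes the single byte 0xE9, which is not valid UTF-8 on its own) produces U+FFFD rather than é. Apply it only to strings known to be affected.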