Should you always explicitly provide encoding in Java when converting between bytes and Strings?

Question

I'm refactoring some old Java application. It uses HTTP requests to communicate with some external service, so it deals with bytes and Strings. The assumption is that UTF-8 encoding should be used. …

Accepted Answer

TL;DR

Yes, you should always make sure the character encoding is defined the way your application needs it, and not rely on some assumed fact like "I know that file.encoding is always UTF-8". So go ahead and specify the encoding wherever that's not yet done.

As already pointed out in the comments, something like

    new String(lResponseAsString.getBytes(), Config.ENCODING_UTF8);

should never be written.

The flawed idea behind such a piece of code is that lResponseAsString came from parsing some byte sequence into a String, but using the wrong encoding. So it tries to convert the String back to the original bytes and then parses those bytes again, this time with the correct encoding.

First of all, how can the author be sure which encoding was used in creating lResponseAsString? By choosing getBytes() as the inverse conversion, he assumes it was the platform default encoding.

Then there are encodings where getBytes() is not guaranteed to reproduce the original byte sequence, e.g. because some byte values are illegal in that encoding. Once such a byte has been replaced by a substitute character during decoding, the original value is lost.

So we end up with a byte array that might vaguely resemble the original byte sequence, and then we hope that parsing it as UTF-8 gives a valid result.
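To make this concrete, here is a minimal sketch of both points: conversions that name the charset explicitly, and a reproduction of the "decode wrong, then repair" pattern. The class and method names (EncodingDemo, decodeUtf8, encodeUtf8, wrongDecodeThenRepair) are made up for illustration and are not from the original application. US-ASCII stands in for "some wrong encoding" so that the result does not depend on the platform default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class EncodingDemo {

    // Decode bytes from the wire, naming the charset explicitly.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Encode a String for the wire, naming the charset explicitly.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // The anti-pattern, with a fixed wrong charset instead of the
    // platform default: decode wrongly, re-encode, decode again as UTF-8.
    static String wrongDecodeThenRepair(byte[] raw) {
        // Bytes >= 0x80 are illegal in US-ASCII and become U+FFFD here.
        String wrong = new String(raw, StandardCharsets.US_ASCII);
        // U+FFFD cannot be encoded in US-ASCII, so each one becomes '?'.
        byte[] notTheOriginal = wrong.getBytes(StandardCharsets.US_ASCII);
        return new String(notTheOriginal, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of the
        // encoding the compiler assumes for it.
        String original = "Gr\u00fc\u00dfe";
        byte[] wire = encodeUtf8(original);

        // Explicit charset on both sides: the round trip is lossless.
        System.out.println(decodeUtf8(wire).equals(original)); // prints true

        // The "repair" cannot recover the bytes that were already replaced.
        System.out.println(wrongDecodeThenRepair(wire).equals(original)); // prints false
    }
}
```

The repair fails here because the damage happens in the first, wrong decoding step. Whether it happens to succeed in a real system depends on which encoding was actually used there, which is exactly what the code has no way of knowing.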

Answer