Reliance on default encoding, what should I use and why?

FindBugs reports a bug:

Reliance on default encoding: Found a call to a method which will perform a byte to String (or String to byte) conversion, and will assume that the default platform encoding is suitable. This will cause the application behaviour to vary between platforms. Use an alternative API and specify a charset name or Charset object explicitly.

I used FileReader like this (just a piece of code):

public ArrayList<String> getValuesFromFile(File file){
    String line;
    StringTokenizer token;
    ArrayList<String> list = null;
    BufferedReader br = null;
    try {
        br = new BufferedReader(new FileReader(file));
        list = new ArrayList<String>();
        while ((line = br.readLine())!=null){
            token = new StringTokenizer(line);
            token.nextToken();
            list.add(token.nextToken());
    ...

To correct the bug I need to change

br = new BufferedReader(new FileReader(file));

to

br = new BufferedReader(new InputStreamReader(new FileInputStream(file), Charset.defaultCharset()));

The same warning appeared when I used PrintWriter. So I have two questions. First, when can (or should) I use FileReader and PrintWriter, if relying on the default encoding is not good practice? Second, is Charset.defaultCharset() the proper thing to use here? I chose this method so that the charset is picked up automatically from the user's OS.

Answer

If the file is under the control of your application, and you want it to be encoded in the platform's default encoding, then you can use the default platform encoding. Specifying it explicitly makes it clearer, for you and future maintainers, that this is your intention. This would be a reasonable default for a text editor, for example, which would then write files that any other editor on the same platform could read.

If, on the other hand, you want to make sure that any possible character can be written to your file, you should use a universal encoding like UTF-8.
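As a minimal sketch of that option, the snippet below writes and reads a file with UTF-8 named explicitly on both sides, which is the pattern FindBugs is asking for. The class name Utf8FileExample and the helper methods writeLines and readLines are hypothetical names chosen for this illustration; they are not part of the code in the question. It assumes Java 7 or later for StandardCharsets and try-with-resources.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class: file I/O with an explicit charset.
public class Utf8FileExample {

    // Writes each line to the file, encoded as UTF-8 regardless of the platform default.
    static void writeLines(File file, List<String> lines) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    // Reads all lines from the file, decoding the bytes as UTF-8.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-example", ".txt");
        file.deleteOnExit();
        // Unicode escapes keep this source file independent of the compiler's encoding.
        List<String> original = Arrays.asList("h\u00e9llo", "\u65e5\u672c\u8a9e");
        writeLines(file, original);
        System.out.println(readLines(file).equals(original));
    }
}
```

Because the writer and the reader name the same charset, the round trip does not depend on the machine it runs on. Note that PrintWriter does not throw on write failures; call checkError() if you need to detect them.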

And if the file comes from an external application, or is supposed to be compatible with an external application, then you should use the encoding that this external application expects.

What you must realize is that if you write a file the way you're doing now on one machine, and read it the same way on another machine that doesn't have the same default encoding, you won't necessarily be able to read back what you wrote. Using a specific encoding such as UTF-8 for both writing and reading makes sure the file is always interpreted the same way, whatever platform wrote it or reads it.
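To illustrate the mismatch, here is a small sketch that encodes a string as UTF-8 and then decodes the same bytes with two different charsets, standing in for two machines with different defaults. The class name EncodingMismatchExample and the method decodeUtf8BytesAs are hypothetical and exist only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: what happens when writer and reader disagree on the charset.
public class EncodingMismatchExample {

    // Encodes the text as UTF-8 (the "writer"), then decodes the bytes
    // with the given charset (the "reader").
    static String decodeUtf8BytesAs(String text, Charset readerCharset) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, readerCharset);
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, written with a Unicode escape.
        String original = "caf\u00e9";

        // Same charset on both sides: the text survives.
        String same = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);
        System.out.println(same.equals(original));

        // Different charset on the reading side: the accented character
        // is decoded as two unrelated characters.
        String different = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);
        System.out.println(different.equals(original));
    }
}
```

The first comparison is true and the second is false: the e-acute occupies two bytes in UTF-8, and ISO-8859-1 decodes each byte as a separate character. Relying on the default charset at both ends has the same effect whenever the two defaults differ.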
