
Different results reading file with Files.newBufferedReader() and constructing readers directly

It seems that Files.newBufferedReader() is more strict about UTF-8 than the naive alternative.

If I create a file containing a single byte with value 128 (not valid UTF-8 on its own), it is happily read if I construct a BufferedReader on an InputStreamReader on the result of Files.newInputStream(), but with Files.newBufferedReader() an exception is thrown.

This code (shown here as a minimal, representative version: it writes a single 0x80 byte to a temporary file and then reads it back both ways)
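
```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8ReadTest {
    public static void main(String[] args) throws Exception {
        // A file whose only content is the single byte 0x80 (invalid as UTF-8).
        Path tempFile = Files.createTempFile("utf8-test", null);
        Files.write(tempFile, new byte[] { (byte) 128 });

        // Reader built by hand: InputStreamReader over Files.newInputStream().
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(tempFile), "UTF-8"))) {
            System.out.println(br.read());
        }

        // Reader from Files.newBufferedReader(): this read throws.
        try (BufferedReader br = Files.newBufferedReader(tempFile)) {
            System.out.println(br.read());
        }
    }
}
```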

has roughly this result:
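
```
65533
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
```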

Is this documented? And is it possible to get the lenient behavior with Files.newBufferedReader()?


Answer

The difference is in how the CharsetDecoder used to decode the UTF-8 is constructed in the two cases.

For new InputStreamReader(in, "UTF-8") the decoder is constructed inside sun.nio.cs.StreamDecoder, roughly like this:
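
```java
// Paraphrased from sun.nio.cs.StreamDecoder.forInputStreamReader, which the
// InputStreamReader(InputStream, String) constructor delegates to:
Charset cs = Charset.forName(charsetName);
StreamDecoder sd = new StreamDecoder(in, lockObject,
        cs.newDecoder()
          .onMalformedInput(CodingErrorAction.REPLACE)
          .onUnmappableCharacter(CodingErrorAction.REPLACE));
```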

This explicitly specifies that malformed or unmappable input is simply replaced with the standard replacement character (U+FFFD) rather than reported as an error.

Files.newBufferedReader(path) uses roughly this (the JDK 8 implementation in java.nio.file.Files):
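
```java
// java.nio.file.Files (as of JDK 8):
public static BufferedReader newBufferedReader(Path path, Charset cs) throws IOException {
    CharsetDecoder decoder = cs.newDecoder();
    Reader reader = new InputStreamReader(newInputStream(path), decoder);
    return new BufferedReader(reader);
}
```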

In this case onMalformedInput and onUnmappableCharacter are never called, so you get the default action, CodingErrorAction.REPORT, which is to throw the MalformedInputException you are seeing.

There does not seem to be a way to change what Files.newBufferedReader does, and I did not see this difference documented anywhere while looking through the code.
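
If you need the lenient behavior for a file opened via java.nio.file, one workaround (a sketch following from the explanation above; the helper class and method names here are made up) is to build the replacing decoder yourself and wrap the file's stream by hand:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LenientReaders {
    // Builds the same "replace on bad input" UTF-8 decoder that the plain
    // InputStreamReader path uses, then wraps the file's stream with it.
    static BufferedReader newLenientBufferedReader(Path path) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), decoder));
    }
}
```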
