Java: Reading from getResourceAsStream gets too many bytes

I’m trying to read a binary file, using getResourceAsStream. The problem is I get too many bytes back. The file is 56374 bytes long, according to ls, but when I read it in my code, I consistently get 85194 bytes.

    InputStream fileData = checkNotNull(MyClass.class.getResourceAsStream(path));
    byte[] b = IOUtils.toByteArray(fileData);
    int count = b.length;

I get the same result with similar code:

    InputStream fileData = checkNotNull(MyClass.class.getResourceAsStream(path));
    byte[] b = new byte[1000 * 1000];
    int count = fileData.read(b);

If I read the file directly instead of as a resource, everything is fine and I get the correct number of bytes.

    FileInputStream fis = new FileInputStream(path);
    byte[] b = new byte[1000 * 1000];
    int count = fis.read(b);

The first bytes of the data I read match. Checking the output, the first byte that doesn’t match is “C0”, which comes out as “ef bf bd”.

Maybe somehow it’s trying to convert to/from UTF-8? Everything should be binary here. There is no text involved.

Any help appreciated.

Edit: I’m pretty sure I’m reading the correct file. If I rename the file, the read fails. If I change it back, it works. I also renamed the resource in IntelliJ, which refactored the name in the code, and that still worked.

Edit2: I was wrong. I’m not looking at the correct file. I traced into getResourceAsStream. Our build system copies the file to a build output directory and runs from there. This destination file is the wrong size, so it appears the copy is doing some damage.

Note that the build would copy the file again any time I changed the name, which is why I thought I had the right file.

Answer

I suspect that you are actually reading a different version of the file when you read it as a resource. Resources are located by the classloader, which searches the classpath (for example a build output directory or a JAR) rather than the file system path you pass to FileInputStream. So when you resolve the same path string as a resource and as a file, there is a good chance they resolve to different things.
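
One way to check this is to ask the classloader where it actually found the resource. The following is a minimal sketch, not your project's code: the class name ResourceLocator and its two helper methods are hypothetical names made up for illustration. It relies only on Class.getResource and Class.getResourceAsStream from the standard library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;

// Hypothetical helper for illustration; not part of any library.
class ResourceLocator {

    /** Returns the URL the classloader resolves for the name, or null if it is not found. */
    static URL locate(Class<?> owner, String path) {
        return owner.getResource(path);
    }

    /** Reads the resource to the end and returns its length in bytes, or -1 if it is not found. */
    static long countBytes(Class<?> owner, String path) {
        try (InputStream in = owner.getResourceAsStream(path)) {
            if (in == null) {
                return -1;
            }
            long total = 0;
            byte[] buffer = new byte[8192];
            int n;
            // Loop until end of stream: a single read() may return fewer bytes than remain.
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Printing ResourceLocator.locate(MyClass.class, path) should show the real location (typically a file: URL under the build output directory, or a jar: URL), and you can compare the size of that file with what ls reports for the one you expected.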

I doubt that the root issue is Unicode or UTF-8 in your code. Your examples show that you are reading the data using InputStream. That approach is encoding-agnostic and will give you the raw bytes from the file(s). A regular InputStream doesn’t try to decode the bytes it reads.
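
To illustrate the difference, here is a small sketch that contrasts a raw read with what happens when something treats binary data as UTF-8 text and writes it back out. It assumes the mismatching byte is one that is never valid in UTF-8, such as 0xC0. The class and method names are hypothetical, and readAllBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class RawVersusDecoded {

    /** Reads every byte from the stream; no character decoding is involved. */
    static byte[] readRaw(InputStream in) {
        try {
            return in.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates a tool that wrongly decodes binary data as UTF-8 text and re-encodes it. */
    static byte[] roundTripAsUtf8(byte[] data) {
        // Invalid sequences are replaced with U+FFFD during decoding.
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as lowercase hex pairs separated by spaces. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

For the input bytes 41 c0 42, readRaw returns the same three bytes, while roundTripAsUtf8 returns 41 ef bf bd 42. Each invalid byte turns into three, which would also explain a file that grows. A pure InputStream read cannot do this, so the damage would have to happen somewhere else, before your code runs.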

Having said that, it is definitely significant that the bytes you are reading are different. But that is also consistent with simply reading different files.
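
If you want to confirm that two copies really differ and see where, a rough sketch along these lines may help. ByteDiff is a hypothetical name, and Arrays.mismatch requires Java 9 or later.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
class ByteDiff {

    /** Returns the offset of the first differing byte, or -1 if the arrays are identical. */
    static int firstMismatch(byte[] expected, byte[] actual) {
        // If one array is a proper prefix of the other, this returns the shorter length.
        return Arrays.mismatch(expected, actual);
    }

    /** Builds a short human-readable summary of the comparison. */
    static String describe(byte[] expected, byte[] actual) {
        int at = firstMismatch(expected, actual);
        if (at < 0) {
            return "identical (" + expected.length + " bytes)";
        }
        return "lengths " + expected.length + " vs " + actual.length
                + ", first mismatch at offset " + at;
    }
}
```

You would pass in the bytes read through FileInputStream as expected and the bytes read through getResourceAsStream as actual. If they differ, the next thing to look at is whatever produced the copy on the classpath.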

User contributions licensed under: CC BY-SA