Why does this code get stuck in the inner while loop? [closed]

public static void Hash() throws IOException
{
    int i = 0;
    for (var k : allFiles.keySet())
    {
        for (var file : allFiles.get(k))
        {
            FileInputStream fis = new FileInputStream(file.getAbsolutePath());
            int b = fis.read();
            int xor = 0;
            while (b != -1)
            {
                xor ^= b;
                b = fis.read();
            }
            fis.close();
            System.out.println(i++);
            System.out.println("End elaboration file " + file.getAbsolutePath());
            long xored = file.length() ^ xor;
            if (allByHash.get(xored) != null)
            {
                allByHash.get(xored).add(file);
            } else
            {
                allByHash.put(xored, new LinkedList<File>());
                allByHash.get(xored).add(file);
            }
        }
    }
}

It gets stuck at i = 122; what is wrong? How can it keep looping in the while?

The aim is to read all files byte by byte and compute a hash from their size and content, to compare them in search of duplicates.

The method blocks in the while loop and I can't understand why.

I can’t debug the reading of the file byte by byte :p

Answer

File 122 is either [A] quite large, [B] literally endless (something like /dev/zero never runs out of bytes), or [C] a special file that can block (such as /dev/random, which can stall waiting for entropy and is also endless).
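If [B] or [C] is the culprit, you can sidestep it by skipping anything that isn't an ordinary file before hashing. A minimal sketch (the class and method names here are hypothetical, not from your code):

```java
import java.io.File;
import java.nio.file.Files;

public class FileGuard {
    // Returns false for directories, device files, and named pipes,
    // so the hashing loop never opens something that can block forever.
    static boolean shouldHash(File file) {
        return Files.isRegularFile(file.toPath());
    }
}
```

You would call this at the top of the inner for loop and `continue` when it returns false.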

Calling read() on a FileInputStream is literally 60,000 times slower than it needs to be. Modern disks are simply incapable of reading a single byte at a time; instead, they read whole blocks. However, FileInputStream does not buffer, so when you ask for '1 byte' (which is what read() does), it fetches an entire block, tosses all but 1 byte in the bin, and returns it. Then you ask for the next byte and the whole dance repeats. Thus, you're reading that file roughly 'blockSize' times too many.

Either program this with a buffer (the .read(byte[]) variant), or wrap your FileInputStream in a BufferedInputStream first, which keeps an in-memory buffer so those one-byte reads hit memory instead of the disk.
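Here's a sketch of the .read(byte[]) variant applied to your XOR hash (the class name `XorHash` is made up for illustration; the hash itself matches your original logic):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class XorHash {
    // Reads the file in 8 KiB chunks instead of one byte per call,
    // then XORs every byte into the running hash.
    static long hash(File file) throws IOException {
        int xor = 0;
        byte[] buf = new byte[8192];
        // try-with-resources closes the stream even if read() throws.
        try (FileInputStream fis = new FileInputStream(file)) {
            int n;
            while ((n = fis.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    // Mask to 0..255 so this matches the unsigned
                    // values that the single-byte read() returns.
                    xor ^= (buf[i] & 0xFF);
                }
            }
        }
        // Mix in the file length, as the original code does.
        return file.length() ^ xor;
    }
}
```

The try-with-resources also fixes a lurking bug in the original: if read() throws, your `fis.close()` is never reached and the file handle leaks.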

The upshot is: If the 122nd file is quite large, it’ll look like it’s stuck, but it’s just taking that long, as it’s running 60k times slower than it needs to.