Hashing runs out of memory and gets slower and slower over time

I have a GUI desktop application which generates different types of hashes (for example MD5) for files and directories. Recently, when I was testing with a 1GB test file, I noticed that it becomes slower and slower over time. The first hashing takes about 2 seconds for a 1GB file; later, for exactly the same file, it takes about 76 seconds.

To demonstrate the problem, I have created sample code that everyone can try (for repeatability). It has 2 key steps: (1) it reads the file into a byte array, (2) it generates the hash for the byte array. (In the real program there are several switches and if-else statements, for example to decide whether the path is a file or a directory, etc., and a lot of JavaFX GUI elements are involved.)

I’ll show that even this simplified code becomes 8 times slower when it is repeated 5 times! From what I have read on several forums, the cause is probably a memory leak or excessive memory consumption, or something similar. What I would like is to free the memory between cycles, so that each hashing takes only as long as the first one (2 seconds).

The mentioned sample code is the following:

package main;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.springframework.util.StopWatch;

public class Main {

    public static void main(String[] args) throws NoSuchAlgorithmException {

        // 1GB Test file downloaded from here:
        // https://testfiledownload.com/

        StopWatch sw = new StopWatch();
        // Hash the file 5 times and measure the time for each hashing
        for (int i = 0; i < 5; i++) {
            sw.start();
                String hash = encrypt(inputparser("C:\\Users\\Thend\\Downloads\\1GB.bin"));
            sw.stop();
            // Print execution time at each cycle
            System.out.println("Execution time: "+String.format("%.5f", sw.getTotalTimeMillis() / 1000.0f)+"sec"+" Hash: "+hash);
        }
    }

    public static byte[] inputparser(String path){
        File f = new File(path);
        byte[] bytes = new byte[(int) f.length()];
        FileInputStream fis = null;
        try {
            fis = new FileInputStream(f);
            // read file into bytes[]
            fis.read(bytes);
            if (fis != null) {
                fis.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return bytes;
    }

    public static String encrypt(byte[] bytes) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();

        md.reset();
        md.update(bytes);

        byte[] hashed_bytes = md.digest();

        // Convert bytes[] (in decimal format) to hexadecimal
        for (int i = 0; i < hashed_bytes.length; i++) {
            sb.append(Integer.toString((hashed_bytes[i] & 0xff) + 0x100, 16).substring(1));
        }

        // Return hashed String in hex format
        String hashedByteArray = sb.toString();
        return hashedByteArray;
    }
}

The console output, where you can see the increasing time:

"C:\Program Files\Java\jdk-18.0.2\bin\java.exe" "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2022.2\lib\idea_rt.jar=54323:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2022.2\bin" -Dfile.encoding=UTF-8 -Dsun.stdout.encoding=UTF-8 -Dsun.stderr.encoding=UTF-8 -classpath C:\Users\Thend\intellij-workspace\HashTimeTest\out\production\HashTimeTest;C:\Users\Thend\Downloads\springframework-5.1.0.jar main.Main
Execution time: 2,04100sec Hash: e5c834fbdaa6bfd8eac5eb9404eefdd4
Execution time: 3,70900sec Hash: e5c834fbdaa6bfd8eac5eb9404eefdd4
Execution time: 5,42100sec Hash: e5c834fbdaa6bfd8eac5eb9404eefdd4
Execution time: 7,09600sec Hash: e5c834fbdaa6bfd8eac5eb9404eefdd4
Execution time: 8,75500sec Hash: e5c834fbdaa6bfd8eac5eb9404eefdd4

Process finished with exit code 0

Answer

Your problem is probably that a reference to the loaded file contents is kept somewhere, so the large byte arrays cannot be garbage-collected.

A better way to compute a hash over a big file is to avoid loading the whole file into memory and instead read it piece by piece, feeding each piece to the digest:

public static String encrypt(String path) throws NoSuchAlgorithmException {
    File f = new File(path);
    byte[] bytes = new byte[4096];
    MessageDigest md = MessageDigest.getInstance("MD5");
    try (FileInputStream fis = new FileInputStream(f)) {
        while (true) {
            int len = fis.read(bytes);
            if (len == -1) {
                break;
            }
            md.update(bytes, 0, len);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    byte[] hashed_bytes = md.digest();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < hashed_bytes.length; i++) {
        sb.append(String.format("%02x", hashed_bytes[i]));
    }
    return sb.toString();
}
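
One more thing worth checking is the measurement itself. Spring's StopWatch.getTotalTimeMillis() returns the total over all start/stop tasks recorded so far, so each value printed by the sample in the question includes the earlier passes (the differences between consecutive lines are roughly 1.7 seconds). Below is a minimal sketch that combines chunked hashing with a separate timing for each pass, using System.nanoTime() instead of StopWatch. The class name StreamingMd5Timing, the method name md5Hex, the 8 KiB buffer size and the 16 MiB temporary file are assumptions made for illustration; they are not part of the original program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamingMd5Timing {

    // Hash a file in fixed-size chunks so memory use does not grow with file size.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192]; // assumed chunk size, tune as needed
        try (InputStream in = Files.newInputStream(file)) {
            int len;
            while ((len = in.read(buffer)) != -1) {
                md.update(buffer, 0, len);
            }
        }
        // Convert the digest bytes to a lowercase hex string.
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path file;
        if (args.length > 0) {
            // Path to a real test file passed on the command line.
            file = Paths.get(args[0]);
        } else {
            // Hypothetical stand-in for the 1GB test file: 16 MiB of zero bytes.
            file = Files.createTempFile("hash-demo", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[16 * 1024 * 1024]);
        }
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            String hash = md5Hex(file);
            // Elapsed time of this pass only, not the running total.
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("Pass %d: %.5f sec Hash: %s%n", i + 1, seconds, hash);
        }
    }
}
```

If StopWatch is kept, calling getLastTaskTimeMillis() after each stop() reports only the most recent pass.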
User contributions licensed under: CC BY-SA