
Using one text file to search through another text file in Java

I’m trying to search through a file (File B) for matching strings from another file (File A). If the string is found in File A, then print the entire line(s) from File B and also update its progress to its corresponding JProgressBar(s) as the lines are being read.

The code below is working fine as expected, but the issue is performance. When dealing with large files, it takes about 15 minutes to scan just 5 thousand lines.

I’m really looking for a way to process large files, for example 500K lines.

Please suggest if this can be enhanced to handle large files or which part of my code is causing the slowness.



Answer

Your current solution iterates linearly through file1 and, for each line, iterates linearly through file2. This results in a running time of O(F1*F2), where F1 and F2 are the numbers of lines in the two files: the running time grows with the product of the file sizes, i.e. quadratically when both files grow together. On top of that, file2 is read into memory again each time it’s checked for a match, which is very expensive.

A better solution would be to read file2 into memory once (e.g. into an ArrayList) and sort it:
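A minimal sketch of this step, assuming file2 fits in memory and is UTF-8 encoded. The class and method names (SortedLines, loadSorted, sortedCopy) are hypothetical and only illustrate the idea; they are not taken from the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class SortedLines {
    // Reads all lines of the file into memory exactly once and returns them sorted.
    // Assumption: the file is UTF-8 encoded and small enough to hold in memory.
    static List<String> loadSorted(Path file) {
        try {
            return sortedCopy(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies the lines into a new ArrayList and sorts it in natural String order,
    // which is the order Collections.binarySearch expects later on.
    static List<String> sortedCopy(Collection<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        Collections.sort(sorted);
        return sorted;
    }
}
```

The sort happens once up front, so its cost is paid a single time rather than per line of file1.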


Then file1 can be iterated as you currently do, and for each line you use binary search to check whether that String exists in file2:
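A sketch of the lookup, assuming the lines of file2 are already sorted in natural String order and that a match means the whole line is equal. LineMatcher and findMatches are hypothetical names; s1 and index follow the names used in this answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LineMatcher {
    // Returns every line of file1 that also occurs in file2.
    // Precondition: sortedFile2 is sorted in natural String order,
    // otherwise the result of Collections.binarySearch is undefined.
    static List<String> findMatches(List<String> file1Lines, List<String> sortedFile2) {
        List<String> matches = new ArrayList<>();
        for (String s1 : file1Lines) {
            // O(log F2) per lookup on a random-access list such as ArrayList.
            int index = Collections.binarySearch(sortedFile2, s1);
            if (index >= 0) {
                matches.add(s1);
            }
        }
        return matches;
    }
}
```

Because this compares whole lines, it only fits if the strings in one file are complete lines of the other; a substring search would need a different data structure.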


The index is non-negative if s1 is in file2.

This solution takes linearithmic time, O((F1 + F2) log F2), instead of quadratic time and thus scales much better on larger inputs.
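As a rough illustration, assume both files have 500K lines (the size mentioned in the question) and count only string comparisons, ignoring constant factors and I/O:

```latex
\begin{align*}
\text{nested loops:} \quad & F_1 \cdot F_2 = (5 \times 10^{5})^{2} = 2.5 \times 10^{11} \\
\text{sort + binary search:} \quad & (F_1 + F_2)\,\log_2 F_2 \approx 10^{6} \times 18.9 \approx 1.9 \times 10^{7}
\end{align*}
```

Under these assumptions the sorted approach needs roughly four orders of magnitude fewer comparisons.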

If you would like to improve the time it takes to sort, consider MSD string sort (a radix sort for strings) instead of Collections.sort. It is only a minor improvement, but it counts.

User contributions licensed under: CC BY-SA