
How to use the iterator pattern in Java to load a file into a HashMap in batches

I have a large file containing two million lines. I’m looking to traverse each line of the file, process it into a key-value pair and store it in a hashmap to make comparisons later on. However, in the interest of space complexity, I do not want a hashmap with 2 million key-value pairs. Instead, I would like to iterate through N lines of the file, load their key-value pairs into the hashmap, make comparisons, then load the next N lines into the hashmap, and so on.

Here is an example of the use case for a file File.txt.

Assuming N=3 as the size of my hashmap, at the first iteration my hashmap would store key-value pairs for the first three lines of the file. After making comparisons on these key-value pairs, the next three lines are loaded into the hashmap as key-value pairs. This continues until all the lines in the file have been iterated over. How do I implement this using the iterator design pattern in Java?


Answer

You can do something like this: read and parse lines until the map reaches its maximum size, then operate on its contents and empty it. Keep doing this until you have read the entire file. At the end, operate on the leftover contents, if there are any.
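Here is a minimal sketch of that loop, before any iterator is involved. It assumes every line has the form key=value, because the question does not show the real format. The class name BatchLoader and the process callback are hypothetical; the callback stands in for your comparisons.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class BatchLoader {
    /**
     * Reads lines, parses each into a key-value pair and stores it in a map.
     * Whenever the map holds maxSize entries, it is handed to process and then cleared.
     * Assumption: every line looks like "key=value" (hypothetical format).
     */
    static void loadInBatches(BufferedReader reader, int maxSize,
                              Consumer<Map<String, String>> process) {
        Map<String, String> map = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("=", 2);
                map.put(parts[0], parts.length > 1 ? parts[1] : "");
                if (map.size() >= maxSize) {
                    process.accept(map); // compare the current batch
                    map.clear();         // release the entries before reading more lines
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!map.isEmpty()) {
            process.accept(map); // leftover entries from the last, partial batch
        }
    }
}
```

The map is cleared after each callback. The callback should therefore finish its comparisons, or copy what it needs, before returning.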

Edit: You need to wrap the reading inside an iterator and also keep track of the maximum number of lines to read at once.

This is only one possible solution. The iterator returns an Iterable (a List in this exact implementation) containing up to the maximum number of lines to be processed at once. I came up with this to keep the processing done by the iterator to an absolute minimum. You can (and should) have another class that actually handles the processing of the data (parsing it into a Map and so on). Even so, the iterator still has more responsibility than it should, because it creates the batches of data.
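Here is a minimal sketch of such a batching iterator, assuming the input comes from a BufferedReader. The class name BatchLineIterator is hypothetical. It reads one line ahead so that hasNext() stays accurate, and each call to next() returns a List of up to batchSize lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical name: yields the input in batches of at most batchSize lines.
class BatchLineIterator implements Iterator<List<String>> {
    private final BufferedReader reader;
    private final int batchSize;
    private String nextLine; // one line of look-ahead; null once the input is exhausted

    BatchLineIterator(BufferedReader reader, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.reader = reader;
        this.batchSize = batchSize;
        this.nextLine = readLine();
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public List<String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<String> batch = new ArrayList<>(batchSize);
        while (nextLine != null && batch.size() < batchSize) {
            batch.add(nextLine);
            nextLine = readLine();
        }
        return batch;
    }
}
```

With a real file, you could open the reader with Files.newBufferedReader(Paths.get("File.txt")) in a try-with-resources block. Then loop while it.hasNext() and parse each returned batch into your map.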

My suggestion would be to have the iterator only return the next line, with no processing at all; that is exactly what an iterator should be doing.
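Here is a sketch of such a line-only iterator, again over a BufferedReader. The name LineIterator is hypothetical. It does nothing but hand out the next line and leaves batching and parsing to other classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical name: returns one line per call to next(), with no parsing or batching.
class LineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine; // look-ahead line; null once the input is exhausted

    LineIterator(BufferedReader reader) {
        this.reader = reader;
        this.nextLine = readLine();
    }

    private String readLine() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        nextLine = readLine();
        return current;
    }
}
```

The JDK already provides an equivalent. BufferedReader.lines() returns a lazily populated Stream<String>, and calling iterator() on that stream gives an Iterator<String> that reads the input line by line.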


Then create an abstraction that prepares data for handling. This way you can have implementations that prepare data in batches (your case), line by line, or all at once, however you need.
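Here is a sketch of the abstraction and the batch implementation. The method signatures are assumptions, since the answer's original code was not preserved. In this sketch, DataPreparer pulls lines from an Iterator<String> and hands groups of them to a DataHandler.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical signatures: the original interfaces were not preserved.
interface DataHandler {
    void handle(List<String> lines);
}

interface DataPreparer {
    // Pulls lines from the iterator and decides how to group them for the handler.
    void prepare(Iterator<String> lines, DataHandler handler);
}

// Groups lines into batches of at most batchSize and passes each batch to the handler.
class BatchDataPreparer implements DataPreparer {
    private final int batchSize;

    BatchDataPreparer(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    @Override
    public void prepare(Iterator<String> lines, DataHandler handler) {
        List<String> batch = new ArrayList<>(batchSize);
        while (lines.hasNext()) {
            batch.add(lines.next());
            if (batch.size() == batchSize) {
                handler.handle(batch);
                batch = new ArrayList<>(batchSize); // fresh list, so the handler may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            handler.handle(batch); // last, possibly smaller, batch
        }
    }
}
```

A line-by-line preparer would instead call handler.handle(List.of(line)) for every line. An all-at-once preparer would collect every line into a single list first. In both cases, callers depend only on the DataPreparer interface.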


Data parsing should be done separately (you can create an abstraction for this as well), but I won't do it for this example.
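If you do want that separate parsing abstraction, a minimal sketch could look like this. The names LineParser and KeyValueParser are hypothetical. The key=value line format is an assumption, because the question does not show the file's format.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical abstraction: turns one line into a key-value pair.
interface LineParser<K, V> {
    Map.Entry<K, V> parse(String line);
}

// Assumes lines look like "key=value"; a line without '=' maps to an empty value.
class KeyValueParser implements LineParser<String, String> {
    @Override
    public Map.Entry<String, String> parse(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            return new SimpleEntry<>(line.trim(), "");
        }
        return new SimpleEntry<>(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
    }
}
```

A handler can then call parser.parse(line) and put the entry's key and value into its map. This keeps the handler independent of the file's line format.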

Handling is defined by the DataHandler interface, which the preparer passes the data to. A simple implementation of it is enough for this example.
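Here is a sketch of the DataHandler interface and a simple implementation, assuming the handler receives each batch as a List<String>. MapLoadingHandler is a hypothetical name. It parses key=value lines inline only to keep the example short; a separate parser would be cleaner. The compare() method is a placeholder for the comparisons the question mentions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical signature: receives one prepared group of lines at a time.
interface DataHandler {
    void handle(List<String> lines);
}

// Loads the current batch into a HashMap, replacing the previous batch,
// so at most one batch of entries is held at a time. Assumes "key=value" lines.
class MapLoadingHandler implements DataHandler {
    private final Map<String, String> current = new HashMap<>();
    private int batchesHandled = 0;

    @Override
    public void handle(List<String> lines) {
        current.clear(); // drop the previous batch before loading this one
        for (String line : lines) {
            String[] parts = line.split("=", 2);
            current.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        batchesHandled++;
        compare(current);
    }

    // Placeholder: the actual comparison logic is not specified in the question.
    protected void compare(Map<String, String> batch) {
        System.out.println("batch " + batchesHandled + ": " + batch.size() + " entries");
    }

    Map<String, String> currentBatch() {
        return Collections.unmodifiableMap(current);
    }

    int batchesHandled() {
        return batchesHandled;
    }
}
```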

Combining all of this into one program gives the following advantages:
  • Your main (or however your program is structured) does not care how data is prepared; the DataPreparer implementations are concerned with that
  • Neither the preparer nor main is concerned with how data is handled; only the DataHandler is
  • It’s easy to change behaviour, fix bugs without breaking something else, extend functionality, etc.
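Here is a minimal end-to-end sketch that combines the pieces, under the same assumptions as before: hypothetical signatures and key=value lines. To keep it short, it uses the JDK's BufferedReader.lines().iterator() as the line iterator and a lambda as the DataHandler. The batch sizes it collects stand in for the real comparisons.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Minimal versions of the hypothetical interfaces, included so this example compiles on its own.
interface DataHandler {
    void handle(List<String> lines);
}

interface DataPreparer {
    void prepare(Iterator<String> lines, DataHandler handler);
}

class BatchDataPreparer implements DataPreparer {
    private final int batchSize;

    BatchDataPreparer(int batchSize) {
        this.batchSize = batchSize;
    }

    @Override
    public void prepare(Iterator<String> lines, DataHandler handler) {
        List<String> batch = new ArrayList<>(batchSize);
        while (lines.hasNext()) {
            batch.add(lines.next());
            if (batch.size() == batchSize) {
                handler.handle(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            handler.handle(batch);
        }
    }
}

class BatchDemo {
    // Wires the pieces together: line iterator -> batch preparer -> handler.
    // The caller owns the reader and is responsible for closing it.
    static List<Integer> run(BufferedReader reader, int batchSize) {
        List<Integer> batchSizes = new ArrayList<>();
        DataHandler handler = lines -> {
            Map<String, String> map = new HashMap<>(); // holds a single batch only
            for (String line : lines) {
                String[] parts = line.split("=", 2); // assumed "key=value" format
                map.put(parts[0], parts.length > 1 ? parts[1] : "");
            }
            batchSizes.add(map.size()); // stand-in for the real comparisons
        };
        DataPreparer preparer = new BatchDataPreparer(batchSize);
        preparer.prepare(reader.lines().iterator(), handler);
        return batchSizes;
    }

    public static void main(String[] args) {
        String sample = "a=1\nb=2\nc=3\nd=4\ne=5"; // made-up input, not the question's file
        System.out.println(run(new BufferedReader(new StringReader(sample)), 3)); // prints [3, 2]
    }
}
```

Swapping BatchDataPreparer for a different preparer, or the lambda for a real handler, leaves run() otherwise unchanged.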
User contributions licensed under: CC BY-SA