Read Large file in Chunks and Compare each line in Java

I have a text file with entries like the ones below:

{"id":"event1","state":"start","timestamp":"11025373"}
{"id":"event1","state":"end","timestamp":"11025373"}
{"id":"event2","state":"start","timestamp":"11025387"}
{"id":"event3","state":"start","timestamp":"11025388"}
{"id":"event3","state":"end","timestamp":"11025391"}
{"id":"event2","state":"end","timestamp":"11025397"}

I want to read the file as input and, using Java, calculate the time taken by each event as end - start. For example, event1 took 11025373 - 11025373 = 0 ms, and event2 took 11025397 - 11025387 = 10 ms.

My first idea was to read the file line by line:

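import java.io.*;
// Backslashes must be escaped in Java strings; try-with-resources closes the reader.
File file = new File("C:\\Users\\xyz\\inputfile.txt");
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        LOGGER.info(line);
    }
}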

Since the input file can be very large, is this the right approach? Any suggestion for the best approach would be helpful. Also, if I go line by line, how do I match up the objects in the file, i.e. compare the “start” of event1 with the “end” of event1?

Answer

“Considering the input file size can be very large, this is not suitable, I feel.”

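This is bizarre. It is, in fact, precisely the right approach: BufferedReader.readLine() keeps only the current line in memory, so the total size of the file doesn't matter. The wrong approach would be to read the entire file into memory at once.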

The only exception is if a single line can itself be truly humongous (say, 128 MB or more; that's a heck of a long line).
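A minimal sketch of the line-by-line approach using only the JDK (java.nio.file.Files.newBufferedReader). The class LineStreaming and the method processLines are hypothetical names chosen for illustration. Only the current line is held in memory; by contrast, Files.readAllLines would load the whole file at once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

class LineStreaming {
    // Hypothetical helper: hands each line of the file to the handler, one at a time,
    // and returns how many lines were processed.
    // Memory use depends on the longest line, not on the size of the file.
    static long processLines(Path file, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }
}
```

Used on the file from the question, this would look something like `LineStreaming.processLines(Paths.get("C:\\Users\\xyz\\inputfile.txt"), LOGGER::info);`.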

Each line is a JSON object, so you need a JSON parser. I suggest Jackson.

Make a class with the structure of that line, presumably something like:

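enum State {
  start, end; // lowercase so Jackson matches the JSON values "start"/"end" by name
}

class Event {
  // public fields: by default Jackson only auto-detects public fields (or getters/setters)
  public String id;
  public State state;
  public long timestamp; // "11025373" in the JSON; Jackson coerces the string to long by default
}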

Then read a single line, ask Jackson to turn that line into an instance of Event, process it, and repeat until you're done with the file. This lets you process a file that is many gigabytes in size, as long as no single line is ridiculously long.
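With Jackson, the per-line conversion would be a call such as `new ObjectMapper().readValue(line, Event.class)` (ideally reusing one ObjectMapper for the whole file). To keep the sketch below runnable with the JDK alone, it replaces Jackson with a hypothetical regex-based helper, `parseEvent`. This helper is not a JSON parser. It only understands flat lines shaped exactly like the sample (string values, no escaped quotes, no nesting), and it is a stand-in for illustration, not a substitute for Jackson in real code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EventLineParser {
    enum State { start, end }

    static final class Event {
        final String id;
        final State state;
        final long timestamp;

        Event(String id, State state, long timestamp) {
            this.id = id;
            this.state = state;
            this.timestamp = timestamp;
        }
    }

    // Matches "key":"value" pairs whose values are plain strings (no escaped quotes).
    private static final Pattern FIELD = Pattern.compile("\"(\\w+)\"\\s*:\\s*\"([^\"]*)\"");

    // Hypothetical stand-in for Jackson's readValue(line, Event.class):
    // only handles flat lines shaped like the sample input.
    static Event parseEvent(String line) {
        String id = null, state = null, timestamp = null;
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            switch (m.group(1)) {
                case "id": id = m.group(2); break;
                case "state": state = m.group(2); break;
                case "timestamp": timestamp = m.group(2); break;
                default: break; // ignore keys we don't care about
            }
        }
        if (id == null || state == null || timestamp == null) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        // State.valueOf requires the enum constants to match "start"/"end" exactly.
        return new Event(id, State.valueOf(state), Long.parseLong(timestamp));
    }
}
```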

If a single line is ridiculously long: well, JSON is not really designed for 'streaming'. Jackson does have a lower-level streaming API (JsonParser), but using it is much more work than binding a line to a class. I therefore strongly suggest you don't try to 'stream' within a single line unless you're sure you really need to.

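The only slightly complicated part is matching each "end" to its "start". A single "last read entry" is not enough, because events can interleave: in your sample, event3 starts and ends while event2 is still open. Instead, remember the start entries you haven't matched yet, for example in a Map keyed by id. When you read an event's "end" line, look up its start and compute the time taken. Memory then grows only with the number of events open at the same time, not with the size of the file. This is basic programming, though.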
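Because events can interleave (event3 starts and ends while event2 is still open), here is a sketch of the matching step that keeps a map from id to start timestamp instead of just the previous entry. It assumes each id has exactly one "start" line that appears before its "end" line. DurationMatcher, onEvent, durations and unfinished are hypothetical names. The method takes already-parsed fields, so it can be fed from a Jackson-bound Event or any other parser.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class DurationMatcher {
    // Start timestamps of events that have started but not yet ended, keyed by id.
    // Size is bounded by the number of simultaneously open events, not by file size.
    private final Map<String, Long> openStarts = new HashMap<>();
    // Finished durations (end - start), in the order the events ended.
    private final Map<String, Long> durations = new LinkedHashMap<>();

    // Call once per parsed line.
    void onEvent(String id, String state, long timestamp) {
        if ("start".equals(state)) {
            openStarts.put(id, timestamp);
        } else if ("end".equals(state)) {
            Long start = openStarts.remove(id);
            if (start == null) {
                throw new IllegalStateException("End without start for " + id);
            }
            durations.put(id, timestamp - start);
        } else {
            throw new IllegalArgumentException("Unknown state: " + state);
        }
    }

    Map<String, Long> durations() {
        return durations;
    }

    // Ids that started but never ended (e.g. a truncated file).
    Iterable<String> unfinished() {
        return openStarts.keySet();
    }
}
```

Feeding it the six sample lines in file order and printing `durations()` gives `{event1=0, event3=3, event2=10}`. event3 comes before event2 because it ends first.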