Streaming large JSON from input stream efficiently in Java

To save memory and avoid an out-of-memory (OOM) error, I want to stream a large JSON document from an input stream and extract only the values I need. More precisely, I want to extract and save some strings from that JSON:

  1. files.content.fileContent.subList.text = “some text in file”
  2. files.content.fileContent.subList.text = “some text in file2”

and save them into a String variable:

JavaScript

I tried to parse the JSON using Jackson:

JavaScript

The above does not work at all, and the solution keeps getting more complicated. Is there a simple way to parse the JSON input stream and extract some text from it?

The JSON is attached below:

JavaScript

}

Answer

In short,

  • your code does not work because it implements the wrong algorithm;
  • JsonPath, as has been suggested, looks like a good DSL implementation, but it takes a DOM approach and collects the entire JSON tree in memory, so you would run into OOM again.

You have two options:

  • implement a proper algorithm within your current approach (and I agree you were on the right track);
  • try implementing something similar to what JsonPath does, breaking the problem down into smaller ones while keeping a truly streaming approach.
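
A minimal sketch of the second option, assuming a simplified event model rather than any real parser API. The names `PathTracker`, `Kind`, and `Event` are hypothetical and only loosely mirror the events a pull parser reports. The idea is to keep a stack of open containers, remember the current key of each open object, and compare the resulting dotted path with the target whenever a string value arrives. Treating arrays as transparent is an assumption about how the path in the question should be read.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical, library-independent sketch; not the API of javax.json or Jackson.
final class PathTracker {
    // Simplified event kinds (an assumption), loosely modeled on pull-parser events.
    enum Kind { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, KEY_NAME, VALUE_STRING }

    // A parser event; text is the key name or string value, null for structural events.
    record Event(Kind kind, String text) {
        static Event of(Kind kind) { return new Event(kind, null); }
    }

    // One frame per open container; key is the member currently being read (objects only).
    private static final class Frame {
        final boolean isObject;
        String key;
        Frame(boolean isObject) { this.isObject = isObject; }
    }

    private PathTracker() {}

    // Consumes events one at a time and keeps only the strings found at targetPath.
    static List<String> extract(Iterable<Event> events, String targetPath) {
        Deque<Frame> stack = new ArrayDeque<>();
        List<String> matches = new ArrayList<>();
        for (Event event : events) {
            switch (event.kind()) {
                case START_OBJECT -> stack.addLast(new Frame(true));
                case START_ARRAY -> stack.addLast(new Frame(false));
                case END_OBJECT, END_ARRAY -> stack.removeLast();
                case KEY_NAME -> stack.getLast().key = event.text();
                case VALUE_STRING -> {
                    if (currentPath(stack).equals(targetPath)) {
                        matches.add(event.text());
                    }
                }
            }
        }
        return matches;
    }

    // Joins the current keys of all open objects, bottom to top; arrays add nothing.
    private static String currentPath(Deque<Frame> stack) {
        StringJoiner path = new StringJoiner(".");
        for (Frame frame : stack) {
            if (frame.isObject && frame.key != null) {
                path.add(frame.key);
            }
        }
        return path.toString();
    }
}
```

With a real streaming parser, the loop would pull events from the parser instead of an `Iterable`, so memory use depends on nesting depth and the number of matches, not on document size.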

I won’t document my code much, since it is fairly easy to understand and adapt to other libraries, but you can build something more advanced on top of the following code, which uses Java 17 (with preview features enabled) and javax.json (plus some Lombok to cut Java boilerplate):

JavaScript

Example of use:

JavaScript

Of course, this is not as cool-looking as JsonPath, but you can do the following:

  • implement a matcher builder API to make it look nicer;
  • implement a JSON Path-compliant parser to build matchers;
  • wrap the for/if/next() pattern into a generic algorithm (similar to what BufferedReader.readLine() implements, or wrap it for the Stream API);
  • implement some kind of simple JSON-to-objects deserializer.
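
As one possible take on the Stream API idea from the list above, here is a hedged sketch that hides the for/if/next() loop behind a lazy `Stream`. The `LazyMatches` class and its `stream` method are hypothetical names; the event source is any `Iterator`, and the matcher is any function that returns a value for events of interest and an empty `Optional` otherwise. It uses only JDK classes (`Spliterators.AbstractSpliterator` and `StreamSupport`).

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper: turns "loop over events, test, collect" into a lazy Stream.
final class LazyMatches {
    private LazyMatches() {}

    // Pulls events on demand; emits one element per event the matcher accepts.
    static <E, R> Stream<R> stream(Iterator<E> events,
                                   Function<? super E, Optional<R>> matcher) {
        Spliterator<R> spliterator =
            new Spliterators.AbstractSpliterator<R>(Long.MAX_VALUE, Spliterator.ORDERED) {
                @Override
                public boolean tryAdvance(Consumer<? super R> action) {
                    // Skip events until the next match or the end of input.
                    while (events.hasNext()) {
                        Optional<R> match = matcher.apply(events.next());
                        if (match.isPresent()) {
                            action.accept(match.get());
                            return true;
                        }
                    }
                    return false;
                }
            };
        // Sequential stream: parser events must be consumed in order.
        return StreamSupport.stream(spliterator, false);
    }
}
```

The matcher may be stateful (for example, it may track the current path), which is why the stream is created as sequential. The caller still owns the underlying input stream and should close it when done.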

Or, if possible, find a good code generator that can generate a streaming parser with as little runtime cost as possible (its output would be very similar to your code, but working). (Please ping me if you are aware of any.)
