I have read a lot about Java 8 streams lately, and several articles about lazy loading with Java 8 streams specifically: here and over here. I can’t seem to shake the feeling that lazy loading is COMPLETELY useless (or at best, a minor syntactic convenience offering zero performance value).
Let’s take this code as an example:
```java
int[] myInts = new int[]{1, 2, 3, 5, 8, 13, 21};
IntStream myIntStream = IntStream.of(myInts);
int[] myChangedArray = myIntStream
        .peek(n -> System.out.println("About to square: " + n))
        .map(n -> (int) Math.pow(n, 2))
        .peek(n -> System.out.println("Done squaring, result: " + n))
        .toArray();
```
This will log to the console because the terminal operation, in this case `toArray()`, is called, and our stream is lazy and executes only when the terminal operation is called. Of course I can also do this:
```java
IntStream myChangedInts = myIntStream
        .peek(n -> System.out.println("About to square: " + n))
        .map(n -> (int) Math.pow(n, 2))
        .peek(n -> System.out.println("Done squaring, result: " + n));
```
And nothing will be printed, because the map isn’t executed: nothing has asked for the data yet. Not until I call this:
```java
int[] myChangedArray = myChangedInts.toArray();
```
And voilà, I get my mapped data and my console logs. Except I see zero benefit to it whatsoever. I realize I can define the filter code long before I call `toArray()`, and I can pass this “not-really-filtered” stream around, but so what? Is this the only benefit?
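A minimal sketch of what laziness changes even for a terminal operation like `toArray()` that needs every element. The class `LazyOrderDemo`, the method `runPipeline`, and logging into a `List` instead of `System.out` are illustrative choices, not part of the original code. It records that nothing runs while the pipeline is being defined, and that each element then passes through every stage before the next element starts, rather than all elements being peeked, then all squared, then all peeked again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class LazyOrderDemo {
    // Builds the pipeline, then records what happened before and after the terminal operation.
    public static List<String> runPipeline(int[] input) {
        List<String> log = new ArrayList<>();
        IntStream pipeline = IntStream.of(input)
                .peek(n -> log.add("About to square: " + n))
                .map(n -> n * n)
                .peek(n -> log.add("Done squaring, result: " + n));

        // Nothing has run yet: the intermediate operations were only recorded.
        log.add("pipeline built, log size before terminal op = " + log.size());

        // The terminal operation pulls each element through all stages, one element at a time.
        int[] squared = pipeline.toArray();
        log.add("result length = " + squared.length);
        return log;
    }

    public static void main(String[] args) {
        runPipeline(new int[]{1, 2, 3}).forEach(System.out::println);
    }
}
```

For `{1, 2, 3}` the log starts with the build message reporting size 0, followed by `About to square: 1`, `Done squaring, result: 1`, `About to square: 2`, and so on, ending with `result length = 3`.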
The articles seem to imply there is a performance gain associated with laziness, for example:
In the Java 8 Streams API, the intermediate operations are lazy and their internal processing model is optimized to make it being capable of processing the large amount of data with high performance.
and
Java 8 Streams API optimizes stream processing with the help of short circuiting operations. Short Circuit methods ends the stream processing as soon as their conditions are satisfied. In normal words short circuit operations, once the condition is satisfied just breaks all of the intermediate operations, lying before in the pipeline. Some of the intermediate as well as terminal operations have this behavior.
It sounds literally like breaking out of a loop, and not associated with laziness at all.
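To compare the two readings concretely, here is a minimal sketch built around a hypothetical task: find the first square greater than 50 among `0..size-1`. The class `ShortCircuitDemo` and both methods are illustrative. The eager version materializes every square before searching, so its early `break` only saves work in the search loop. The stream version fuses `map`, `filter` and `findFirst` into one pass, so the short circuit also stops the mapping, which is only possible because `map` has not already run ahead over the whole input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class ShortCircuitDemo {
    // Eager style: square every element first, then search the materialized results.
    public static int eagerCalls(int size) {
        int calls = 0;
        List<Integer> squares = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            calls++;
            squares.add(i * i);
        }
        Integer first = null;
        for (int sq : squares) {
            if (sq > 50) {
                first = sq; // the search breaks early, but every square was already computed
                break;
            }
        }
        System.out.println("eager found " + first);
        return calls;
    }

    // Lazy style: map, filter and findFirst run as one fused pass, so mapping stops with the search.
    public static int lazyCalls(int size) {
        AtomicInteger calls = new AtomicInteger();
        int first = IntStream.range(0, size)
                .map(i -> {
                    calls.incrementAndGet();
                    return i * i;
                })
                .filter(sq -> sq > 50)
                .findFirst()
                .orElse(-1);
        System.out.println("lazy found " + first);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("eager mappings: " + eagerCalls(1000));
        System.out.println("lazy mappings: " + lazyCalls(1000));
    }
}
```

With `size = 1000`, `eagerCalls` performs 1000 mappings and `lazyCalls` performs 9, since the first square above 50 is 64, at index 8.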
Finally, there is this perplexing line in the second article:
Lazy operations achieve efficiency. It is a way not to work on stale data. Lazy operations might be useful in the situations where input data is consumed gradually rather than having whole complete set of elements beforehand. For example consider the situations where an infinite stream has been created using Stream#generate(Supplier<T>) and the provided Supplier function is gradually receiving data from a remote server. In those kind of the situations server call will only be made at a terminal operation when it’s needed.
Not working on stale data? What? How does lazy loading keep someone from working on stale data?
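One possible reading of the “stale data” remark, as a minimal sketch: `fetchFromServer` is a hypothetical stand-in for a remote call, simulated with an `AtomicInteger`, and `GenerateDemo` is an illustrative name. `Stream.generate` creates an infinite stream, which is only usable because no element is requested until the terminal operation runs. `limit(3)` then bounds how many requests are made, and each request sees the source’s state at consumption time rather than at the moment the pipeline was defined.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GenerateDemo {
    // Hypothetical "remote server" state; a real supplier might perform a network call instead.
    public static final AtomicInteger latestValue = new AtomicInteger();
    public static final AtomicInteger requests = new AtomicInteger();
    public static int fetchedBeforeTerminal = -1;

    // Hypothetical remote fetch: counts the request and returns the server's current value.
    static Integer fetchFromServer() {
        requests.incrementAndGet();
        return latestValue.get();
    }

    public static List<Integer> run() {
        latestValue.set(1);
        requests.set(0);

        // Infinite stream; defining it performs no fetches.
        Stream<Integer> readings = Stream.generate(GenerateDemo::fetchFromServer).limit(3);
        fetchedBeforeTerminal = requests.get();

        // The server's data changes after the pipeline is defined but before it is consumed.
        latestValue.set(42);

        // The terminal operation triggers only the fetches that limit(3) needs.
        return readings.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = run();
        System.out.println(values + " after " + requests.get() + " requests");
    }
}
```

Running `main` prints `[42, 42, 42] after 3 requests`: the value was updated after the pipeline was defined, and no request was made before `collect`.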
TL;DR: Is there any benefit to lazy loading besides being able to run the filter/map/reduce/whatever operation at a later time (which offers zero performance benefit)?
If so, what’s a real-world use case?
Answer
Your terminal operation, `toArray()`, perhaps supports your argument, given that it requires all elements of the stream.
Some terminal operations don’t. And for these, it would be a waste if streams weren’t lazily executed. Two examples:
```java
// Example 1: print the first of 1000 elements after transformations
IntStream.range(0, 1000)
        .peek(System.out::println)
        .mapToObj(String::valueOf)
        .peek(System.out::println)
        .findFirst()
        .ifPresent(System.out::println);

// Example 2: check whether any value has an even key
// (assumes records is a Collection and heavyConversion returns a Record)
boolean valid = records.stream()
        .map(this::heavyConversion)
        .filter(this::checkWithWebService)
        .mapToInt(Record::getKey)
        .anyMatch(i -> i % 2 == 0);
```
The first stream will print:
```
0
0
0
```
That is, the intermediate operations run on just one element. This is an important optimization. If streams weren’t lazy, all the `peek()` calls would have to run on all elements, which is entirely unnecessary since you’re interested in just one element. Intermediate operations can also be expensive, as in the second example.
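A runnable version of the second example, as a sketch under stated assumptions: `Item` replaces the answer’s `Record` (to avoid clashing with `java.lang.Record` on newer JDKs), the raw records are assumed to be strings, and `heavyConversion` and `checkWithWebService` are hypothetical stubs that only count how often they are called. With the input below, `anyMatch` finds an even key at the third element, so neither expensive step runs on the remaining three.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AnyMatchDemo {
    // Hypothetical converted record, standing in for the answer's Record type.
    static final class Item {
        private final int key;
        Item(int key) { this.key = key; }
        int getKey() { return key; }
    }

    public final AtomicInteger conversions = new AtomicInteger();
    public final AtomicInteger serviceCalls = new AtomicInteger();

    // Hypothetical expensive conversion (the answer's heavyConversion); here it just parses.
    Item heavyConversion(String raw) {
        conversions.incrementAndGet();
        return new Item(Integer.parseInt(raw));
    }

    // Hypothetical remote check (the answer's checkWithWebService); here it accepts everything.
    boolean checkWithWebService(Item item) {
        serviceCalls.incrementAndGet();
        return true;
    }

    public boolean hasEvenKey(List<String> records) {
        return records.stream()
                .map(this::heavyConversion)
                .filter(this::checkWithWebService)
                .mapToInt(Item::getKey)
                .anyMatch(i -> i % 2 == 0); // short-circuits at the first even key
    }

    public static void main(String[] args) {
        AnyMatchDemo demo = new AnyMatchDemo();
        List<String> records = Arrays.asList("1", "3", "4", "5", "7", "9");
        System.out.println(demo.hasEvenKey(records));
        System.out.println(demo.conversions.get() + " conversions, "
                + demo.serviceCalls.get() + " service calls");
    }
}
```

`main` prints `true` and then `3 conversions, 3 service calls`. With a non-short-circuiting terminal operation such as `toArray()` in place of `anyMatch`, all six records would be converted and checked.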
Short-circuiting terminal operations (which `toArray` is not) make this optimization possible.