Stream non terminal operation + filter + findFirst

I’m trying to understand how non-terminal stream operations are invoked.

Stream.of("aaa", "bbb", "ccc")
        .map(s -> {
            System.out.println(s);
            return s.toUpperCase();
        });
//prints nothing

Stream.of("aaa", "bbb", "ccc")
        .map(s -> {
            System.out.println(s);
            return s.toUpperCase();
        })
        .forEach(s -> {});
//prints "aaa" "bbb" "ccc"

This seems clear to me.

The first stream is not ended with terminal operation, so it doesn’t invoke the non-terminal operation and nothing is printed. The second one has terminal operation, so all the elements are printed.

Stream.of("aaa", "bbb", "ccc")
        .map(s -> {
            System.out.println(s);
            return s.toUpperCase();
        })
        .findFirst();
//prints "aaa"

Stream.of("aaa", "bbb", "ccc")
        .map(s -> {
            System.out.println(s);
            return s.toUpperCase();
        })
        .filter(s -> s.startsWith("B"))
        .findFirst();
//prints "aaa" "bbb"

This is where I’m getting confused, especially with the last one. Looks like the stream works “backwards” in a sense. First it checks what elements are returned by terminal operation and next it does intermediate operations only for these elements. But how to explain the last one? Looks like it did the mapping for all the elements up to the first one that matches the filter. And in the last example, if I replace findFirst() with forEach(), it prints all the elements, even if it has only one element in the end.

This seems a bit counter-intuitive to me. Can anyone give me a proper explanation of how the stream decides which elements it should perform the intermediate operations on?

Answer

“The first stream is not ended with terminal operation so it doesn’t invoke the non-terminal ops and nothing is printed.”

A terminal operation is one that produces and returns the result of the stream's execution, or performs a final action as a side effect (as forEach does). An intermediate operation, by contrast, always returns another stream; it is meant to transform the stream pipeline in some way.

When a stream lacks a terminal operation, it is never executed. That lets you create a stream in one method and hand it to another method, which can attach further operations (including a terminal one) and thereby trigger execution.
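As a minimal sketch of that hand-off (the class name DeferredPipelineDemo and the helper upperCased are made up for illustration, and the log list only stands in for the println calls in the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DeferredPipelineDemo {
    // Hypothetical helper: builds a pipeline but attaches no terminal operation.
    static Stream<String> upperCased(List<String> log) {
        return Stream.of("aaa", "bbb", "ccc")
                .map(s -> {
                    log.add("map " + s); // recorded only once the pipeline actually runs
                    return s.toUpperCase();
                });
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Stream<String> pending = upperCased(log);
        System.out.println(log.isEmpty()); // true: the mapper has not run yet

        // The caller attaches the terminal operation, which triggers execution.
        List<String> result = pending.collect(Collectors.toList());
        System.out.println(result); // [AAA, BBB, CCC]
        System.out.println(log);    // [map aaa, map bbb, map ccc]
    }
}
```

The mapper only runs inside collect(), even though the pipeline was assembled earlier in a different method.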

“First it checks what elements are returned by terminal operation and next it does intermediate operations only for these elements.”

No, it doesn’t work like that.

Streams are lazy, which means every action occurs only when it’s needed. A stream doesn’t act like a chain of for loops, each of which would process the whole data set before the next one starts.

If, for instance, a filter operation is followed by a map operation, map is applied only to elements that pass the filter. An element from the stream source goes to the filter; if it passes, it is handed to map and then to the terminal operation. If the stream hasn’t terminated by then, the next element from the source repeats the same steps.
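To make the element-by-element order visible, here is a small sketch that records each step in a list. The class name FilterThenMapDemo and the method trace are hypothetical, and the predicate (drop strings starting with "a") is just an assumption chosen for the demo:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class FilterThenMapDemo {
    // Records the order in which each stage sees each element.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of("aaa", "bbb", "ccc")
                .filter(s -> {
                    log.add("filter " + s);
                    return !s.startsWith("a"); // "aaa" is rejected here
                })
                .map(s -> {
                    log.add("map " + s);       // reached only by elements that passed the filter
                    return s.toUpperCase();
                })
                .forEach(s -> log.add("forEach " + s));
        return log;
    }

    public static void main(String[] args) {
        // Prints: [filter aaa, filter bbb, map bbb, forEach BBB, filter ccc, map ccc, forEach CCC]
        System.out.println(trace());
    }
}
```

Each element travels through the whole pipeline before the next one is pulled from the source, and "aaa" never reaches map.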

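In your example Stream.of().map().filter().findFirst(), elements come from the source one by one until the first one that passes the filter is found. Each element is mapped and then tested by the filter, which is why both "aaa" and "bbb" are printed. As soon as an element passes the filter, the stream terminates and findFirst returns that element, so "ccc" is never processed. With forEach() instead of findFirst(), nothing short-circuits, so every element is mapped even though only one passes the filter.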
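The same pipeline as in the question can be traced like this. It is only a sketch: ShortCircuitDemo and firstStartingWithB are illustrative names, and a log list replaces the println calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class ShortCircuitDemo {
    // Same pipeline as in the question, with every step recorded in the log.
    static Optional<String> firstStartingWithB(List<String> log) {
        return Stream.of("aaa", "bbb", "ccc")
                .map(s -> {
                    log.add("map " + s);
                    return s.toUpperCase();
                })
                .filter(s -> {
                    log.add("filter " + s);
                    return s.startsWith("B");
                })
                .findFirst(); // stops pulling elements once a match arrives
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(firstStartingWithB(log)); // Optional[BBB]
        System.out.println(log); // [map aaa, filter AAA, map bbb, filter BBB]
    }
}
```

"ccc" does not appear in the log at all, because findFirst already has its answer after "bbb".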

In most cases, stream elements are processed one at a time. An exception is sorted(), which has to buffer all of the stream’s data in memory before it can pass anything downstream.
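A rough way to observe that barrier is to put a peek() on each side of sorted(). This is a sketch with made-up names (SortedBarrierDemo, trace); how many elements are pushed past sorted() once findFirst is satisfied may depend on the JDK version, so the comments only claim the ordering of the "before" and "after" entries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class SortedBarrierDemo {
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Optional<String> first = Stream.of("ccc", "aaa", "bbb")
                .peek(s -> log.add("before " + s)) // upstream of sorted()
                .sorted()                          // must see every element before emitting any
                .peek(s -> log.add("after " + s))  // downstream of sorted()
                .findFirst();
        log.add("result " + first.orElse("none"));
        return log;
    }

    public static void main(String[] args) {
        // All three "before ..." entries come first, then "after aaa",
        // and the last entry is "result aaa".
        System.out.println(trace());
    }
}
```

Even though findFirst needs only one element, all three source elements pass the first peek() before anything comes out of sorted().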

Here’s a quote from the Stream API documentation:

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.

Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state. Laziness also allows avoiding examining all the data when it is not necessary; for operations such as “find the first string longer than 1000 characters”, it is only necessary to examine just enough strings to find one that has the desired characteristics without examining all of the strings available from the source.

Some operations process each element independently of the others: they retain no information about previously encountered elements and are called stateless (for example filter, map, and flatMap). Other operations, such as distinct, takeWhile, and sorted, need information about previously processed elements in order to handle the next one; they have to keep state and are therefore called stateful.
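As a small illustration of the difference (StatefulDistinctDemo and its two methods are hypothetical names for this sketch): map treats a repeated "a" exactly like the first one, whereas distinct has to remember what it has already let through.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatefulDistinctDemo {
    // Stateless: each element is transformed without regard to earlier ones.
    static List<String> mapped() {
        return Stream.of("a", "b", "a")
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Stateful: distinct() must remember which elements it has already seen.
    static List<String> deduplicated() {
        return Stream.of("a", "b", "a", "c", "b")
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapped());       // [A, B, A]
        System.out.println(deduplicated()); // [a, b, c]
    }
}
```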

Also, have a look at this question related to stream processing.
