I’m wondering if I can add an operation to a stream based on a condition set outside of the stream. For example, I want to add a limit operation to the stream if my limit variable is not equal to -1.
My code currently looks like this, but I have yet to see other examples of streams being used this way, where a Stream object is reassigned to the result of an intermediate operation applied on itself:
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
    stream = stream.limit(limit);
}
// Collect stream to list
stream.collect(Collectors.toList());
As stated in this stackoverflow post, the filter isn’t actually applied until a terminal operation is called. Since I’m reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
Answer
There is no semantic difference between a chained series of invocations and a series of invocations storing the intermediate return values. Thus, the following code fragments are equivalent:
a = object.foo();
b = a.bar();
c = b.baz();
and
c = object.foo().bar().baz();
In either case, each method is invoked on the result of the previous invocation. But in the latter case, the intermediate results are not stored; they are lost on the next invocation. In the case of the stream API, an intermediate result must not be used after you have called the next method on it, so chaining is the natural way of using streams: it intrinsically ensures that you don’t invoke more than one method on a returned reference.
Still, it is not wrong to store the reference to a stream as long as you obey the contract of not using a returned reference more than once. By using it the way you do in your question, i.e. overwriting the variable with the result of the next invocation, you also ensure that you don’t invoke more than one method on a returned reference, so it is a correct usage. Of course, this only works with intermediate results of the same type. When you use map or flatMap and get a stream of a different reference type, you can’t overwrite the local variable. Then you have to be careful not to use the old local variable again, but, as said, as long as you are not using it after the next invocation, there is nothing wrong with the intermediate storage.
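The contract described above can be tried out with a small sketch. The class and method names here are made up for illustration, and plain Integer elements stand in for the question's timestamped objects. The second method relies on the behavior of the OpenJDK implementation, which throws IllegalStateException when a second operation is invoked on an already linked stream reference; the specification only says that an implementation may throw it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamReassignSketch {

    // Builds the pipeline step by step, overwriting the variable each time.
    // A limit of -1 means "no limit", as in the question.
    static List<Integer> collect(List<Integer> source, int max, long limit) {
        Stream<Integer> stream = source.stream();
        stream = stream.filter(e -> e < max);
        if (limit != -1) {
            stream = stream.limit(limit);
        }
        return stream.collect(Collectors.toList());
    }

    // Invokes a second method on a reference that has already been used.
    static boolean reuseThrows() {
        Stream<Integer> first = Stream.of(1, 2, 3);
        Stream<Integer> second = first.filter(e -> e > 1); // 'first' is linked now
        try {
            first.limit(1); // second invocation on the same reference
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(collect(data, 5, -1)); // no limit applied
        System.out.println(collect(data, 5, 2));  // at most two elements
        System.out.println(reuseThrows());
    }
}
```

Because the variable is overwritten on every step, the old reference is simply no longer reachable, which is what makes the pattern from the question safe.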
Sometimes, you have to store it, e.g.
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt"))) {
    stream.filter(s -> !s.isEmpty()).forEach(System.out::println);
}
Note that the code is equivalent to the following alternatives:
try(Stream<String> stream = Files.lines(Paths.get("myFile.txt")).filter(s -> !s.isEmpty())) {
    stream.forEach(System.out::println);
}
and
try(Stream<String> srcStream = Files.lines(Paths.get("myFile.txt"))) {
    Stream<String> tmp = srcStream.filter(s -> !s.isEmpty());
    // must not use variable srcStream here:
    tmp.forEach(System.out::println);
}
They are equivalent because forEach is always invoked on the result of filter, which is always invoked on the result of Files.lines, and it doesn’t matter on which result the final close() operation is invoked, as closing affects the entire stream pipeline.
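That closing affects the whole pipeline can be checked without touching the file system. The following is only a sketch with a made-up class name: it registers a close handler on a source stream via onClose and then closes the stream returned by filter instead of the source.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

public class StreamCloseSketch {

    // Returns true if closing the derived stream ran the handler
    // that was registered on the source stream.
    static boolean closingDerivedStreamClosesSource() {
        AtomicBoolean closed = new AtomicBoolean(false);
        Stream<String> source = Stream.of("a", "", "b")
                .onClose(() -> closed.set(true));
        // try-with-resources closes 'filtered', not 'source'
        try (Stream<String> filtered = source.filter(s -> !s.isEmpty())) {
            filtered.forEach(s -> { });
        }
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println(closingDerivedStreamClosesSource());
    }
}
```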
To put it in one sentence, the way you use it is correct.
I even prefer to do it that way, as not chaining a limit operation when you don’t want to apply a limit is the cleanest way of expressing your intent. It’s also worth noting that the suggested alternatives may work in a lot of cases, but they are not semantically equivalent:
.limit(condition? aLimit: Long.MAX_VALUE)
assumes that the maximum number of elements you can ever encounter is Long.MAX_VALUE, but streams can have more elements than that; they might even be infinite.
.limit(condition? aLimit: list.size())
when the stream source is list, breaks the lazy evaluation of a stream. In principle, a mutable stream source may legally be changed arbitrarily up to the point when the terminal action commences, and the result will reflect all modifications made up to this point. When you add an intermediate operation incorporating list.size(), i.e. the actual size of the list at this point, subsequent modifications applied to the collection between this point and the terminal operation may give this value a different meaning than the intended “actually no limit” semantic.
Compare with the “Non-Interference” section of the API documentation:
For well-behaved stream sources, the source can be modified before the terminal operation commences and those modifications will be reflected in the covered elements. For example, consider the following code:
List<String> l = new ArrayList(Arrays.asList("one", "two"));
Stream<String> sl = l.stream();
l.add("three");
String s = sl.collect(joining(" "));
First a list is created consisting of two strings: “one” and “two”. Then a stream is created from that list. Next the list is modified by adding a third string: “three”. Finally the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation commenced the result will be a string of “one two three”.
Of course, this is a rare corner case, as normally a programmer will formulate an entire stream pipeline without modifying the source collection in between. Still, the semantic difference remains, and it might turn into a very hard to find bug once you do run into such a corner case.
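The corner case can be reproduced with a short sketch that mirrors the documentation’s example. The class and method names are hypothetical; both methods add an element to an ArrayList after the pipeline has been built but before the terminal operation, and they differ only in how “no limit” is expressed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LateBindingLimitSketch {

    // "No limit" expressed as limit(list.size()): the size is read
    // at the time the pipeline is built, not when it is executed.
    static List<String> withSizeAsLimit(boolean applyLimit, long aLimit) {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = list.stream()
                .limit(applyLimit ? aLimit : list.size());
        list.add("three"); // modification before the terminal operation
        return stream.collect(Collectors.toList());
    }

    // "No limit" expressed by not adding a limit operation at all.
    static List<String> withConditionalLimit(boolean applyLimit, long aLimit) {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = list.stream();
        if (applyLimit) {
            stream = stream.limit(aLimit);
        }
        list.add("three"); // modification before the terminal operation
        return stream.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withSizeAsLimit(false, 0));      // [one, two]
        System.out.println(withConditionalLimit(false, 0)); // [one, two, three]
    }
}
```

With the limit disabled, the first variant silently drops the late addition because the stale size acts as a real limit, while the second variant sees all three elements.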
Further, since they are not equivalent, the stream API will never recognize these values as “actually no limit”. Even specifying Long.MAX_VALUE implies that the stream implementation has to track the number of processed elements to ensure that the limit has been obeyed. Thus, not adding a limit operation can have a significant performance advantage over adding a limit with a number that the programmer expects to never be exceeded.