Java ParallelStream: several map or single map

Question

Introduction I&#8217;m currently developing a program in which I use Java.util.Collection.parallelStream(), and wondering if it&#8217;s possible to make it more Multi-threaded. Several small map I was wondering if using multiple map might allow the Java.util.Collection.parallelStream() to distribute the tasks…

Accepted Answer

I don’t think it will do any better if you chain it with multiple maps. In case your code is not very complex I would prefer to use a single big map. To understand this we have to check the code inside the map function. linkpublic final Stream map(Function mapper) { Objects.requireNonNull(mapper); return new StatelessOp(this, StreamShape.REFERENCE, StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) { @Override Sink opWrapSink(int flags, Sink sink) { return new Sink.ChainedReference(sink) { @Override public void accept(P_OUT u) { downstream.accept(mapper.apply(u)); } }; } };}As you can see a lot many things happen behind the scenes. Multiple objects are created and multiple methods are called. Hence, for each chained map function call all these are repeated.Now coming back to ParallelStreams, they work on the concept of Parallelism .Streams DocumentationA parallel stream is a stream that splits its elements into multiple chunks, processing each chunk with a different thread. Thus, you can automatically partition the workload of a given operation on all the cores of your multicore processor and keep all of them equally busy.Parallel streams internally use the default ForkJoinPool, which by default has as many threads as you have processors, as returned by Runtime.getRuntime().availableProcessors(). But you can change the size of this pool using the system property java.util.concurrent.ForkJoinPool.common.parallelism.ParallelStream calls spliterator() on the collection object which returns a Spliterator implementation that provides the logic of splitting a task. Every source or collection has their own spliterator implementations. Using these spliterators, parallel stream splits the task as long as possible and finally when the task becomes too small it executes it sequentially and merges partial results from all the sub tasks.So I would prefer parallelStream whenI have huge amount of data to process at a timeI have multiple cores to process the dataPerformance issues with the existing implementationI already don’t have multiple threaded process running, as it will add to the complexity.Performance ImplicationsOverhead : Sometimes when dataset is small converting a sequential stream into a parallel one results in worse performance. The overhead of managing threads, sources and results is a more expensive operation than doing the actual work.Splitting: Arrays can split cheaply and evenly, while LinkedList has none of these properties. TreeMap and HashSet split better than LinkedList but not as well as arrays.Merging:The merge operation is really cheap for some operations, such as reduction and addition, but merge operations like grouping to sets or maps can be quite expensive.Conclusion: A large amount of data and many computations done per element indicate that parallelism could be a good option.

Java ParallelStream: several map or single map

Introduction

Several small map

Single big map

Question

Advertisement

Answer