Skip to content

Will using a parallel stream on a single-core processor be slower than using a sequential stream?

I am applying an operation to every element in a very large LinkedList<LinkedList<Double>>:

list.stream().map(l -> l.stream().filter(d -> 
(Collections.max(l) - d) < 5)
.collect(Collectors.toCollection(LinkedList::new))).collect(Collectors.toCollection(LinkedList::new));

On my computer (quad-core), parallel streams seem to be faster than using sequential streams:

list.parallelStream().map(l -> l.parallelStream().filter(d -> 
(Collections.max(l) - d) < 5)
.collect(Collectors.toCollection(LinkedList::new))).collect(Collectors.toCollection(LinkedList::new));

However, not every computer is going to be multi-core. My question is, will using parallel streams on a single-processor computer be noticeably slower than using sequential streams?

Answer

This is highly implementation specific, but usually, a parallel stream will go through a different code path for most operations, which implies performing additional work, but at the same time, the thread pool will be configured to the number of CPU cores.

E.g., if you run the following program

System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "1");
System.out.println("Parallelism: "+ForkJoinPool.getCommonPoolParallelism());
Set<Thread> threads = ConcurrentHashMap.newKeySet();
for(int run=0; run<2; run++) {
    IntStream stream = IntStream.range(0, 100);
    if(run==1) {
        stream = stream.parallel();
        System.out.println("Parallel:");
    }
    int chunks = stream
        .mapToObj(i->Thread.currentThread())
        .collect(()->new int[]{1}, (a,t)->threads.add(t), (a,b)->a[0]+=b[0])[0];
    System.out.println("processed "+chunks+" chunk(s) with "+threads.size()+" thread(s)");
}

it will print something like

Parallelism: 1
processed 1 chunk(s) with 1 thread(s)
Parallel:
processed 4 chunk(s) with 1 thread(s)

You can see the effect of splitting the workload, whereas splitting to four times the configured parallelism is not a coincidence, but also that only one thread is involved, so there is no inter-thread communication happening here. Whether the JVM’s optimizer will detect the single-thread nature of this operation and elide synchronization costs in this case, is, like anything else, an implementation detail.

All in all, the overhead is not very big and doesn’t scale with the actual amount of work, so if the actual work is big enough to benefit from parallel processing on SMP machines, the fraction of the overhead will be negligible on single core machines.


But if you care for performance, you should also look at the other aspects of your code.

By repeating an operation like Collections.max(l) for every element of l, you are combining two linear operations to an operation with quadratic time complexity. It’s easy to perform this operation only once instead:

List<List<Double>> result =
    list.parallelStream()
        .map(l -> {
                double limit = Collections.max(l)-5;
                return l.parallelStream()
                        .filter(d -> limit < d)
                        .collect(Collectors.toCollection(LinkedList::new));
            })
        .collect(Collectors.toCollection(LinkedList::new));

Depending on the list sizes, the impact of this little change, turning a quadratic operation to linear, might be far bigger than dividing the processing time by just the number of cpu cores (in the best case).

The other consideration is whether you really need a LinkedList. For most practical purposes, a LinkedList performs worse than, e.g. an ArrayList, and if you don’t need mutability, you may just use the toList() collector and let the JRE return the best list it can offer…

List<List<Double>> result =
    list.parallelStream()
        .map(l -> {
                double limit = Collections.max(l)-5;
                return l.parallelStream()
                        .filter(d -> limit < d)
                        .collect(Collectors.toList());
            })
        .collect(Collectors.toList());

Keep in mind that after changing the performance characteristics, rechecking whether the parallelization still has any benefit is recommended. It should also be checked for both stream operations individually. Usually, if the outer stream has a decent parallelization, turning the inner stream to parallel does not improve the overall performance.

Also, the benefit of parallel streams will be much higher if the source lists are random access lists instead of LinkedLists.