I am working with a CSV file which includes some information about accidents.
I’ve created the Accident
type:
private Integer driverAge; private Integer vehicleAge; public Accident(Integer driverAge, Integer vehicleAge) { this.driverAge = driverAge; this.vehicleAge = vehicleAge; }
I’ve also created a function that reads all the CSV file, converts all the accidents to a List<Accident>
and saves it to this type AccidentArchive
:
private List<Accident> accidents; public AccidentArchive(List<Accident> accidents) { this.accidents = accidents; }
So, we are working with streams which I don’t understand entirely yet, and I’ve been stuck in this exercise where I have to make a function that returns a SortedMap<K, V>
in which the key has to be the driverAge
values and the value has to be a list sorted in descending order of the n
lowest vehicleAge
values with the same driverAge
value:
public SortedMap<Integer, List<Integer>> getNMinVehicleAgesPerDriverAge(Integer n) { return getAccidents().stream(). ...
I have tried using Collectors.toMap()
and Collectors.toList()
to somehow make it work, but I have no idea how to do it.
Advertisement
Answer
Simplified approach
This problem correlates with the algorithmic question of finding N
maximum (or minimum) values via partial sorting. It’s implementation using collectors might seem tough, therefore I’ve decided to introduce a simplified solution.
We can use a flavor of groupingBy()
that expects three arguments:
- a
classifier
function, - a supplier
mapFactory
(which allows to specify the resulting type of map) - and a downstream collector.
As a downstream collector of groupingBy()
we can utilize collectingAndThen
with a combination of collectors mapping()
and toList()
and a function that will sort the whole resulting list mapped to each key, and then will drop unnecessary values retaining only n
lowest vehicleAge
:
public SortedMap<Integer, List<Integer>> getNMinVehicleAgesPerDriverAge(Integer n) { return getAccidents().stream() .collect(Collectors.groupingBy(Accident::getDriverAge, TreeMap::new, Collectors.collectingAndThen( Collectors.mapping(Accident::getVehicleAge, Collectors.toList()), list -> list.stream() .sorted(Comparator.reverseOrder()) .limit(n) .collect(Collectors.toList())))); }
More performant version
As I’ve said before, we don’t need to sort all values mapped to each key when we need only some of them. When n
is dwarfish in comparison to the total size of the list (like 3
to 100,000
for every key) it will cause a serious performance hit.
We can introduce a partial sorting by using PriorityQueue
(which is an implementation of the Heap data structure built-in in the JDK).
To enhance the previous solution, we need to replace the downstream collector of groupingBy()
you can utilize a combination of mapping()
and a custom collector backed by the PriorityQueue
which retain only n
lowest vehicleAge
values associated with each driverAge
:
public SortedMap<Integer, List<Integer>> getNMinVehicleAgesPerDriverAge(Integer n) { return getAccidents().stream() .collect(Collectors.groupingBy(Accident::getDriverAge, TreeMap::new, Collectors.mapping(Accident::getVehicleAge, getMaxN(n, Comparator.<Integer>reverseOrder())))); }
The method provided below is responsible for generating a custom collector based on the provided maximum size of the resulting list and a comparator. The logic behind it was explained in great detail in this answer:
public static <T> Collector<T, ?, List<T>> getMaxN(int size, Comparator<T> comparator) { return Collector.of( () -> new PriorityQueue<>(comparator), (Queue<T> queue, T next) -> tryAdd(queue, next, comparator, size), (Queue<T> left, Queue<T> right) -> { right.forEach(next -> tryAdd(left, next, comparator, size)); return left; }, (Queue<T> queue) -> queue.stream().toList(), Collector.Characteristics.UNORDERED); } public static <T> void tryAdd(Queue<T> queue, T next, Comparator<T> comparator, int size) { if (queue.size() == size && comparator.compare(next, queue.element()) < 0) queue.remove(); // if next value is less than the smallest element in the queue and max size has been exceeded the largest element needs to be removed from the queue if (queue.size() < size) queue.add(next); }
By the way, if your assignment doesn’t specify the requirement to utilize SortedMap
as a return type. It better to use NavigableMap
interface, which defines a wider range of methods, instead.