Group by multiple fields and filter by common value of a field



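import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor // @Data alone does not generate the three-argument constructor used below
public class Employee {

    private int empid;
    private String empPFcode;
    private String collegeName;
}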

Employee emp1=new Employee (1334090,"220","AB");
Employee emp2=new Employee (1334091,"220","AB");
Employee emp3=new Employee (1334092,"220","AC");
Employee emp4=new Employee (1434091,"221","DP");
Employee emp5=new Employee (1434091,"221","DP");
Employee emp6=new Employee (1434092,"221","DP");

I want to filter this list of Employee objects by empPFcode. If all the records that share an empPFcode (three in this example) also have the same collegeName, we collect them; otherwise we skip those records.

So my result would be like below.

Employee emp4=new Employee (1434091,"221","DP");
Employee emp5=new Employee (1434091,"221","DP");
Employee emp6=new Employee (1434092,"221","DP");

The records with empPFcode "220" are skipped because their collegeName values differ.

I tried the logic below, but it does not filter properly.

List<Employee> distinctElements = list.stream().filter(distinctByKeys(Employee::getEmpPFcode, Employee::getCollegeName))
                .collect(Collectors.toList());


@SafeVarargs
public static <T> Predicate<T> distinctByKeys(Function<? super T, Object>... keyExtractors) {
     Map<Object, Boolean> uniqueMap = new ConcurrentHashMap<>();

     return t ->
     {
         final List<?> keys = Arrays.stream(keyExtractors)
                 .map(ke -> ke.apply(t))
                 .collect(Collectors.toList());

         return uniqueMap.putIfAbsent(keys, Boolean.TRUE) == null;
     };
}
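To see what this predicate actually does, here is a minimal, self-contained sketch that runs it on the sample data. It assumes a plain hand-written Employee class in place of the Lombok one, and the class name DistinctByKeysDemo and the helper idsKeptByAttempt are made up for illustration. The predicate only drops repeats of an (empPFcode, collegeName) pair; it never compares the collegeName values within one empPFcode, so rows for "220" still get through.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByKeysDemo {

    // Plain stand-in for the Lombok-generated Employee (assumption: same fields and getters).
    static class Employee {
        private final int empid;
        private final String empPFcode;
        private final String collegeName;

        Employee(int empid, String empPFcode, String collegeName) {
            this.empid = empid;
            this.empPFcode = empPFcode;
            this.collegeName = collegeName;
        }

        int getEmpid() { return empid; }
        String getEmpPFcode() { return empPFcode; }
        String getCollegeName() { return collegeName; }
    }

    // Same predicate as in the question: true only the first time a key combination is seen.
    @SafeVarargs
    static <T> Predicate<T> distinctByKeys(Function<? super T, Object>... keyExtractors) {
        Map<Object, Boolean> uniqueMap = new ConcurrentHashMap<>();
        return t -> {
            List<?> keys = Arrays.stream(keyExtractors)
                    .map(ke -> ke.apply(t))
                    .collect(Collectors.toList());
            return uniqueMap.putIfAbsent(keys, Boolean.TRUE) == null;
        };
    }

    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee(1334090, "220", "AB"),
                new Employee(1334091, "220", "AB"),
                new Employee(1334092, "220", "AC"),
                new Employee(1434091, "221", "DP"),
                new Employee(1434091, "221", "DP"),
                new Employee(1434092, "221", "DP"));
    }

    // Hypothetical helper: returns the empid of every row the attempted filter lets through.
    static List<Integer> idsKeptByAttempt(List<Employee> list) {
        Predicate<Employee> firstOfEachPair =
                distinctByKeys(Employee::getEmpPFcode, Employee::getCollegeName);
        return list.stream()
                .filter(firstOfEachPair)
                .map(Employee::getEmpid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [1334090, 1334092, 1434091]: one row per distinct (empPFcode, collegeName) pair.
        System.out.println(idsKeptByAttempt(sampleEmployees()));
    }
}
```

The output keeps two rows for "220" and only one row for "221", which is the opposite of the desired result.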

Answer

I. Solution:

A cleaner, more readable solution is to build the set of empPFcode values whose records all share a single collegeName ([221] here), and then filter the employee list by this set.

First, use Collectors.groupingBy() to group by empPFcode, with Collectors.mapping(Employee::getCollegeName, Collectors.toSet()) as the downstream collector to get the set of collegeName values for each code.

Map<String, Set<String>> pairMap = list.stream().collect(Collectors.groupingBy(Employee::getEmpPFcode,
        Collectors.mapping(Employee::getCollegeName, Collectors.toSet()))); 

will result in: {220=[AB, AC], 221=[DP]}

Then you can remove the entries that include more than one collegeName:

pairMap.values().removeIf(v -> v.size() > 1); 

will result in: {221=[DP]}

The last step is filtering the employee list by the key set. You can use java.util.Set.contains() method inside the filter:

List<Employee> distinctElements = list.stream().filter(emp -> pairMap.keySet().contains(emp.getEmpPFcode()))
        .collect(Collectors.toList());
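Put together, the three steps above can be sketched as one runnable class. This is only an illustration under a few assumptions: the Employee class is a plain hand-written stand-in for the Lombok one, the names SingleCollegeFilterDemo and keepSingleCollegeGroups are made up, and HashMap::new is passed to groupingBy() explicitly because the Javadoc for the two-argument form makes no promise that the returned map is mutable, which removeIf() relies on.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class SingleCollegeFilterDemo {

    // Plain stand-in for the Lombok-generated Employee (assumption: same fields and getters).
    static class Employee {
        private final int empid;
        private final String empPFcode;
        private final String collegeName;

        Employee(int empid, String empPFcode, String collegeName) {
            this.empid = empid;
            this.empPFcode = empPFcode;
            this.collegeName = collegeName;
        }

        int getEmpid() { return empid; }
        String getEmpPFcode() { return empPFcode; }
        String getCollegeName() { return collegeName; }
    }

    // Hypothetical helper wrapping the three steps of Solution I.
    static List<Employee> keepSingleCollegeGroups(List<Employee> list) {
        // Step 1: empPFcode -> set of collegeName values, in an explicitly mutable HashMap.
        Map<String, Set<String>> pairMap = list.stream()
                .collect(Collectors.groupingBy(Employee::getEmpPFcode, HashMap::new,
                        Collectors.mapping(Employee::getCollegeName, Collectors.toSet())));

        // Step 2: drop every empPFcode that maps to more than one collegeName.
        pairMap.values().removeIf(colleges -> colleges.size() > 1);

        // Step 3: keep only employees whose empPFcode survived step 2.
        Set<String> keptCodes = pairMap.keySet();
        return list.stream()
                .filter(emp -> keptCodes.contains(emp.getEmpPFcode()))
                .collect(Collectors.toList());
    }

    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee(1334090, "220", "AB"),
                new Employee(1334091, "220", "AB"),
                new Employee(1334092, "220", "AC"),
                new Employee(1434091, "221", "DP"),
                new Employee(1434091, "221", "DP"),
                new Employee(1434092, "221", "DP"));
    }

    public static void main(String[] args) {
        // Prints the three "221" rows: 1434091, 1434091, 1434092.
        keepSingleCollegeGroups(sampleEmployees()).forEach(e ->
                System.out.println(e.getEmpid() + " " + e.getEmpPFcode() + " " + e.getCollegeName()));
    }
}
```

Because the final step streams over the original list, the surviving employees keep their original order.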

II. Solution:

If you nest two Collectors.groupingBy() calls, you get a Map<String,Map<String,List<Employee>>>:

{
   220 = {AB=[...], AC=[...]}, 
   221 = {DP=[...]}
}

Then you can filter by the size of the inner map (Map<String,List<Employee>>) to eliminate the entries whose inner map has more than one key (such as AB=[...], AC=[...]).

You still have a Map<String,Map<String,List<Employee>>> and you only need List<Employee>. To extract the employee list from the nested map, you can use flatMap().

Try this:

List<Employee> distinctElements = list.stream()
        .collect(Collectors.groupingBy(Employee::getEmpPFcode,
                Collectors.groupingBy(Employee::getCollegeName)))
        .entrySet().stream()
        .filter(e -> e.getValue().size() == 1)
        .flatMap(e -> e.getValue().values().stream())
        .flatMap(List::stream)
        .collect(Collectors.toList());
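One thing to keep in mind with the nested-grouping approach is that the result is rebuilt from the map, so when several empPFcode groups survive, their order follows the map's iteration order rather than the original list. The sketch below is one possible variant, not part of the original answer: it passes LinkedHashMap::new as the outer map factory so groups come out in order of first appearance. The Employee class is a plain stand-in for the Lombok one, and the names NestedGroupingDemo and keepSingleCollegeGroups are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGroupingDemo {

    // Plain stand-in for the Lombok-generated Employee (assumption: same fields and getters).
    static class Employee {
        private final int empid;
        private final String empPFcode;
        private final String collegeName;

        Employee(int empid, String empPFcode, String collegeName) {
            this.empid = empid;
            this.empPFcode = empPFcode;
            this.collegeName = collegeName;
        }

        int getEmpid() { return empid; }
        String getEmpPFcode() { return empPFcode; }
        String getCollegeName() { return collegeName; }
    }

    // Hypothetical helper wrapping Solution II.
    static List<Employee> keepSingleCollegeGroups(List<Employee> list) {
        // Outer key: empPFcode, inner key: collegeName.
        // LinkedHashMap keeps the outer groups in order of first appearance (an added assumption).
        Map<String, Map<String, List<Employee>>> nested = list.stream()
                .collect(Collectors.groupingBy(Employee::getEmpPFcode, LinkedHashMap::new,
                        Collectors.groupingBy(Employee::getCollegeName)));

        return nested.values().stream()
                .filter(byCollege -> byCollege.size() == 1)         // exactly one collegeName
                .flatMap(byCollege -> byCollege.values().stream())  // Stream<List<Employee>>
                .flatMap(List::stream)                              // Stream<Employee>
                .collect(Collectors.toList());
    }

    static List<Employee> sampleEmployees() {
        return Arrays.asList(
                new Employee(1334090, "220", "AB"),
                new Employee(1334091, "220", "AB"),
                new Employee(1334092, "220", "AC"),
                new Employee(1434091, "221", "DP"),
                new Employee(1434091, "221", "DP"),
                new Employee(1434092, "221", "DP"));
    }

    public static void main(String[] args) {
        // Prints the three "221" rows: 1434091, 1434091, 1434092.
        keepSingleCollegeGroups(sampleEmployees()).forEach(e ->
                System.out.println(e.getEmpid() + " " + e.getEmpPFcode() + " " + e.getCollegeName()));
    }
}
```

For the sample data both solutions return the same three employees; the ordering difference only shows up when more than one empPFcode group passes the filter.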


Source: stackoverflow