Given a list of objects, such as:
ArrayList<Person> people = new ArrayList<>(Arrays.asList(
    new Person("Victoria", 25, "Firefighter"),
    new Person("Grace", 27, "Footballer"),
    new Person("Samantha", 25, "Stock Broker"),
    new Person("Victoria", 23, "Poker Player"),
    new Person("Jane", 27, "Footballer"),
    new Person("Grace", 25, "Security Guard")));
How can one remove the objects whose attributes are not unique, while keeping one of each? This could be as simple as removing duplicate names, which would leave:
Person("Victoria", 25, "Firefighter"), Person("Grace", 27, "Footballer"), Person("Samantha", 25, "Stock Broker"), Person("Jane", 27, "Footballer")
Or it could be more complex, such as treating people of the same age whose jobs start with the same letter as duplicates, which would leave:
Person("Victoria", 25, "Firefighter"), Person("Grace", 27, "Footballer"), Person("Samantha", 25, "Stock Broker"), Person("Victoria", 23, "Poker Player")
So far, the best I’ve come up with is:
int len = people.size();
for (int i = 0; i < len - 1; i++) {
    for (int j = i + 1; j < len; j++) {
        if (function(people.get(i), people.get(j))) {
            people.remove(j);
            j--;
            len--;
        }
    }
}
Here, “function” checks whether two entries are considered “duplicates”.
I was wondering whether there is a library that does just this, or whether this could somehow be written as a lambda expression.
Answer
When you say “remove duplicates”, the first thing that comes to mind is a Set. However, a Set considers an object a “duplicate” if it already contains an object that is “equal” to it, as determined by the equals method. Implementing Person::equals to check for a job’s first letter is not a good fit.
What you want is a different notion of equality for this use case alone. Person::equals should not be sacrificed for that, so we have to use something else.
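To make the point about equals concrete, here is a minimal sketch. It assumes a stand-in Person record (the real Person class is not shown in the question), whose generated equals compares all three fields:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetEqualsDemo {
    // Hypothetical stand-in: a record gets equals/hashCode over all of its components.
    record Person(String name, int age, String job) {}

    // A Set only drops elements that are equal() to one it already holds.
    static Set<Person> dedupe(List<Person> people) {
        return new LinkedHashSet<>(people);
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
            new Person("Grace", 27, "Footballer"),
            new Person("Grace", 27, "Footballer"),  // equal in every field: dropped
            new Person("Jane", 27, "Footballer"));  // same age and job letter, but not equal: kept
        System.out.println(dedupe(people).size());
    }
}
```

Jane survives because the Set has no way of knowing that, for this one use case, age and the job's first letter are all that matter.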
The Stream interface has a distinct() method that removes duplicates, but distinct() takes no parameter, so you cannot pass in a Comparator or Predicate to define when one Person is considered “distinct” from another.
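A small sketch of that limitation, again assuming a stand-in Person record: distinct() compares whole elements, and mapping to the key first does deduplicate but hands back only the keys, not the people.

```java
import java.util.List;
import java.util.stream.Collectors;

class DistinctLimitsDemo {
    // Hypothetical stand-in for the Person class in the question.
    record Person(String name, int age, String job) {}

    static List<Person> sample() {
        return List.of(
            new Person("Victoria", 25, "Firefighter"),
            new Person("Grace", 27, "Footballer"),
            new Person("Victoria", 23, "Poker Player"));
    }

    // distinct() uses equals() on the whole element, so nothing is dropped here.
    static List<Person> distinctPeople(List<Person> people) {
        return people.stream().distinct().collect(Collectors.toList());
    }

    // Mapping to the name first removes the repeated name, but the Person objects are gone.
    static List<String> distinctNames(List<Person> people) {
        return people.stream().map(Person::name).distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctPeople(sample()).size()); // 3
        System.out.println(distinctNames(sample()));         // [Victoria, Grace]
    }
}
```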
Fortunately, this excellent StackOverflow answer provides exactly what you need:
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    // Keys seen so far; add() returns false when the key is already present
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}
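For the simple case from the question (duplicate names), the key can just be the name. The sketch below assumes a stand-in Person record with a getName() accessor, and collects to a list so that the first occurrence of each name is kept in the original order:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByNameDemo {
    // Hypothetical stand-in; the getter name is an assumption about the real class.
    record Person(String name, int age, String job) {
        String getName() { return name; }
    }

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Person> samplePeople() {
        return List.of(
            new Person("Victoria", 25, "Firefighter"),
            new Person("Grace", 27, "Footballer"),
            new Person("Samantha", 25, "Stock Broker"),
            new Person("Victoria", 23, "Poker Player"),
            new Person("Jane", 27, "Footballer"),
            new Person("Grace", 25, "Security Guard"));
    }

    // Keeps the first Person seen for each name (sequential stream over a List).
    static List<Person> firstPerName(List<Person> people) {
        return people.stream()
            .filter(distinctByKey(Person::getName))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints Victoria (25), Grace (27), Samantha and Jane
        firstPerName(samplePeople()).forEach(System.out::println);
    }
}
```

Note that the predicate is stateful, so a fresh one should be created for every stream it is used in.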
Next, create a record that captures the properties that define a duplicate:
record PersonAgeAndJobFilter(int age, char jobFirstLetter) {
    public static PersonAgeAndJobFilter ofPerson(Person p) {
        return new PersonAgeAndJobFilter(p.getAge(), p.getJob().charAt(0));
    }
}
Then stream over people, using your filter:
people.stream()
    .filter(distinctByKey(PersonAgeAndJobFilter::ofPerson))
    .collect(Collectors.toSet());
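Put together, the pieces above might look like the following self-contained sketch. The Person record and its getters are stand-ins for the class in the question, and the result is collected with Collectors.toList() rather than toSet() (my substitution, not part of the snippet above) so that the order of the input is preserved:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByAgeAndJobDemo {
    // Hypothetical stand-in for the Person class in the question.
    record Person(String name, int age, String job) {
        int getAge() { return age; }
        String getJob() { return job; }
    }

    // The key: two people with equal keys count as duplicates.
    record PersonAgeAndJobFilter(int age, char jobFirstLetter) {
        static PersonAgeAndJobFilter ofPerson(Person p) {
            return new PersonAgeAndJobFilter(p.getAge(), p.getJob().charAt(0));
        }
    }

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Person> samplePeople() {
        return List.of(
            new Person("Victoria", 25, "Firefighter"),
            new Person("Grace", 27, "Footballer"),
            new Person("Samantha", 25, "Stock Broker"),
            new Person("Victoria", 23, "Poker Player"),
            new Person("Jane", 27, "Footballer"),
            new Person("Grace", 25, "Security Guard"));
    }

    static List<Person> firstPerAgeAndJobLetter(List<Person> people) {
        return people.stream()
            .filter(distinctByKey(PersonAgeAndJobFilter::ofPerson))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Jane (27, F) duplicates Grace (27, F); Grace (25, S) duplicates Samantha (25, S)
        firstPerAgeAndJobLetter(samplePeople()).forEach(System.out::println);
    }
}
```

With the sample data this leaves the four people listed in the question for the "same age and same first letter" case.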
Alternatively, the same can be achieved with groupingBy, combined with collectingAndThen and reducing (with the Collectors methods statically imported):
Collection<Person> persons = people.stream()
    .collect(groupingBy(PersonAgeAndJobFilter::ofPerson,
        collectingAndThen(reducing((a, b) -> a), Optional::get)))
    .values();
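The groupingBy collector makes no promise about the type of Map it returns, so the order of values() should not be relied upon. If the original order matters, one possible variation (a different collector, not the one shown above) is Collectors.toMap with a merge function that keeps the first element and a LinkedHashMap as the map. The helper name firstPerKey and the AgeAndJobLetter record below are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class FirstPerKeyDemo {
    // Hypothetical stand-in for the Person class in the question.
    record Person(String name, int age, String job) {}

    // Hypothetical key type: same age and same first letter of the job.
    record AgeAndJobLetter(int age, char jobFirstLetter) {
        static AgeAndJobLetter of(Person p) {
            return new AgeAndJobLetter(p.age(), p.job().charAt(0));
        }
    }

    // Keeps the first element per key; LinkedHashMap preserves insertion order.
    static <T, K> List<T> firstPerKey(List<T> items, Function<T, K> key) {
        return new ArrayList<>(items.stream()
            .collect(Collectors.toMap(
                key,
                Function.identity(),
                (first, later) -> first,   // on a key clash, keep the earlier element
                LinkedHashMap::new))
            .values());
    }

    static List<Person> samplePeople() {
        return List.of(
            new Person("Victoria", 25, "Firefighter"),
            new Person("Grace", 27, "Footballer"),
            new Person("Samantha", 25, "Stock Broker"),
            new Person("Victoria", 23, "Poker Player"),
            new Person("Jane", 27, "Footballer"),
            new Person("Grace", 25, "Security Guard"));
    }

    public static void main(String[] args) {
        firstPerKey(samplePeople(), AgeAndJobLetter::of).forEach(System.out::println);
    }
}
```

Unlike distinctByKey, this keeps no state outside the collector, at the cost of building an intermediate map.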