Is there a simpler way to remove “duplicate” objects from an array (objects with the same property)?

If, given an array of objects, such as:

ArrayList<Person> people = new ArrayList<>(Arrays.asList(
new Person("Victoria", 25, "Firefighter"),
new Person("Grace", 27, "Footballer"),
new Person("Samantha", 25, "Stock Broker"),
new Person("Victoria", 23, "Poker Player"),
new Person("Jane", 27, "Footballer"),
new Person("Grace", 25, "Security Guard")));

How can one remove any objects that share an attribute value with an earlier object, leaving only one of each? This could be as simple as duplicate names, which would leave:

Person("Victoria", 25, "Firefighter"),
Person("Grace", 27, "Footballer"),
Person("Samantha", 25, "Stock Broker"),
Person("Jane", 27, "Footballer")

Or something more complex, such as people of the same age whose jobs start with the same letter:

Person("Victoria", 25, "Firefighter"),
Person("Grace", 27, "Footballer"),
Person("Samantha", 25, "Stock Broker"),
Person("Victoria", 23, "Poker Player")

So far, the best I’ve come up with is:

    int len = people.size();
    for (int i = 0; i < len - 1; i++) {
        for (int j = i + 1; j < len; j++) {
            if (function(people.get(i), people.get(j))) {
                people.remove(j);
                j--;
                len--;
            }
        }
    }

Here, “function” checks whether two entries are considered “duplicates”.

I was wondering if there’s a library that does just this, or if this could somehow be put into a lambda expression.

Answer

When you say “remove duplicates”, the first thing that comes to mind is a Set. However, a Set treats an object as a duplicate only if it already contains an object that is “equal” to it according to the equals method. Implementing Person::equals so that it compares the first letter of the job is not a good fit.

What you need is a separate notion of equality for this use case alone. Redefining equals just for this is not advisable, so we have to use something else.

The Stream interface has a distinct() method that drops duplicates, but it also relies on equals: it takes no parameter, such as a Comparator or Predicate, for defining when one Person counts as a duplicate of another.

Fortunately, this excellent StackOverflow answer provides exactly what you need:

public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}
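
As a usage sketch for the simpler case from the question (duplicate names), the key can just be the name. The Person class below is a minimal stand-in, since the question does not show it, and getName() in particular is assumed. The result is collected into a list so that the original order is kept.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByNameSketch {

    // Stand-in for the question's Person class; fields and getters are assumed.
    static class Person {
        private final String name;
        private final int age;
        private final String job;

        Person(String name, int age, String job) {
            this.name = name;
            this.age = age;
            this.job = job;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getJob() { return job; }
    }

    // Same helper as above: the returned predicate accepts each key only once.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Person> samplePeople() {
        return List.of(
            new Person("Victoria", 25, "Firefighter"),
            new Person("Grace", 27, "Footballer"),
            new Person("Samantha", 25, "Stock Broker"),
            new Person("Victoria", 23, "Poker Player"),
            new Person("Jane", 27, "Footballer"),
            new Person("Grace", 25, "Security Guard"));
    }

    // Keeps the first person seen for each name (sequential stream).
    static List<Person> uniqueByName(List<Person> people) {
        return people.stream()
            .filter(distinctByKey(Person::getName))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints Victoria, Grace, Samantha, Jane (one per line).
        uniqueByName(samplePeople()).forEach(p -> System.out.println(p.getName()));
    }
}
```

Note that the predicate is stateful, so a fresh one should be created for every stream. On a parallel stream, which of several duplicates survives is not guaranteed.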

The next step is to create a record that collects the relevant object properties:

record PersonAgeAndJobFilter(int age, char jobFirstLetter) {

    public static PersonAgeAndJobFilter ofPerson(Person p) {
        return new PersonAgeAndJobFilter(p.getAge(), p.getJob().charAt(0));
    }
}

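Then stream over the people, using your filter (collect with Collectors.toList() instead if the original order matters, since a Set makes no ordering guarantee):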

people.stream()
    .filter(distinctByKey(PersonAgeAndJobFilter::ofPerson))
    .collect(Collectors.toSet());
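
Putting the pieces together, a self-contained sketch of the age-and-job-letter case might look as follows. The Person class is again an assumed stand-in with the getters used above, and the result is collected into a list (rather than a set) so that it comes out in the order given in the question.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class DistinctByAgeAndJobSketch {

    // Stand-in for the question's Person class; fields and getters are assumed.
    static class Person {
        private final String name;
        private final int age;
        private final String job;

        Person(String name, int age, String job) {
            this.name = name;
            this.age = age;
            this.job = job;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getJob() { return job; }
    }

    // Records get equals and hashCode for free, which makes them handy as keys.
    record PersonAgeAndJobFilter(int age, char jobFirstLetter) {
        static PersonAgeAndJobFilter ofPerson(Person p) {
            return new PersonAgeAndJobFilter(p.getAge(), p.getJob().charAt(0));
        }
    }

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Person> samplePeople() {
        return List.of(
            new Person("Victoria", 25, "Firefighter"),
            new Person("Grace", 27, "Footballer"),
            new Person("Samantha", 25, "Stock Broker"),
            new Person("Victoria", 23, "Poker Player"),
            new Person("Jane", 27, "Footballer"),
            new Person("Grace", 25, "Security Guard"));
    }

    // Keeps the first person seen for each (age, first job letter) pair.
    static List<Person> uniqueByAgeAndJobLetter(List<Person> people) {
        return people.stream()
            .filter(distinctByKey(PersonAgeAndJobFilter::ofPerson))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Jane (27, F...) and the second Grace (25, S...) are dropped.
        uniqueByAgeAndJobLetter(samplePeople())
            .forEach(p -> System.out.println(p.getName() + ", " + p.getAge() + ", " + p.getJob()));
    }
}
```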

Alternatively, the same can be achieved with groupingBy, combined with collectingAndThen and reducing to keep one Person per group:

// Assumes static imports of Collectors.groupingBy, collectingAndThen and reducing.
Collection<Person> persons = people.stream()
    .collect(groupingBy(PersonAgeAndJobFilter::ofPerson,
        collectingAndThen(reducing((a, b) -> a), Optional::get)))
    .values();
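
One caveat with the grouping approach: groupingBy collects into a HashMap by default, so the resulting values come back in no particular order. If the original order matters, a possible variant (a sketch, not part of the linked answer) is Collectors.toMap with a LinkedHashMap and a merge function that keeps the first value. The helper name firstPerKey, the record name AgeAndJobLetter and the Person stand-in are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FirstPerKeySketch {

    // Stand-in for the question's Person class; fields and getters are assumed.
    static class Person {
        private final String name;
        private final int age;
        private final String job;

        Person(String name, int age, String job) {
            this.name = name;
            this.age = age;
            this.job = job;
        }

        String getName() { return name; }
        int getAge() { return age; }
        String getJob() { return job; }
    }

    // Hypothetical key type: same age and same first letter of the job.
    record AgeAndJobLetter(int age, char jobFirstLetter) {
        static AgeAndJobLetter of(Person p) {
            return new AgeAndJobLetter(p.getAge(), p.getJob().charAt(0));
        }
    }

    // Keeps the first item per key; LinkedHashMap preserves insertion order.
    static <T, K> List<T> firstPerKey(List<T> items, Function<? super T, ? extends K> keyExtractor) {
        Map<K, T> firstSeen = items.stream()
            .collect(Collectors.toMap(
                keyExtractor,
                Function.identity(),
                (first, later) -> first,   // on a key clash, keep the earlier item
                LinkedHashMap::new));
        return new ArrayList<>(firstSeen.values());
    }

    static List<Person> samplePeople() {
        return List.of(
            new Person("Victoria", 25, "Firefighter"),
            new Person("Grace", 27, "Footballer"),
            new Person("Samantha", 25, "Stock Broker"),
            new Person("Victoria", 23, "Poker Player"),
            new Person("Jane", 27, "Footballer"),
            new Person("Grace", 25, "Security Guard"));
    }

    public static void main(String[] args) {
        firstPerKey(samplePeople(), AgeAndJobLetter::of)
            .forEach(p -> System.out.println(p.getName() + ", " + p.getAge() + ", " + p.getJob()));
    }
}
```

The same helper covers the simpler case as well, for example firstPerKey(people, Person::getName).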