Skip to content
Advertisement

JPA performance – SELECT DISTINCT and/or Java Set

I have a Spring Boot application with a JpaRepository. I am trying right now to obtain all the unique values for a certain field in my db.

Let’s suppose I have an object Foo with an Integer PK id and a non-unique Integer field bar. I want to get all the unique values for bar and I’m wondering which query from the following is the most efficient:

  1. Use DISTINCT in the SELECT statement and store the result in a List.
@Query("select distinct f.bar from Foo f")
List<Integer> getUniqueBar();
  1. Use DISTINCT in the SELECT and store the result in a Set.
@Query("select distinct f.bar from Foo f")
Set<Integer> getUniqueBar();
  1. Don’t use DISTINCT in the select and store the result in a Set.
@Query("select f.bar from Foo f")
Set<Integer> getUniqueBar();

All the 3 queries yield the same result. Which one is the most efficient?

IMO, it should be 1, since it does the duplicate filtering in the db, but I’m not sure if using a List over a Set brings much gain.

(I’m not interested in the actual container I’m storing the result into, as long as it is a Collection.)

Advertisement

Answer

Option 1 or 2 is best for performance, option 3 will return all bars from the table, even duplicated values. I’m guessing the reason you don’t see the duplicated values from the database is because you map the result to a Set which cannot contain duplicates. And if we talk about performance for option 1 compared to option 2 I would say it really depends on what you will be using these collections for. For example see this question: Performance and Memory allocation comparison between List and Set.

I would go with the Option 2 to make the filtering happen in the database and make it clear in the code that there should be no duplicates in this collection.

Advertisement