I have a Spring Boot application with a JpaRepository. I am trying right now to obtain all the unique values for a certain field in my db.
Let’s suppose I have an object Foo
with an Integer PK id
and a non-unique Integer field bar
. I want to get all the unique values for bar
and I’m wondering which query from the following is the most efficient:
- Use
DISTINCT
in theSELECT
statement and store the result in a List.
@Query("select distinct f.bar from Foo f") List<Integer> getUniqueBar();
- Use
DISTINCT
in theSELECT
and store the result in a Set.
@Query("select distinct f.bar from Foo f") Set<Integer> getUniqueBar();
- Don’t use
DISTINCT
in the select and store the result in a Set.
@Query("select f.bar from Foo f") Set<Integer> getUniqueBar();
All the 3 queries yield the same result. Which one is the most efficient?
IMO, it should be 1, since it does the duplicate filtering in the db, but I’m not sure if using a List over a Set brings much gain.
(I’m not interested in the actual container I’m storing the result into, as long as it is a Collection
.)
Advertisement
Answer
Option 1 or 2 is best for performance, option 3 will return all bars from the table, even duplicated values. I’m guessing the reason you don’t see the duplicated values from the database is because you map the result to a Set which cannot contain duplicates. And if we talk about performance for option 1 compared to option 2 I would say it really depends on what you will be using these collections for. For example see this question: Performance and Memory allocation comparison between List and Set.
I would go with the Option 2 to make the filtering happen in the database and make it clear in the code that there should be no duplicates in this collection.