The project uses Spring Boot with JPA. We have a table vehicle with 1M+ records; the table has an indexed column type.
We have a use case where we want to fetch all the records by type: for each type we fetch all of its vehicle records, then move on to the next type, and so on.
Since there are 1M+ records, we fetch the records for each type in batches of 1000, filtering on the type column.
VehicleRepository.java
Page<VehicleRecord> findByType(String type, Pageable pageable);
VehicleService.java
for (String type : vehicleTypes) {
    Pageable pageable = PageRequest.of(0, 1000, Sort.by("updated_at").ascending());
    Page<VehicleRecord> vehicles;
    do {
        vehicles = vehicleRepository.findByType(type, pageable);
        // do something with vehicles
        pageable = pageable.next();
    } while (vehicles.hasNext());
}
For the sake of understanding, assume there are 5 types of records:
- A – 0 vehicles
- B – 100000 vehicles
- C – 0 vehicles
- D – 0 vehicles
- E – 0 vehicles
Problem:
When fetching data for A, findByType completes in under 100 ms, which is good.
When fetching for B, though, the first fetch with LIMIT 1000 OFFSET 0 takes ~200 ms. It goes downhill from there: as the OFFSET value increases, so does the time. By the time LIMIT is 1000 and OFFSET is 90000, findByType takes 6000-7000 ms.
What is further confusing is that after the data for B has been fetched, the remaining types (C, D & E) each take 3000-4000 ms, even though they have 0 records.
I am not sure what is happening here. I read somewhere that it is the high OFFSET value that makes the method take so long, but that does not explain why it would take so much time for C, D, and E.
Any input would be helpful. Thanks.
Edit 1: Profiling results (VisualVM)
- SQL queries are executing normally; they take only 150-200 ms, even for high offset values.
- This is unexpected: the vehicles collection keeps accumulating vehicle records after each iteration (observed in the memory section of the profiler). I expected the live object count to stay at a maximum of 1000, because that is our page size, but each iteration adds another 1000 records. Even after triggering a manual GC from the profiler, that memory is not freed until all iterations of the for loop are complete.
Answer
From the comments, the issue does not seem to be with the paging query itself, but with how it is used and with how the amount of retained data affects the JVM. The code snippet provided suggests you are calling vehicleRepository.findByType(type, pageable); multiple times within the same VehicleService method, the implication being that they all run within the same EntityManager/transactional context. JPA requires that EntityManager contexts cache every entity read through them, so that they can track and serialize any changes made to the database. If you read in large batches of entities, that cache keeps building up; EntityManagers are meant to represent units of work, not to be long-lived like that. This would also explain the slow calls for C, D, and E: with ~100000 B entities still pinned in the persistence context, the JVM is likely under memory and GC pressure, which slows everything down, even queries that return nothing.
One solution is to break up each 'batch' into its own transactional context, with a separate call for each vehicle type (or, better, for each page), as sketched below.
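A minimal sketch of that approach, under the assumption that the page processing moves into a separate Spring bean so the @Transactional proxy actually applies; the VehicleBatchProcessor class and its processOnePage method are hypothetical names, and Propagation.REQUIRES_NEW gives each page its own fresh persistence context:

VehicleBatchProcessor.java

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class VehicleBatchProcessor {

    private final VehicleRepository vehicleRepository;

    public VehicleBatchProcessor(VehicleRepository vehicleRepository) {
        this.vehicleRepository = vehicleRepository;
    }

    // Each call runs in its own short-lived transaction. When it returns,
    // the persistence context (and its cache of managed entities) is
    // discarded, so the returned entities are detached and GC-eligible.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public Page<VehicleRecord> processOnePage(String type, Pageable pageable) {
        Page<VehicleRecord> vehicles = vehicleRepository.findByType(type, pageable);
        // do something with vehicles
        return vehicles;
    }
}

The loop in VehicleService then only holds one detached page at a time:

for (String type : vehicleTypes) {
    Pageable pageable = PageRequest.of(0, 1000, Sort.by("updated_at").ascending());
    Page<VehicleRecord> vehicles;
    do {
        vehicles = vehicleBatchProcessor.processOnePage(type, pageable);
        pageable = pageable.next();
    } while (vehicles.hasNext());
}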
Alternatively, you can get a handle on the EntityManager instance. After processing your entities, call EntityManager.clear() to have it release its references to all managed entities, allowing them to be garbage collected as long as you hold no application references to them. A sketch follows.
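A minimal sketch of the clear() approach, keeping the original loop in VehicleService; the EntityManager is injected via @PersistenceContext (the imports are jakarta.persistence on Spring Boot 3+, javax.persistence on older versions), and processAllTypes is a hypothetical method name:

VehicleService.java

import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;

@PersistenceContext
private EntityManager entityManager;

// Assumes this still runs inside the same long-lived EntityManager/
// transactional context described above.
public void processAllTypes(List<String> vehicleTypes) {
    for (String type : vehicleTypes) {
        Pageable pageable = PageRequest.of(0, 1000, Sort.by("updated_at").ascending());
        Page<VehicleRecord> vehicles;
        do {
            vehicles = vehicleRepository.findByType(type, pageable);
            // do something with vehicles
            // If entities were modified, flush pending changes first:
            // entityManager.flush();
            // Detach all managed entities so processed pages can be GC'd.
            entityManager.clear();
            pageable = pageable.next();
        } while (vehicles.hasNext());
    }
}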