
How Spring Batch Step chunkSize and JpaPagingItemReader pageSize work together

I'm developing a Spring Batch application.

Although I'm getting more comfortable with it, I came across something that confuses me.

Please take a look at this step configuration.

    @Bean
    @Qualifier(value = "processNonExportedMbfsOperationsStep")
    public Step processNonExportedMbfsOperationsStep() {
        return stepBuilderFactory
                .get("processNonExportedMbfsOperationsStep")
                .allowStartIfComplete(false)
                .<MbfsEntity, CsvOutputLineDto>chunk(Integer.parseInt(chunkSize))
                .reader(processNonExportedMbfsOperationsItemReader)
                .processor(processNonExportedMbfsOperationItemProcessor)
                .writer(processNonExportedMbfsOperationsCompositeItemWriter)
                .faultTolerant()
                    .retry(DataAccessException.class)
                    .retryLimit(3)
                .build();
    }

As you can see it’s a pretty standard step.

My confusion is related to the chunk size (50) and the reader (processNonExportedMbfsOperationsItemReader).

Reader code next:

    @PersistenceContext
    @Qualifier(value = "mbfsEntityManager")
    private EntityManager mbfsEntityManager;

    @Bean
    public JpaPagingItemReader<MbfsEntity> processNonExportedMbfsOperationsItemReader() {
        JpaNativeQueryProvider<MbfsEntity> queryProvider = new JpaNativeQueryProvider<>();
        queryProvider.setSqlQuery(buildQuery());
        queryProvider.setEntityClass(MbfsEntity.class);

        return new JpaPagingItemReaderBuilder<MbfsEntity>()
                .name("processNonExportedMbfsOperationsItemReader")
                .entityManagerFactory(mbfsEntityManager.getEntityManagerFactory())
                .pageSize(Integer.parseInt(chunkSize))
                .queryProvider(queryProvider)
                .build();
    }

The reader is of type JpaPagingItemReader since I have thousands of records to fetch from the DB.

This is where the confusion starts. I would expect the JpaPagingItemReaderBuilder to use the chunk size defined in the step config as the value of the JpaPagingItemReader pageSize property.

But clearly that’s not the case, and I don’t know how to make sense of it.

Should I set step chunk size to 1 and the page size to the value I want, like 50?

What am I missing? Thank you for your time!


Answer

In a chunk-oriented step with no processor, the difference between the page size of the reader and the chunk size is that

  • the page size of the reader controls how many items are fetched per query from the DB,
  • the chunk size controls how many items are passed to the Writer in one invocation of its write method.

It depends on what your writer does, but 1 is most likely not a good chunk size, since the writer would then be invoked once per item. You can start by setting the chunk size equal to the page size, and then optimize by trying different settings and measuring their performance.
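To make the two settings concrete, here is a minimal plain-Java simulation. It is not Spring Batch code and does not reproduce the framework's internals: `PagingReader` and `run` are hypothetical names, the "database" is just a range of integers, and the loop only counts how often a page is fetched and how often a chunk would be handed to a writer.

```java
import java.util.ArrayList;
import java.util.List;

class PageVsChunkDemo {

    /** Hypothetical stand-in for a paging reader: one "query" loads up to pageSize items. */
    static class PagingReader {
        private final int totalItems;
        private final int pageSize;
        private final List<Integer> page = new ArrayList<>();
        private int nextToFetch = 0;
        private int pageIndex = 0;
        int queryCount = 0;

        PagingReader(int totalItems, int pageSize) {
            this.totalItems = totalItems;
            this.pageSize = pageSize;
        }

        /** Returns the next item, or null once all items have been read. */
        Integer read() {
            if (pageIndex >= page.size()) {
                if (nextToFetch >= totalItems) {
                    return null;
                }
                // Simulated query: load the next page into the buffer.
                page.clear();
                pageIndex = 0;
                int end = Math.min(nextToFetch + pageSize, totalItems);
                for (int i = nextToFetch; i < end; i++) {
                    page.add(i);
                }
                nextToFetch = end;
                queryCount++;
            }
            return page.get(pageIndex++);
        }
    }

    /** Runs a simplified chunk loop and returns {queryCount, writeCalls}. */
    static int[] run(int totalItems, int pageSize, int chunkSize) {
        PagingReader reader = new PagingReader(totalItems, pageSize);
        int writeCalls = 0;
        boolean exhausted = false;
        while (!exhausted) {
            // Collect up to chunkSize items, one read() call at a time.
            List<Integer> chunk = new ArrayList<>();
            while (chunk.size() < chunkSize) {
                Integer item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                chunk.add(item);
            }
            if (!chunk.isEmpty()) {
                writeCalls++; // one simulated writer invocation per non-empty chunk
            }
        }
        return new int[] {reader.queryCount, writeCalls};
    }

    public static void main(String[] args) {
        int[][] settings = {{50, 20}, {50, 50}, {50, 1}};
        for (int[] s : settings) {
            int[] r = run(120, s[0], s[1]);
            System.out.println("pageSize=" + s[0] + " chunkSize=" + s[1]
                    + " queries=" + r[0] + " writes=" + r[1]);
        }
    }
}
```

In this sketch, 120 items with a page size of 50 always cost 3 simulated queries, while the number of write calls depends only on the chunk size: 6 for a chunk size of 20, 3 for 50, and 120 for 1. Real numbers will depend on your reader, writer, and transaction settings, so measure before settling on values.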

If the step contains a processor that returns null for some items, i.e. drops them, it gets more complicated. The number of items passed to the writer is then only bounded from above by the chunk size. The reason is that chunks are formed before their items are passed to the processor, which may then drop some of them.
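The effect of a filtering processor can be sketched in plain Java as well. Again, this is only an illustration under simplifying assumptions, not the framework's implementation: `writtenSizes` is a hypothetical helper, the processor is modeled as a `Function` that returns null to drop an item, and how Spring Batch treats a chunk whose items are all dropped is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FilteringChunkDemo {

    /**
     * Splits the input into chunks of at most chunkSize items, applies the
     * processor to each item, and returns how many items of each chunk
     * survive, i.e. the size of the list the writer would receive.
     */
    static List<Integer> writtenSizes(List<Integer> input, int chunkSize,
                                      Function<Integer, String> processor) {
        List<Integer> sizes = new ArrayList<>();
        for (int start = 0; start < input.size(); start += chunkSize) {
            // The chunk is formed first, before any item is processed.
            int end = Math.min(start + chunkSize, input.size());
            List<Integer> chunk = input.subList(start, end);

            // Processing happens afterwards; a null result drops the item.
            List<String> outputs = new ArrayList<>();
            for (Integer item : chunk) {
                String out = processor.apply(item);
                if (out != null) {
                    outputs.add(out);
                }
            }
            sizes.add(outputs.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        // Hypothetical processor: drops even numbers, converts odd ones.
        Function<Integer, String> dropEven = n -> n % 2 == 0 ? null : "item-" + n;
        System.out.println(writtenSizes(input, 4, dropEven));
    }
}
```

With ten items, a chunk size of 4, and a processor that drops even numbers, the simulated writer receives 2, 2, and 1 items: never more than the chunk size, but often fewer.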

Please also have a look at this section of the reference documentation: https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#chunkOrientedProcessing

User contributions licensed under: CC BY-SA