What’s an efficient way to insert more than a million rows into a PostgreSQL server from another PostgreSQL server using Java?




I have two PostgreSQL servers, and I need to copy table rows from the first server's format and convert them to the second server's format (different column names).

I use a Java application with Spring Boot and a JPA repository, which implements a findAll method that reads the result as a stream with a fetch size of 1000.

    @Query("select c from ExternalFormatEntity c")
    @QueryHints(@javax.persistence.QueryHint(name = "org.hibernate.fetchSize",
            value = Constants.DEFAULT_FETCH_SIZE))
    Stream<ExternalFormatEntity> findAllEntities();

After reading, I convert the rows and insert them in batches of 1000.

    try (Stream<ExternalFormatEntity> allExtEntitiesStream = extFormatService.getAllEntities()) {
        LinkedList<CanonicalFormatEntity> canonicalEntityList = new LinkedList<>();
        allExtEntitiesStream.forEach(extEntity -> {
            if (Objects.nonNull(extEntity)) {
                canonicalEntityList.add(SomeConverter.convert(extEntity));
            }
            if (canonicalEntityList.size() >= DEFAULT_BATCH_SIZE) {
                List<CanonicalFormatEntity> copyList = new LinkedList<>(canonicalEntityList);
                canonicalEntityList.clear();
                Thread thread = new Thread(() -> {
                    canonicalEntityRepository.saveAll(copyList);
                    canonicalEntityRepository.flush();
                    copyList.clear();
                });
                thread.start();
            }
        });
    }
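Two side issues with the snippet above: every full batch spawns a new unbounded Thread, and a trailing batch smaller than DEFAULT_BATCH_SIZE is never saved at all. A minimal sketch of the same batching with a fixed thread pool and a final partial-batch flush; the `saver` consumer is a stand-in for `canonicalEntityRepository.saveAll`, and the class/method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class BatchedSink {
    static final int BATCH_SIZE = 1000;

    // Drains the stream in batches, handing each full batch to `saver` on a
    // bounded pool, then flushes the trailing partial batch before returning.
    static <T> void saveInBatches(Stream<T> rows, Consumer<List<T>> saver) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<T> batch = new ArrayList<>(BATCH_SIZE);
        for (T row : (Iterable<T>) rows::iterator) {
            batch.add(row);
            if (batch.size() >= BATCH_SIZE) {
                final List<T> full = batch;          // hand off the full batch
                batch = new ArrayList<>(BATCH_SIZE);
                pool.submit(() -> saver.accept(full));
            }
        }
        if (!batch.isEmpty()) {
            saver.accept(batch);                     // rows the original snippet silently dropped
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Demo: 2500 fake rows, stand-in saver that just counts what it receives.
    static int demo() throws InterruptedException {
        AtomicInteger saved = new AtomicInteger();
        saveInBatches(IntStream.range(0, 2500).boxed(), b -> saved.addAndGet(b.size()));
        return saved.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // all 2500 rows reach the saver, incl. the 500-row tail
    }
}
```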

In my opinion, this operation can be faster than its current 1 hour for 1 million records. Can I speed it up, and if so, how?
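For reference, with PostgreSQL a JPA `saveAll` only produces real batched INSERTs when Hibernate JDBC batching is enabled and the driver is allowed to rewrite the batch; neither is on by default. A sketch of the relevant Spring Boot settings (host, database, and batch size are illustrative):

```properties
# Group INSERTs into JDBC batches
spring.jpa.properties.hibernate.jdbc.batch_size=1000
spring.jpa.properties.hibernate.order_inserts=true
# Let the PostgreSQL driver rewrite batched INSERTs into multi-value statements
spring.datasource.url=jdbc:postgresql://target_host:5432/target_db?reWriteBatchedInserts=true
```

One caveat: Hibernate disables insert batching for entities with IDENTITY id generation, so the id strategy matters here too.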

First of all, I tried converting the table records from the first database to a CSV file, saving it on the other server, and loading it with the Postgres Copy API, but the total time is still unacceptable due to the additional hard-disk operations.

Maybe PostgreSQL has stream writing or something similar? I can't find an answer in the official PostgreSQL docs.

Answer

In my case, the following solution helped:

  1. export the external table to a CSV file with zip compression (example from this StackOverflow answer: https://stackoverflow.com/a/3981807/3744622)

  2. copy the small zipped file to the /tmp folder on the target Postgres server: scp root@ext_server:/path/to/file root@target_server:/tmp/

  3. import the table from the zipped CSV file (example from this StackOverflow answer: https://stackoverflow.com/a/46228247/3744622)

This brought the total time down to about 10 minutes.
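As to the "stream writing" question: the intermediate file can in principle be avoided entirely, because the PostgreSQL JDBC driver exposes the COPY protocol as streams via `org.postgresql.copy.CopyManager` (`copyOut(sql, OutputStream)` on the source, `copyIn(sql, InputStream)` on the target), so the export can be gzip-compressed and piped straight into the import. A minimal sketch of the piping part; the two CopyManager calls are left as comments because they need live connections, and the table names (`ext_table`, `canonical_table`) and connection variables (`srcConn`, `dstConn`) are assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CopyPipe {
    // Simulates: source COPY TO STDOUT -> gzip -> pipe -> gunzip -> target COPY FROM STDIN
    static String pipeThroughGzip(String csv) throws Exception {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut, 64 * 1024);

        Thread producer = new Thread(() -> {
            try (OutputStream gz = new GZIPOutputStream(pipeOut)) {
                // Real export side would be:
                // new CopyManager((BaseConnection) srcConn)
                //     .copyOut("COPY ext_table TO STDOUT WITH (FORMAT csv)", gz);
                gz.write(csv.getBytes("UTF-8"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        // Real import side would be:
        // new CopyManager((BaseConnection) dstConn)
        //     .copyIn("COPY canonical_table FROM STDIN WITH (FORMAT csv)",
        //             new GZIPInputStream(pipeIn));
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        try (InputStream gunzip = new GZIPInputStream(pipeIn)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gunzip.read(buf)) != -1) received.write(buf, 0, n);
        }
        producer.join();
        return received.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String csv = "1,foo\n2,bar\n";
        System.out.println(pipeThroughGzip(csv).equals(csv)); // round-trip is lossless
    }
}
```

Note that column renaming can happen in the COPY column lists, so no row-by-row conversion in Java is needed when only the names differ.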

Thank you all, this is a wonderful place)



Source: stackoverflow