I am trying to insert 100,000 rows in a MYSQL table under 5 seconds using Hibernate(JPA). I have tried every trick hibernate offers and still can not do better than 35 seconds.
1st optimisation : I started with IDENTITY sequence generator which was resulting in 60 seconds to insert. I later abandoned the sequence generator and started assigning the @Id
field myself by reading the MAX(id)
and using AtomicInteger.incrementAndGet()
to assign fields myself. That reduced the insert time to 35 seconds.
2nd optimisation : I enabled batch inserts, by adding
<prop key="hibernate.jdbc.batch_size">30</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.current_session_context_class">thread</prop>
<prop key="hibernate.jdbc.batch_versioned_data">true</prop>
to the configuration. I was shocked to find that batch inserts did absolutely nothing to decrease insert time. It was still 35 seconds!
Now, I am thinking about trying to insert using multiple threads. Anyone has any pointers? Should I have chosen MongoDB?
Below is my configuration: 1. Hibernate configuration `
<bean id="entityManagerFactoryBean" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean"> <property name="dataSource" ref="dataSource" /> <property name="packagesToScan" value="com.progresssoft.manishkr" /> <property name="jpaVendorAdapter"> <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" /> </property> <property name="jpaProperties"> <props> <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop> <prop key="hibernate.dialect">${hibernate.dialect}</prop> <prop key="hibernate.show_sql">${hibernate.show_sql}</prop> <prop key="hibernate.format_sql">${hibernate.format_sql}</prop> <prop key="hibernate.jdbc.batch_size">30</prop> <prop key="hibernate.order_inserts">true</prop> <prop key="hibernate.current_session_context_class">thread</prop> <prop key="hibernate.jdbc.batch_versioned_data">true</prop> </props> </property> </bean> <bean class="org.springframework.jdbc.datasource.DriverManagerDataSource" id="dataSource"> <property name="driverClassName" value="${database.driver}"></property> <property name="url" value="${database.url}"></property> <property name="username" value="${database.username}"></property> <property name="password" value="${database.password}"></property> </bean> <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager"> <property name="entityManagerFactory" ref="entityManagerFactoryBean" /> </bean> <tx:annotation-driven transaction-manager="transactionManager" />
`
- Entity configuration :
`
@Entity @Table(name = "myEntity") public class MyEntity { @Id private Integer id; @Column(name = "deal_id") private String dealId; .... .... @Temporal(TemporalType.TIMESTAMP) @Column(name = "timestamp") private Date timestamp; @Column(name = "amount") private BigDecimal amount; @OneToOne(cascade = CascadeType.ALL) @JoinColumn(name = "source_file") private MyFile sourceFile; public Deal(Integer id,String dealId, ....., Timestamp timestamp, BigDecimal amount, SourceFile sourceFile) { this.id = id; this.dealId = dealId; ... ... ... this.amount = amount; this.sourceFile = sourceFile; } public String getDealId() { return dealId; } public void setDealId(String dealId) { this.dealId = dealId; } ... ... .... public BigDecimal getAmount() { return amount; } public void setAmount(BigDecimal amount) { this.amount = amount; } .... public Integer getId() { return id; } public void setId(Integer id) { this.id = id; }
`
- Persisting code (service) :
`
@Service @Transactional public class ServiceImpl implements MyService{ @Autowired private MyDao dao; .... `void foo(){ for(MyObject d : listOfObjects_100000){ dao.persist(d); } }
` 4. Dao class :
`
@Repository public class DaoImpl implements MyDao{ @PersistenceContext private EntityManager em; public void persist(Deal deal){ em.persist(deal); } }
`
Logs: `
DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:32.906 [http-nio-8080-exec-2]
… …
DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement 18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?) 18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 27 18:26:34.011 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - update deal_source_file set invalid_rows=?, source_file=?, valid_rows=? where id=? 18:26:34.015 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 1 18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection 18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - re-enabling autocommit 18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2354fb09] after transaction 18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.o.jpa.EntityManagerFactoryUtils - Closing JPA EntityManager 18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.internal.JdbcCoordinatorImpl - HHH000420: Closing un-released batch 18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Releasing JDBC connection 18:26:34.033 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Released JDBC connection
‘
Advertisement
Answer
After trying all possible solutions I finally found a solution to insert 100,000 rows under 5 seconds!
Things I tried:
1) Replaced hibernate/database’s AUTOINCREMENT/GENERATED id’s by self generated ID’s using AtomicInteger
2) Enabling batch_inserts with batch_size=50
3) Flushing cache after every ‘batch_size’ number of persist() calls
4) multithreading (did not attempt this one)
Finally what worked was using a native multi-insert query and inserting 1000 rows in one sql insert query instead of using persist() on every entity. For inserting 100,000 entities, I create a native query like this "INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)"
[1000 row inserts in one sql insert query]
Now it takes around 3 seconds for inserting 100,000 records! So the bottleneck was the orm itself! For bulk inserts, the only thing that seems to work is native insert queries!