
How to choose the right batch size while bulk inserting

How do I choose the right batch size? I also have one doubt. Let's say I have 50,000 rows in my dataset and I am trying to insert them into SQL using batches. The purpose of using batches is to save time, right? So why do I have to keep the batch size small? Why can't I just create one batch of 50,000 rows and execute it only once?


Answer

As @Erik says, max_allowed_packet is 4 MB by default. Is the average length of one of your rows more than about 83 bytes? That multiplied by 50,000 would exceed the default statement length.
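(The 83-byte figure is just 4,194,304 bytes ÷ 50,000 rows ≈ 83 bytes per row.) As a rough sketch of that budget calculation, here is how you might estimate a rows-per-batch limit; the average row length and safety factor below are made-up example numbers, not values from the question, and Java is only assumed as the client language:

```java
// Back-of-the-envelope sketch: derive a rows-per-statement budget from
// max_allowed_packet. The 4 MB default and the average row length here are
// assumptions; measure your own data and check the server's actual setting.
public class BatchSizeEstimate {
    public static void main(String[] args) {
        long maxAllowedPacket = 4L * 1024 * 1024; // 4,194,304 bytes (MySQL default)
        long avgRowBytes      = 120;              // estimated average encoded row length
        long safetyFactor     = 2;                // headroom for SQL syntax, escaping, growth
        long rowsPerBatch     = maxAllowedPacket / (avgRowBytes * safetyFactor);
        System.out.println("Rows per batch: " + rowsPerBatch); // ~17,476 with these numbers
    }
}
```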

Are your columns variable length, like VARCHAR or TEXT or BLOB? If so, then 50,000 rows might produce a far longer statement than you expect, depending on the data you need to load. Perhaps today you fit 50,000 rows into one batch, but next week it will fail because the data produces an SQL statement that is too long, even with the same number of rows.

Also, this week it's 50,000 rows. What if next week the data load is 100,000 rows? Would you know in advance? Isn't it safer to write the code to split the input data into smaller batches of a more modest size?
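For illustration, a minimal sketch of splitting the load into fixed-size batches with JDBC and a PreparedStatement; the table name, column names, connection details, row type, and the batch size of 1,000 are assumptions for the example, not details taken from the question:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsertSketch {
    // Hypothetical row type; the real dataset's shape is not given in the question.
    record Row(int id, String name) {}

    static void insertInBatches(Connection conn, List<Row> rows, int batchSize) throws Exception {
        // Table and column names are placeholders for illustration.
        String sql = "INSERT INTO my_table (id, name) VALUES (?, ?)";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Row r : rows) {
                ps.setInt(1, r.id());
                ps.setString(2, r.name());
                ps.addBatch();
                if (++pending == batchSize) { // flush a modest-sized batch
                    ps.executeBatch();
                    conn.commit();            // keep each transaction small as well
                    pending = 0;
                }
            }
            if (pending > 0) {                // flush the remainder
                ps.executeBatch();
                conn.commit();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Connection URL and credentials are placeholders; requires MySQL Connector/J on the classpath.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "user", "password")) {
            insertInBatches(conn, List.of(new Row(1, "a"), new Row(2, "b")), 1000);
        }
    }
}
```

With this structure, a jump from 50,000 to 100,000 input rows changes nothing in the code; only the number of batches grows.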

Another reason is that a very bulky transaction can cause trouble for the InnoDB log buffer or the binary log buffer.

It can also cause temporary replication lag if the statement itself takes a few seconds to run.
