During my quest to find the fastest method to get data from Java to SQL Server, I have noticed that the fastest Java-method I can come up with, is still 12 times slower than using BULK INSERT.
My data is being generated from within Java, and BULK INSERT only supports reading data from a text file, so using BULK INSERT is not an option unless I output my data to a temporary text file. This in turn, would of course be a huge performance hit.
When inserting from Java, insert speeds are around 2500 rows per second. Even when I measure the time after the for loop, and just before the executeBatch. So “creating” the data in-memory is not the bottleneck.
When inserting with BATCH INSERT, insert speeds are around 30000 rows per second.
Both tests have been done on the server. So network is also not a bottleneck. Any clue as to why BATCH INSERT is faster? And, if the same performance can be attained from within Java?
This is just a big dataset that needs to get loaded once. So it would be OK to temporary disable any kind of logging (already tried simple logging), disable indexes (table has none), locking, whatever, …
My test-setup so far
Database:
CREATE TABLE TestTable ( Col1 varchar(50) , Col2 int);
Java:
// This seems to be essential to get good speeds, otherwise batching is not used. conn.setAutoCommit(false); PreparedStatement prepStmt = conn.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)"); for (int i = 1; i <= 10000; i++) { prepStmt.setString(1,"X"); prepStmt.setInt(2,100); prepStmt.addBatch(); } prepStmt.executeBatch(); conn.commit();
BULK INSERT:
// A text file containing "X 100" over and over again... so the same data as generated in JAVA bulk insert TestTable FROM 'c:testtest.txt';
Advertisement
Answer
While BULK INSERT
is the fastest way of doing bulk insert, SQL Server supports remote (client-driven) bulk insert operations both through the native driver and ODBC. From version 4.2 onwards of the JDBC driver, this functionality is exposed through the SQLServerBulkCopy
class, which does not directly read from files but does support reading from a RowSet
, ResultSet
or a custom implementation of ISQLServerBulkRecord
for generated data. This functionality is equivalent to the .NET SqlBulkCopy
class, with largely the same interface, and should be the fastest way of performing bulk operations short of a server-based BULK INSERT
.
EDIT: Example by OP
Below you can find an example use-case that could be used to test the performance of SQLServerBulkCSVFileRecord, a method similar to SQLServerBulkCopy except that it reads from a text file. In my test case, test.txt contained a million rows with “X tab
100″
CREATE TABLE TestTable (Col1 varchar(50), Col2 int);
The table should not have any indexes enabled.
In JAVA
// Make sure to use version 4.2, as SQLServerBulkCSVFileRecord is not included in version 4.1 import com.microsoft.sqlserver.jdbc.*; long startTime = System.currentTimeMillis(); SQLServerBulkCSVFileRecord fileRecord = null; fileRecord = new SQLServerBulkCSVFileRecord("C:\temp\test.txt", true); fileRecord.addColumnMetadata(1, null, java.sql.Types.NVARCHAR, 50, 0); fileRecord.addColumnMetadata(2, null, java.sql.Types.INTEGER, 0, 0); Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver"); Connection destinationConnection = DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "user", "pass"); SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions(); // Depending on the size of the data being uploaded, and the amount of RAM, an optimum can be found here. Play around with this to improve performance. copyOptions.setBatchSize(300000); // This is crucial to get good performance copyOptions.setTableLock(true); SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection); bulkCopy.setBulkCopyOptions(copyOptions); bulkCopy.setDestinationTableName("TestTable"); bulkCopy.writeToServer(fileRecord); long endTime = System.currentTimeMillis(); long totalTime = endTime - startTime; System.out.println(totalTime + "ms");
Using this example, I was able to get insert speeds of up to 30000 rows per second.