I have a use case where we insert a very large number of rows into the database, so I am benchmarking insert performance.
This is the entity manager implementation:
try {
    TransactionTemplate transactionTemplate = new TransactionTemplate(platformTransactionManager);
    long start = System.currentTimeMillis();
    transactionTemplate.execute(status -> {
        try {
            int batchSize = 5000;
            int counter = 0;
            for (Data row : dataArray) {
                entityManager.persist(row);
                // Flush and clear the persistence context after every full batch
                // so the first-level cache does not grow unbounded.
                if ((counter + 1) % batchSize == 0) {
                    entityManager.flush();
                    entityManager.clear();
                }
                counter++;
            }
            return "DONE";
        } catch (Exception e) {
            throw new RuntimeException("Error while writing", e);
        }
    });
} catch (Exception e) {
    throw new RuntimeException("Error while writing", e);
}
The method above takes around 200 seconds to insert a data set of around 50,000 rows, while the method below takes only around 30 seconds:
EntityManagerFactoryInfo info = (EntityManagerFactoryInfo) entityManager.getEntityManagerFactory();
String insertQuery = "INSERT INTO my_table VALUES (?, ?)";
int batchSize = 5000;
try (Connection connection = info.getDataSource().getConnection();
     PreparedStatement statement = connection.prepareStatement(insertQuery)) {
    int counter = 0;
    connection.setAutoCommit(false);
    for (Data row : dataArray) {
        statement.clearParameters();
        statement.setObject(1, row.getUniqueIdentifier());
        statement.setObject(2, row.getInsertionId());
        statement.addBatch();
        if ((counter + 1) % batchSize == 0) {
            statement.executeBatch();
            connection.commit();
            statement.clearBatch();
        }
        counter++;
    }
    // Execute and commit any remaining rows of the last, partial batch
    // (otherwise rows after the final full batch are never inserted).
    statement.executeBatch();
    connection.commit();
} catch (Exception e) {
    throw new RuntimeException("Error While Writing: ", e);
}
Why is the entity manager method so slow?
1 Answer
An RDBMS is based on set theory and is designed to treat data as sets, not one row at a time (an iterative approach).
Every time you run a query to INSERT a single row, the statement has to be parsed, an execution plan has to be computed, and only then is the statement executed.
With a row-by-row approach, all of those tasks must be done for every row, except for the parts that can be cached (such as the optimal execution plan).
When executing a batch, these phases run only once for the whole set of rows.
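As a concrete illustration of the set-based point: a single multi-row INSERT is parsed and planned once no matter how many rows it carries. The column names below are assumptions based on the two parameters bound in the question's PreparedStatement:

```sql
-- Parsed and planned once; inserts three rows in a single execution.
INSERT INTO my_table (unique_identifier, insertion_id)
VALUES (101, 1),
       (102, 2),
       (103, 3);
```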
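Separately, a common reason a persist() loop stays slow is that, if the JPA provider is Hibernate (an assumption, since the question does not name the provider), JDBC statement batching is off by default, so each persist() becomes its own INSERT round trip. A minimal sketch of the provider properties that enable it:

```properties
# Hibernate-specific property names; the batch size mirrors the question's 5000.
hibernate.jdbc.batch_size=5000
hibernate.order_inserts=true
```

Note that Hibernate disables insert batching for entities using IDENTITY id generation, so switching to a sequence-based generator may also be required before batching takes effect.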