SQL Server Entity manager is Slow than Direct Insert

yhived7q  于 2023-08-02  发布在  其他
关注(0)|答案(1)|浏览(102)

I have use case where we are inserting a very large number of rows into the DB. So I am checking my performance for inserts.

This is the entity manager implementation:

try {
            TransactionTemplate transactionTemplate = new TransactionTemplate(platformTransactionManager);
            long start = System.currentTimeMillis();
            transactionTemplate.execute(status -> {
                try {
                    int batchSize = 5000;
                    int counter = 0;
                   
                    for (Data row : dataArray) {
                        entityManager.persist(row);

                        if (  (counter + 1 ) % batchSize == 0) {
                            entityManager.flush();
                            entityManager.clear();
                        }
                        counter++;
                    }
                    
                    return "DONE";
                } catch (Exception e) {
              
                    throw new RuntimeException("Error while writing", e);

                }
            });

The above method is taking around 200 sec for inserting a data set of around 50,000 rows while below method is taking only around 30 seconds

EntityManagerFactoryInfo info = (EntityManagerFactoryInfo) entityManager.getEntityManagerFactory();
            String insertQuery = String.format("INSERT INTO my_table VALUES (?, ?)");
            int batchSize =5000;
    
            try (Connection connection = info.getDataSource().getConnection();
                 PreparedStatement statement = connection.prepareStatement(insertQuery);
            ) {
                
                int counter = 0;
                connection.setAutoCommit(false);
    
                for (Data row : dataArray) {
                    statement.clearParameters();
    
                     statement.setObject(1, row.getUniqueIdentifier());
                     statement.setObject( 2, row.getInsertionId());
                    
                 
    
                    statement.addBatch();
                    if ((counter + 1) % batchSize == 0) {
                        statement.executeBatch();
                        connection.commit();
                        statement.clearBatch();
                    }
                    counter++;
                }
                
            } catch (Exception e) {
                throw new RuntimeException("Error While Writing: ", e);

}
    }

Why the entity manager method is slow?

f87krz0w

f87krz0w1#

RDBMS are based on set mathematic theory and treats data as sets of data not one by one (iterative approach).

Every time you run a query to INSERT a row, this is translated in :

  • syntaxic analysis of the query
  • resolving full name of the object if query does not have name prefixed by SQL schema (dbo...)
  • verifying in system tables is the table(s) and the column(s) exists
  • verifying if the user have the priviliege to execute the command on the table
  • transforming the query text into mathematic formulae (algebra)
  • simplifying the formlulae
  • compute an optimal query execution plan
  • and finally executing the query...

When using a row by row approach all those task must be done for every rows except for some that can be cached (optimal execution plan)

When executing a batch only one time all these phases are runs

相关问题