Friday, March 7, 2014

Found performance issue when inserting data using batches.

I have a program that creates a great number of records. The number of records created depends on the existing data, but in one case this was 1.4 million records. For performance, I used a PreparedStatement executing 5000 inserts at a time. A simplified version of my code is below.

PreparedStatement ps = con.prepareStatement("insert into personancestor (ancestor, personid, hierarchylevels, personancestorid) values (?,?,?,nextval for personancestorseq)") ;
int useCount = 0;
for (int i = 0; i < 30000; i++)
{
  ps.setString(1, "ANC" + i)  ;
  ps.setString(2, "PER" + i)  ;
  ps.setInt(3, 1) ;
  ps.addBatch() ;
  useCount++ ;
  if (useCount >= 5000)
  {
    ps.executeBatch() ;
    //ps.clearBatch() ; - this doesn't help performance

    // close and reprepare the statement to clear the data inserted earlier.
    ps.close() ;
    ps = con.prepareStatement("insert into personancestor (ancestor, personid, hierarchylevels, personancestorid) values (?,?,?,nextval for personancestorseq)") ;
    con.commit()  ;
    useCount = 0 ;
    Date nd = new Date() ;
    System.out.println("5000 inserts in " + (nd.getTime() - d.getTime()) + " ms. " + (i+1) + " total inserts.") ;
    d = nd ;
  }
}
if (useCount > 0)
{
 ps.executeBatch() ;
}
ps.close() ;
con.commit() ;
con.close() ;

I found that if I didn't re-prepare the statement, then apparently all the previous rows remain in memory. The inserts take longer and longer, until the program is just hung. Calling the PrepareStatement's clearBatch method had no effect. I was connecting to a DB2 database using the com.ibm.db2.jcc.DB2Driver class, and I don't know if this is a problem with just the DB2 connection, or this is expected behavior or not.
After fixing the data, I was able to create 1.4 million records in about 20 minutes.

No comments:

Post a Comment