persistence@glassfish.java.net

Getting out of memory exception during bulk insert

From: Peter Melnikov <peter.melnikov_at_gmail.com>
Date: Tue, 27 Mar 2007 01:57:42 +0300

Hi All,

I hope this is the right place to ask. I am new to JPA technology and ORM in
general. I wrote a simple application to play around with JPA and ran into some
issues during a bulk insert of a [not so] large dataset. I understand that in a
production system there are more appropriate ways to import a large dataset into
a database, but I want to know the limitations of JPA and ORM in general before
I use them in my future applications.

I am using glassfish-persistence-v2-b38 (downloaded recently from the Oracle
site), Spring 2.0.3, MySQL 4.0.13 and Java 6.

I have created a business service class with a single method:

@Transactional
public void createData() {
    for (int i = 0; i < 5000; i++) {
        Address address = new AddressImpl("USA", "New York", "Address Line 1", "10116");
        Company company = new CompanyImpl("COMPANY" + i, 343244, "contact@company.com", address);
        dao.create(company);
        // flush and clear the persistence context every 1000 entities
        if ((i + 1) % 1000 == 0) {
            dao.getEntityManager().flush();
            dao.getEntityManager().clear();
        }
    }
}
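
In case it matters, the entities are plain annotated classes. CompanyImpl looks
roughly like this (a simplified sketch; the exact field names and the cascading
one-to-one to AddressImpl are just how I set it up for this test):

import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToOne;

@Entity
public class CompanyImpl implements Company {

    @Id
    @GeneratedValue
    private Long id;

    private String name;
    private int companyNumber;
    private String email;

    // the address is persisted together with the company via cascade
    @OneToOne(cascade = CascadeType.ALL)
    private AddressImpl address;

    protected CompanyImpl() {
        // no-arg constructor required by JPA
    }

    public CompanyImpl(String name, int companyNumber, String email, Address address) {
        this.name = name;
        this.companyNumber = companyNumber;
        this.email = email;
        this.address = (AddressImpl) address;
    }

    // getters/setters omitted
}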

My generic DAO class implements create as follows:

public T create(T entity) {
    em.persist(entity);
    return entity;
}

The EntityManager is injected via Spring's PersistenceAnnotationBeanPostProcessor.
Nothing too sophisticated, really.
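
For completeness, the whole DAO is just a thin wrapper around the injected
EntityManager, along these lines (trimmed sketch; the interface and class names
here are only illustrative):

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

interface Dao<T> {
    T create(T entity);
    EntityManager getEntityManager();
}

public class GenericJpaDao<T> implements Dao<T> {

    // injected by Spring's PersistenceAnnotationBeanPostProcessor
    @PersistenceContext
    private EntityManager em;

    public T create(T entity) {
        em.persist(entity);
        return entity;
    }

    public EntityManager getEntityManager() {
        return em;
    }
}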

When I call the createData method with 20-40k iterations in the loop, I get an
OutOfMemoryError somewhere in the middle of the transaction. I thought that the
EntityManager's clear method fully detaches all managed entities from the
persistence context, but that does not seem to be the case in my situation.

I have found out that the member variable UnitOfWorkChangeSet
cumulativeUOWChangeSet, declared in the class RepeatableWriteUnitOfWork, is not
cleared when clear is called in the middle of a transaction (only the member
variables in the UnitOfWorkImpl superclass are cleared). So it keeps object
references in hashtables throughout the entire transaction, blowing up the heap
space as more new objects get persisted. Is this by design of JPA, or is it
specific to the TopLink Essentials implementation? Is there any way to do a bulk
insert within the limits of a single transaction without also increasing the max
heap size?
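
If not, I guess the fallback is to give up on the single transaction and commit
in chunks, e.g. with Spring's TransactionTemplate, roughly like this (rough,
untested sketch; the method would sit in the same service class as createData so
that the dao field is available):

import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.TransactionCallbackWithoutResult;
import org.springframework.transaction.support.TransactionTemplate;

public void createDataInChunks(PlatformTransactionManager txManager) {
    TransactionTemplate txTemplate = new TransactionTemplate(txManager);
    final int chunkSize = 1000;
    for (int start = 0; start < 40000; start += chunkSize) {
        final int chunkStart = start;
        txTemplate.execute(new TransactionCallbackWithoutResult() {
            protected void doInTransactionWithoutResult(TransactionStatus status) {
                for (int i = chunkStart; i < chunkStart + chunkSize; i++) {
                    Address address = new AddressImpl("USA", "New York", "Address Line 1", "10116");
                    Company company = new CompanyImpl("COMPANY" + i, 343244, "contact@company.com", address);
                    dao.create(company);
                }
                // the persistence context (and its change set) is discarded
                // when each chunk's transaction commits
            }
        });
    }
}

But that of course abandons the all-or-nothing semantics I was hoping to keep.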

Thanks in advance.

Regards,
Peter