persistence@glassfish.java.net

RE: Getting out of memory exception during bulk insert

From: Gordon Yorke <gordon.yorke_at_oracle.com>
Date: Tue, 27 Mar 2007 10:31:04 -0400

Hello Peter,

    Wonseok has given a sound explanation. Caching provides a benefit for the vast majority of users, but TopLink's caching mechanism requires tracking the objects in the current transaction to ensure the cache contains current information. There is currently no JPA-defined mechanism to notify the Persistence Provider that a transaction will contain only inserts, so TopLink has to assume that the transaction will include other forms of data modification. The only way to prevent the out-of-memory error is to keep the transactions moderate in size.
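
    For example, one way to keep each transaction moderate with the Spring setup Peter describes would be a sketch along these lines (the class name, the GenericDao type, the separate bean, and the batch driving are illustrative assumptions, not part of Peter's code):

    import org.springframework.transaction.annotation.Propagation;
    import org.springframework.transaction.annotation.Transactional;

    // Kept as a separate Spring bean so the annotation is applied through
    // the transaction proxy when the import loop calls it.
    public class CompanyBatchWriter {

        private GenericDao<Company> dao; // injected, as in Peter's service

        // Each call runs in its own transaction, so the unit of work (and
        // its cumulative change set) is discarded after every batch instead
        // of growing for the whole import.
        @Transactional(propagation = Propagation.REQUIRES_NEW)
        public void createBatch(int start, int count) {
            for (int i = start; i < start + count; i++) {
                Address address = new AddressImpl("USA", "New York", "Address Line 1", "10116");
                Company company = new CompanyImpl("COMPANY" + i, 343244, "contact_at_company.com", address);
                dao.create(company);
            }
        }
    }

    A non-transactional caller would then drive the import in chunks, for example calling createBatch(start, 1000) until the whole range is covered.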

--Gordon
  -----Original Message-----
  From: Wonseok Kim [mailto:guruwons_at_gmail.com]
  Sent: Monday, March 26, 2007 9:48 PM
  To: persistence_at_glassfish.dev.java.net
  Subject: Re: Getting out of memory exception during bulk insert


  Hi, Peter
  I have tried to give some explanation, but I think the Oracle team can give a more accurate answer later.
  Comments inline...


  On 3/27/07, Peter Melnikov <peter.melnikov_at_gmail.com> wrote:
    Hi All,

    I hope this is the right place to ask. I am new to JPA and to ORM in general. I wrote a simple application to play around with JPA and ran into some issues during a bulk insert of a [not so] large dataset. I understand that there are more appropriate ways to import a large dataset into a database in a production system, but I want to know the limitations of JPA and other ORMs before I use them in my future applications.

    I use glassfish-persistence-v2-b38, downloaded recently from the Oracle site, Spring 2.0.3, MySQL 4.0.13 and Java 6.

    I have created a business service class with the single method:

    @Transactional
    public void createData() {
            for(int i = 0; i < 5000; i++) {
                Address address = new AddressImpl("USA", "New York", "Address Line 1", "10116");
                Company company = new CompanyImpl("COMPANY" + i, 343244, " contact_at_company.com", address);
                dao.create(company);
                if((i+1)%1000 == 0) {
                    dao.getEntityManager().flush();
                    dao.getEntityManager().clear();
                }
            }
    }

    My generic dao class implements create as follows:

    public T create(T entity) {
        em.persist(entity);
        return entity;
    }

    The EntityManager is injected via Spring's PersistenceAnnotationBeanPostProcessor. Nothing too sophisticated indeed.
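
    For reference, the dao obtains its EntityManager through the standard @PersistenceContext annotation, roughly like this (the class name below is illustrative, not the real one):

    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;

    public abstract class GenericDaoImpl<T> {

        // Injected by Spring's PersistenceAnnotationBeanPostProcessor
        @PersistenceContext
        protected EntityManager em;

        public EntityManager getEntityManager() {
            return em;
        }
    }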

    When I call the createData method for 20-40k iterations in the loop I get an OutOfMemoryException somewhere in the middle of the transaction. I thought that the clear method of EntityManager fully clears the persistence context of managed entities, but that seems not to be the case in my situation.

    I have found that the member variable UnitOfWorkChangeSet cumulativeUOWChangeSet, declared in the class RepeatableWriteUnitOfWork, is not cleared during the call to the clear method in the middle of a transaction (only member variables in the UnitOfWorkImpl superclass are cleared). So it keeps object references in hashtables throughout the entire transaction, blowing up the heap space as more new objects are persisted. Is this by design of JPA, or is it a TopLink Essentials implementation detail? Is there any way to do a bulk insert within a single transaction without also increasing the max heap size?

  Your observation is right. cumulativeUOWChangeSet is used to merge the changes made in one transaction into the shared cache. In this case the transaction lasts a long time and contains lots of objects, so the heap usage continuously increases. This is not a design flaw of JPA, but the cache implementation policy of TopLink.

  With the current implementation, I could not find a way to avoid this within one transaction. Committing transactions in the middle seems to be the only workaround. Or you can use a native query to avoid caching, but I don't think that should be recommended.
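
  Just for illustration, the native-query route would look roughly like this inside the loop (the table and column names are guesses based on the Company entity, not the real mapping, and the address is omitted for brevity):

    // This bypasses the persistence context entirely, so nothing gets
    // registered in the unit of work or merged into the shared cache.
    em.createNativeQuery(
        "INSERT INTO COMPANY (NAME, REVENUE, EMAIL) VALUES (?1, ?2, ?3)")
      .setParameter(1, "COMPANY" + i)
      .setParameter(2, 343244)
      .setParameter(3, "contact_at_company.com")
      .executeUpdate();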

  Other TopLink team members can tell you more about this...



    Thanks in advance.

    Regards,
    Peter