persistence@glassfish.java.net

Re: Getting out of memory exception during bulk insert

From: Peter Melnikov <peter.melnikov_at_gmail.com>
Date: Tue, 27 Mar 2007 17:38:27 +0300

Hello

Thank you both for the detailed explanation. I will keep it in mind when
developing my applications. I agree that my test is not typical of most ORM
use cases.
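
A minimal sketch of the workaround suggested below (keeping each transaction
moderate in size), not a definitive implementation. ChunkedInsert and
runInChunks are hypothetical names, not part of JPA, Spring or TopLink, and the
callback only stands in for "begin a transaction, persist the chunk, commit".
It uses Java 8 lambdas for brevity, so it would need adapting for Java 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of JPA, Spring or TopLink.
final class ChunkedInsert {

    /**
     * Splits items into consecutive chunks of at most chunkSize elements and
     * passes each chunk to unitOfWork. Returns the number of chunks processed.
     */
    static <T> int runInChunks(List<T> items, int chunkSize,
                               Consumer<List<T>> unitOfWork) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            // Assumption: in a real application this callback would begin a
            // transaction, persist every element of the chunk, and commit.
            unitOfWork.accept(items.subList(from, to));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            names.add("COMPANY" + i);
        }
        // Stand-in for "persist this chunk in its own transaction".
        int chunks = runInChunks(names, 1000,
                chunk -> System.out.println("would commit " + chunk.size() + " companies"));
        System.out.println(chunks + " transactions");
    }
}
```

In a setup like mine, the callback would presumably call a separate
@Transactional method per chunk, so that each commit lets the provider release
the change set accumulated for that chunk. With Spring's proxy-based
transactions that method would have to live on another bean, since a call from
within the same object bypasses the proxy.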

Peter

On 3/27/07, Gordon Yorke <gordon.yorke_at_oracle.com> wrote:
>
> Hello Peter,
>
> Wonseok has given a sound explanation. Caching provides a benefit for
> the vast majority of users, but TopLink's caching mechanism requires tracking
> the objects in the current transaction to ensure the cache contains current
> information. There is currently no JPA-defined mechanism to notify the
> persistence provider that a transaction will contain only inserts, so TopLink
> has to assume that the transaction will include other forms of data
> modification. The only way to prevent the out-of-memory error is to keep the
> transactions moderate in size.
> --Gordon
>
> -----Original Message-----
> *From:* Wonseok Kim [mailto:guruwons_at_gmail.com]
> *Sent:* Monday, March 26, 2007 9:48 PM
> *To:* persistence_at_glassfish.dev.java.net
> *Subject:* Re: Getting out of memory exception during bulk insert
>
> Hi Peter,
> I have tried to give some explanation, but I think the Oracle team can give a
> more accurate answer later.
> Comments inline...
>
> On 3/27/07, Peter Melnikov <peter.melnikov_at_gmail.com> wrote:
> >
> > Hi All,
> >
> > I hope this is the right place to ask. I am new to JPA and to ORM
> > in general. I wrote a simple application to play around with JPA and ran into
> > problems during a bulk insert of a [not so] large dataset. I understand that
> > there are more appropriate ways to import a large dataset into a database in
> > a production system, but I want to know the limitations of JPA and other ORMs
> > before I use them in my future applications.
> >
> > I use glassfish-persistence-v2-b38, recently downloaded from the Oracle site,
> > Spring 2.0.3, MySQL 4.0.13 and Java 6.
> >
> > I have created a business service class with a single method:
> >
> > @Transactional
> > public void createData() {
> >     for (int i = 0; i < 5000; i++) {
> >         Address address = new AddressImpl("USA", "New York",
> >                 "Address Line 1", "10116");
> >         Company company = new CompanyImpl("COMPANY" + i, 343244,
> >                 "contact_at_company.com", address);
> >         dao.create(company);
> >         if ((i + 1) % 1000 == 0) {
> >             dao.getEntityManager().flush();
> >             dao.getEntityManager().clear();
> >         }
> >     }
> > }
> >
> > My generic DAO class implements create as follows:
> >
> > public T create(T entity) {
> >     em.persist(entity);
> >     return entity;
> > }
> >
> > The EntityManager is injected via Spring's
> > PersistenceAnnotationBeanPostProcessor. Nothing too sophisticated.
> >
> > When I call the createData method with 20-40k iterations in the loop, I
> > get an OutOfMemoryError somewhere in the middle of the transaction. I
> > thought that the clear method of EntityManager fully clears the persistence
> > context of managed entities, but that does not seem to be the case here.
> >
> > I have found that the member variable UnitOfWorkChangeSet
> > cumulativeUOWChangeSet, declared in class RepeatableWriteUnitOfWork, is not
> > cleared when clear is called in the middle of a transaction
> > (only member variables in the UnitOfWorkImpl superclass are cleared). So it
> > keeps object references in hashtables throughout the entire transaction,
> > filling up the heap as more new objects are persisted. Is this by design in
> > JPA, or is it specific to the TopLink Essentials implementation? Is there any
> > way to do a bulk insert within a single transaction without also increasing
> > the max heap size?
>
>
> Your observation is right. cumulativeUOWChangeSet is used to merge the
> changes made in one transaction into the shared cache. In this case the
> transaction lasts a long time and contains lots of objects, so heap usage
> increases continuously. This is not a design flaw of JPA, but the cache
> implementation policy of TopLink.
>
> With the current implementation, I could not find a way to avoid this within
> one transaction. Committing transactions in the middle seems to be the only
> workaround. Alternatively, you could use a native query to avoid caching, but
> I don't think this should be recommended.
>
> Other TopLink team members could tell more about this...
>
> > Thanks in advance.
> >
> > Regards,
> > Peter