Hi Peter,
I tried to give some explanation, but I think the Oracle team can give a more
accurate answer later.
Comments inline...
On 3/27/07, Peter Melnikov <peter.melnikov@gmail.com> wrote:
>
> Hi All,
>
> I hope this is the right place to ask. I am new to JPA technology and ORM in
> general. I wrote a simple application to play around with JPA and ran into
> some issues during a bulk insert of a [not so] large dataset. I understand
> that there are more appropriate ways to import a large dataset into a
> database in a production system, but I want to know the limitations of JPA
> and other ORMs before I use them in my future applications.
>
> I use glassfish-persistence-v2-b38, recently downloaded from the Oracle
> site, Spring 2.0.3, MySQL 4.0.13 and Java 6.
>
> I have created a business service class with a single method:
>
> @Transactional
> public void createData() {
>     for (int i = 0; i < 5000; i++) {
>         Address address = new AddressImpl("USA", "New York",
>                 "Address Line 1", "10116");
>         Company company = new CompanyImpl("COMPANY" + i, 343244,
>                 "contact@company.com", address);
>         dao.create(company);
>         if ((i + 1) % 1000 == 0) {
>             dao.getEntityManager().flush();
>             dao.getEntityManager().clear();
>         }
>     }
> }
>
> My generic dao class implements create as follows:
>
> public T create(T entity) {
>     em.persist(entity);
>     return entity;
> }
>
> The EntityManager is injected via Spring's
> PersistenceAnnotationBeanPostProcessor. Nothing too sophisticated, indeed.
>
> When I call the createData method for 20-40k iterations in the loop, I get
> an OutOfMemoryError somewhere in the middle of the transaction. I thought
> that the clear method of EntityManager fully clears the persistence context
> of managed entities, but that seems not to be the case in my situation.
>
> I have found out that the member variable UnitOfWorkChangeSet
> cumulativeUOWChangeSet, declared in the class RepeatableWriteUnitOfWork, is
> not cleared during the call to the clear method in the middle of a
> transaction (only member variables in the UnitOfWorkImpl superclass are
> cleared). So it keeps object references in hashtables throughout the entire
> transaction, thus blowing up the heap space as more new objects get
> persisted. Is this by design in JPA, or is it a TopLink Essentials
> implementation specific? Is there any way to do a bulk insertion within the
> limits of a single transaction without also increasing the max heap size?
Your observation is right. cumulativeUOWChangeSet is used to merge the
changes made in one transaction into the shared cache. In this case the
transaction lasts a long time and contains lots of objects, so the heap size
continuously increases. This is not a design flaw of JPA, but rather the
cache implementation policy of TopLink.
With the current implementation, I could not find a way to avoid this within
one transaction. Committing transactions in the middle seems the only way to
work around it, as sketched below. Or you could use a native query to bypass
the cache, but I don't think that should be recommended.
Other TopLink team members could tell more about this...
Thanks in advance.
>
> Regards,
> Peter
>