Re: What criteria does TopLink use to 'cache' entities to avoid re-executing SQL to find an object?

From: Martin Bayly <mbayly_at_telus.net>
Date: Tue, 16 Jan 2007 13:35:40 -0800

Thanks, Gordon.

I'll log a bug and provide a simple example that demonstrates it. I can
reproduce the issue with plain TopLink JPA, without Spring, if I start a
JPA transaction, update an object to force TopLink to start a real
transaction, and then execute a query in the same transaction.

I wasn't sure exactly what you meant by 'are you using JPA in Spring or
the TopLink persistence mechanism?'

Just to clarify: I'm using Java SE with Spring JPA, and TopLink
Essentials (v2 b26, although I have also tried b31 and the problem still
exists) as the JPA persistence provider. So Spring manages the
EntityManagerFactory and the transactions, and injects the EntityManager
into my DAOs.

However, as I mentioned, I can reproduce the issue even without Spring,
once I write an object in a JPA transaction and then execute a query.
I'll show this scenario in the sample code I post with the bug.

Cheers
Martin

Gordon Yorke wrote:
> Hello Martin,
> What you are describing appears to be a bug. TopLink does load those objects into the Persistence Context and should be retrieving them through the 'checkEarlyReturn' call. Once an object has been retrieved into a persistence context, it should not need to be loaded from the database again. You mention that Spring's transaction manager is calling UnitOfWork.beginEarlyTransaction directly; are you using JPA in Spring or the TopLink persistence mechanism? Please enter a bug for this in GlassFish. A simple test case would be appreciated.
> --Gordon
>
> -----Original Message-----
> From: Martin Bayly [mailto:mbayly_at_telus.net]
> Sent: Tuesday, January 16, 2007 4:00 PM
> To: persistence_at_glassfish.dev.java.net
> Subject: Re: What criteria does TopLink use to 'cache' entities to
> avoid re-executing SQL to find an object?
>
>
> Hi
>
> I'm still trying to understand why my queries perform so badly. I'm
> getting a bit further in understanding what's going on, but I'm still a
> little perplexed by the way TopLink is behaving, though that may just be
> my naivety.
>
> As I mentioned in earlier posts, I'm trying to use the Spring JPA/TopLink
> integration to run a JPA app in a Java SE environment. I'm finding that
> queries returning objects with ManyToOne relationships to other objects
> query for each instance of the related ManyToOne objects multiple times.
>
> For example, a MailMessage entity has a ManyToOne relationship with the
> User entity for the user who created the message.
>
> If I query for all mail messages sent by user 1, I see a query to load
> user 1 executed for every single mail message returned. So if user 1
> sent 100 mail messages, the query to load user 1's mail messages
> triggers 100 individual queries for the user 1 entity.
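As a rough illustration of the pattern above, here is a self-contained Java sketch that uses neither TopLink nor JPA. A hypothetical FakeUserTable stands in for the database and counts SELECTs, and a hypothetical resolveSenders resolves each message's ManyToOne sender either by querying every time or by first checking a per-context identity map keyed by primary key. All class and method names are invented for this example; it models the symptom, not TopLink's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the N+1 pattern; none of these types are TopLink/JPA APIs.
public class NPlusOneSketch {

    record User(long id, String name) {}
    record MailMessage(long id, long senderId) {}

    // Stand-in for the database: counts every "SELECT" against the user table.
    static class FakeUserTable {
        int selects = 0;

        User selectById(long id) {
            selects++;
            return new User(id, "user" + id);
        }
    }

    // Resolves the ManyToOne sender of each message. When useIdentityMap is true,
    // an identity map keyed by primary key is consulted before querying.
    static List<User> resolveSenders(List<MailMessage> messages, FakeUserTable table,
                                     boolean useIdentityMap) {
        Map<Long, User> identityMap = new HashMap<>();
        List<User> senders = new ArrayList<>();
        for (MailMessage m : messages) {
            User u = useIdentityMap ? identityMap.get(m.senderId()) : null;
            if (u == null) {
                u = table.selectById(m.senderId());
                if (useIdentityMap) {
                    identityMap.put(m.senderId(), u);
                }
            }
            senders.add(u);
        }
        return senders;
    }

    // Builds `count` messages, all sent by user 1.
    static List<MailMessage> messagesFromUser1(int count) {
        List<MailMessage> messages = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            messages.add(new MailMessage(i, 1L));
        }
        return messages;
    }

    public static void main(String[] args) {
        List<MailMessage> messages = messagesFromUser1(100);

        FakeUserTable noCache = new FakeUserTable();
        resolveSenders(messages, noCache, false);
        System.out.println("without identity map: " + noCache.selects + " selects");

        FakeUserTable withCache = new FakeUserTable();
        resolveSenders(messages, withCache, true);
        System.out.println("with identity map: " + withCache.selects + " selects");
    }
}
```

Run as written, it reports 100 selects without the identity map and 1 with it; with the map, every message also shares the same User instance.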
>
> That's bad enough, but add a few more relationships and the number of
> individual SELECT statements grows out of control: for example, a mail
> message with a OneToMany relationship to a MailRecipient entity, each mail
> recipient with a bidirectional ManyToOne relationship back to the mail
> message (which TopLink requires for a OneToMany relationship without a
> join table), and each mail recipient with a ManyToOne relationship to the
> user the mail was sent to.
>
> What seems to be contributing to this issue is that I am using Spring's
> transaction demarcation (e.g. @Transactional). Looking at the Spring
> code, Spring appears to be set up to start early transactions in TopLink
> (Spring's transaction manager calls UnitOfWork.beginEarlyTransaction).
>
> This seems to be what stops TopLink from caching any of the query
> results. If I change my queries to run in a read-only Spring transaction,
> Spring doesn't start an early transaction, TopLink caches much more
> information, and things perform much better.
>
> However, I'm still concerned. What would happen if we had a
> non-read-only transactional method that needed to query for lots of
> information up front? We would be back to the same issue. I guess what I
> don't really understand is why TopLink does not cache returned objects in
> some kind of 'Unit of Work' cache during a query, even if the query is
> executed in the context of a transactional method. I understand why you
> wouldn't want to write to the SessionCache in a transactional method,
> because you could be writing uncommitted data. However, why can't a
> retrieved object be loaded into a transactional unit of work cache to
> avoid repeatedly hitting the database for the same object, as in my
> example above?
>
> I did quite a bit of stepping through the TopLink code over the last
> couple of days, and it seems that in the scenario I'm describing TopLink
> does cache the dependent objects in a UnitOfWork cache, but when it's
> asked for them again later in the same UnitOfWork it skips checking the
> UnitOfWork cache.
>
> From ObjectLevelReadQuery.checkEarlyReturn:
>
>     // The cache check must happen on the UnitOfWork in these cases either
>     // to access transient state or for pessimistic locking, as only the
>     // UOW knows which objects it has locked.
>     if (shouldCheckCacheOnly() || shouldConformResultsInUnitOfWork()
>             || getDescriptor().shouldAlwaysConformResultsInUnitOfWork()
>             || (getLockMode() != ObjectBuildingQuery.NO_LOCK)) {
>         Object result = checkEarlyReturnImpl(unitOfWork, translationRow);
>         if (result != null) {
>             return result;
>         }
>     }
>
> It then goes ahead and re-executes the select to load the object, but
> when building the object later in
> ObjectBuilder.buildWorkingCopyCloneFromRow, it finds the object in the
> UnitOfWork identity map and throws away the retrieved database row anyway.
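The difference between the two lookup orders described above can be modelled in a few lines. This is a hypothetical sketch, not TopLink's ObjectLevelReadQuery or ObjectBuilder: UnitOfWorkModel keeps a transaction-private identity map, queryThenCheck issues the SELECT before consulting the map (the behaviour observed while stepping through the code), and checkThenQuery consults the map first. Because the map belongs to the unit of work, neither variant writes uncommitted data to a shared cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model contrasting two lookup orders for a unit-of-work
// identity map. Names are illustrative only; this is not TopLink code.
public class LookupOrderSketch {

    record Row(long id, String name) {}

    static final class User {
        final long id;
        final String name;

        User(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final class UnitOfWorkModel {
        // Transaction-private map: never shared with other units of work.
        final Map<Long, User> identityMap = new HashMap<>();
        int selects = 0;
        int discardedRows = 0;

        private Row selectRow(long id) {
            selects++;
            return new Row(id, "user" + id);
        }

        // Runs the SELECT first, then finds the registered object and throws the row away.
        User queryThenCheck(long id) {
            Row row = selectRow(id);
            User existing = identityMap.get(id);
            if (existing != null) {
                discardedRows++;
                return existing;
            }
            User built = new User(row.id(), row.name());
            identityMap.put(id, built);
            return built;
        }

        // Consults the identity map before issuing any SELECT.
        User checkThenQuery(long id) {
            User existing = identityMap.get(id);
            if (existing != null) {
                return existing;
            }
            Row row = selectRow(id);
            User built = new User(row.id(), row.name());
            identityMap.put(id, built);
            return built;
        }
    }

    public static void main(String[] args) {
        UnitOfWorkModel late = new UnitOfWorkModel();
        for (int i = 0; i < 100; i++) {
            late.queryThenCheck(1L);
        }
        System.out.println("query-then-check: " + late.selects + " selects, "
                + late.discardedRows + " rows discarded");

        UnitOfWorkModel early = new UnitOfWorkModel();
        for (int i = 0; i < 100; i++) {
            early.checkThenQuery(1L);
        }
        System.out.println("check-then-query: " + early.selects + " selects");
    }
}
```

For 100 lookups of user 1, query-then-check issues 100 selects and discards 99 rows, while check-then-query issues a single select. Both return the same User instance every time, which is why the problem shows up as extra SQL rather than as duplicate objects.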
>
> Quite possibly there is more to this than I am seeing. Or should I be
> using a slightly different configuration so that the tests in
> ObjectLevelReadQuery.checkEarlyReturn let it check the UnitOfWork for an
> existing object before executing the query? Although, looking at the
> implementation of that method, it still doesn't seem to check the
> UnitOfWork's cache.
>
> I'm really hoping someone can give me some input on this.
>
> Thanks
> Martin
>
>
> Martin Bayly wrote:
>
>> What criteria does TopLink use to determine that an existing cached
>> instance of a required object can be re-used rather than executing an
>> SQL query to load the object?
>>
>> I'm seeing output like this under one particular setup of my
>> application code:
>>
>> [TopLink Finest]: 2007.01.15 12:01:01.515--UnitOfWork(9613092)--Thread(Thread[main,5,main])--Register the existing object com.toplink.test.MailRecipient@29
>> [TopLink Finest]: 2007.01.15 12:01:01.515--UnitOfWork(9613092)--Thread(Thread[main,5,main])--Register the existing object com.toplink.test.MailMessage@91e90d7
>> [TopLink Finest]: 2007.01.15 12:01:01.515--UnitOfWork(9613092)--Thread(Thread[main,5,main])--Register the existing object com.toplink.test.User@6a68de7
>>
>> This is good because it means it's not continuously hitting the
>> database for data loaded by the same query or an earlier query. Does
>> TopLink have a cache per EntityManagerFactory? I think I remember
>> seeing a reference to this in an earlier post.
>>
>> Does this mean that if I execute code like the following:
>> - create EntityManager 1,
>> - load User A (here I see a query against the database),
>> - close EntityManager 1,
>> - open a new EntityManager 2 from the same factory,
>> - request User A again,
>>
>> ... I'll see one of the above messages and won't hit the database again?
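The two-EntityManager scenario in the list above can be sketched as a two-level lookup. This is an assumption-laden model rather than TopLink's implementation: a hypothetical FactoryModel holds one shared cache per factory, each hypothetical ManagerModel has its own persistence context, and find checks the context, then the shared cache, then the database. Real providers generally register a working copy in each persistence context instead of handing out the shared instance, which this sketch skips. Because the shared cache lives in memory, a new factory (for example after an application restart) starts empty.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-level cache model: one shared cache per factory plus a
// separate persistence context per entity manager. Illustrative names only.
public class SharedCacheSketch {

    record User(long id, String name) {}

    static final class FactoryModel {
        final Map<Long, User> sharedCache = new HashMap<>();
        int selects = 0;

        ManagerModel createManager() {
            return new ManagerModel(this);
        }

        User selectFromDatabase(long id) {
            selects++;
            return new User(id, "user" + id);
        }
    }

    static final class ManagerModel {
        private final FactoryModel factory;
        private final Map<Long, User> persistenceContext = new HashMap<>();

        ManagerModel(FactoryModel factory) {
            this.factory = factory;
        }

        User find(long id) {
            // 1. Already managed by this manager?
            User u = persistenceContext.get(id);
            if (u == null) {
                // 2. Otherwise try the factory-wide shared cache, then the database.
                u = factory.sharedCache.get(id);
                if (u == null) {
                    u = factory.selectFromDatabase(id);
                    factory.sharedCache.put(id, u);
                }
                persistenceContext.put(id, u);
            }
            return u;
        }
    }

    public static void main(String[] args) {
        FactoryModel factory = new FactoryModel();
        ManagerModel em1 = factory.createManager();
        em1.find(1L); // first load: goes to the database
        // em1 is dropped here ("closed"); its persistence context goes with it
        ManagerModel em2 = factory.createManager();
        em2.find(1L); // served from the factory's shared cache
        System.out.println("selects: " + factory.selects);
    }
}
```

In this model the second EntityManager is served from the shared cache, so only one select is counted.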
>>
>> However, in another setup of my application code we are trying to use
>> Spring JPA with TopLink Essentials. With Spring in the picture, I don't
>> seem to see messages like the above. Instead, we continuously hit the
>> database, even for a referenced object that is shared by many rows in
>> the result set of a single query. Actually, I do see the above messages
>> for data created and retrieved through the same EntityManagerFactory.
>> But as soon as I shut down the app, restart it and access the same data,
>> it never loads from the cache, no matter how many times we query for it.
>>
>> For example, take querying mail messages for user 1. Each mail message
>> references a User, which should be user 1 for every result of the
>> query. We see a query for user 1 executed against the database for
>> every mail message returned, and this is killing performance. This is a
>> general problem for all entities when using Spring JPA.
>>
>> Thanks for any insights you can give.
>> I'm not saying this is a TopLink bug, but it seems that the combination
>> of TopLink and Spring JPA in a Java SE environment does not play nicely.
>>
>> Using Spring 2.0 and TopLink v2 b26.
>> I've tried this with the latest build of TopLink (v2 b31) and get the
>> same problem.
>>
>> Cheers
>> Martin
>>
>>
>
>