persistence@glassfish.java.net

RE: What criteria does TopLink use to 'cache' entities to avoid re-executing SQL to find an object?

From: Gordon Yorke <gordon.yorke_at_oracle.com>
Date: Tue, 16 Jan 2007 16:17:01 -0500

Hello Martin,
    What you are describing appears to be a bug. TopLink does load those objects into the Persistence Context and should be retrieving them through the 'checkEarlyReturn' call; once an object has been retrieved into a persistence context, it should not need to be loaded from the database again. You mention that Spring's transaction manager is calling UnitOfWork.beginEarlyTransaction directly: are you using JPA in Spring, or the TopLink persistence mechanism? Please enter a bug for this in GlassFish. If you can provide a simple test case, it would be appreciated.
--Gordon

-----Original Message-----
From: Martin Bayly [mailto:mbayly_at_telus.net]
Sent: Tuesday, January 16, 2007 4:00 PM
To: persistence_at_glassfish.dev.java.net
Subject: Re: What criteria does TopLink use to 'cache' entities to
avoid re-executing SQL to find an object?


Hi

I'm still trying to understand why my queries are performing so badly.
I'm getting a bit further in understanding what's going on, but I'm
still a little perplexed by the way TopLink is behaving. That may just
be my naivety, though.

As I mentioned in earlier posts, I'm trying to use the Spring JPA
TopLink integration to run a JPA app in a Java SE environment. I'm
finding that queries returning objects with ManyToOne relationships to
other objects re-query each instance of the related objects multiple
times.

For example, a MailMessage entity has a ManyToOne relationship with the
User entity for the user who created the message.

If I query for all mail messages sent by user 1, I see a query to load
user 1 executed for every single mail message returned. So if user 1
sent 100 mail messages, loading user 1's mail messages executes 100
individual queries for the user 1 entity.
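To make the shape of the model concrete, the mappings involved look
roughly like this (a condensed, illustrative sketch; the class and
field names are simplified, not taken from my actual code):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

// Condensed sketch of the mappings described above; class and field
// names are illustrative, not from the real application.
@Entity
class User {
    @Id
    Long id;
    String name;
}

@Entity
class MailMessage {
    @Id
    Long id;

    // The ManyToOne back to the creating user: with the behaviour
    // described above, this relationship is re-queried once per
    // MailMessage row returned.
    @ManyToOne
    User creator;
}
```

With this shape, fetching user 1's 100 messages issues one query for
the messages plus, in the problem scenario, 100 more selects for the
same User row.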

Not good. It gets worse with a few more relationships: a mail message
has a OneToMany relationship to a MailRecipient entity; each mail
recipient has a bi-directional ManyToOne relationship back to the mail
message (which TopLink requires for a OneToMany relationship without a
join table); each mail recipient has a ManyToOne relationship with the
user the mail was sent to; and so on. The number of individual select
statements being executed grows out of control.

What seems to be contributing to this issue is that I am using Spring's
transaction demarcation, e.g. @Transactional. Looking at the Spring
code, Spring is set up to start early transactions in TopLink (Spring's
transaction manager calls UnitOfWork.beginEarlyTransaction).

This is what seems to be stopping TopLink from caching any of the query
results. If I change my queries to use a read-only Spring transaction,
Spring doesn't start an early transaction, TopLink caches much more
information, and things perform much better.
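For anyone hitting the same thing, the workaround looks roughly like
this (an illustrative service method; the class, method, and query
names are mine, not from the real app). Marking the method read-only
means Spring's transaction manager does not start an early TopLink
transaction, so the shared session cache can be consulted and the
repeated selects for the same User row disappear:

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.transaction.annotation.Transactional;

// Illustrative sketch: a read-only transactional query method.
public class MailQueryService {

    @PersistenceContext
    private EntityManager em;

    // readOnly = true keeps Spring from calling
    // UnitOfWork.beginEarlyTransaction, which is what was defeating
    // TopLink's caching in the non-read-only case.
    @Transactional(readOnly = true)
    public List findMessagesSentBy(Object sender) {
        return em.createQuery(
                "SELECT m FROM MailMessage m WHERE m.creator = :sender")
            .setParameter("sender", sender)
            .getResultList();
    }
}
```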

However, I'm still concerned. What would happen if we had a
non-read-only transactional method that needed to query for lots of
information up front? We would be back to the same issue. I guess what
I don't really understand is why TopLink does not cache returned
objects in some kind of 'Unit of Work' cache during a query, even when
the query is executed in the context of a transactional method. I
understand why you wouldn't want to write to the session cache in a
transactional method, because you could be writing uncommitted data.
But why can a retrieved object not be loaded into a transactional
unit-of-work cache, to avoid constantly hitting the database for the
same object as in my example above?

I did quite a bit of stepping through the TopLink code over the last
couple of days, and it seems that in the scenario I'm describing
TopLink does cache the dependent objects in a UnitOfWork cache, but
when it's asked for them again later in the same UnitOfWork it skips
checking that cache.

 From ObjectLevelReadQuery.checkEarlyReturn:

        // The cache check must happen on the UnitOfWork in these cases either
        // to access transient state or for pessimistic locking, as only the
        // UOW knows which objects it has locked.
        if (shouldCheckCacheOnly() || shouldConformResultsInUnitOfWork()
                || getDescriptor().shouldAlwaysConformResultsInUnitOfWork()
                || (getLockMode() != ObjectBuildingQuery.NO_LOCK)) {
            Object result = checkEarlyReturnImpl(unitOfWork, translationRow);
            if (result != null) {
                return result;
            }
        }

It then goes ahead and re-executes the select to load the object; but
when building the object later, in
ObjectBuilder.buildWorkingCopyCloneFromRow, it finds the object in the
UnitOfWork identity map and throws away the retrieved database row
anyway.

Quite possibly there is a lot more to this than I am seeing. Or maybe I
should be using a slightly different configuration so that the tests in
ObjectLevelReadQuery.checkEarlyReturn allow it to check the UnitOfWork
for an existing object before executing the query? Although, looking at
the implementation of that method, it still doesn't seem to check the
UnitOfWork's cache.

I'm really hoping someone can provide some input on this.

Thanks
Martin


Martin Bayly wrote:
> What criteria does TopLink use to determine that an existing cached
> instance of a required object can be re-used rather than executing an
> SQL query to load the object?
>
> I'm seeing output like this under one particular setup of my
> application code:
>
> [TopLink Finest]: 2007.01.15
> 12:01:01.515--UnitOfWork(9613092)--Thread(Thread[main,5,main])--Register
> the existing object com.toplink.test.MailRecipient_at_29
> [TopLink Finest]: 2007.01.15
> 12:01:01.515--UnitOfWork(9613092)--Thread(Thread[main,5,main])--Register
> the existing object com.toplink.test.MailMessage_at_91e90d7
> [TopLink Finest]: 2007.01.15
> 12:01:01.515--UnitOfWork(9613092)--Thread(Thread[main,5,main])--Register
> the existing object com.toplink.test.User_at_6a68de7
>
> This is good because it means it's not continuously hitting the
> database for data loaded by the same query or an earlier query. Does
> TopLink have a cache per EntityManagerFactory - I think I remember
> seeing a reference to this in an earlier post.
>
> This means that if I execute code like the following:
> - create an EntityManager 1
> - load an object User A (here I see a query against the database)
> - close EntityManager 1
> - open a new EntityManager 2 from the same factory
> - request the object User A again
>
> ... I see one of the above messages and we don't hit the database again?
>
> However, in another setup of my application code, we are trying to use
> Spring JPA with TopLink Essentials. With Spring in the picture, it
> seems like I don't see the above type of message. Instead, we are
> continuously hitting the database even for a referenced object that is
> utilized by many rows in the result set of a single query. Actually,
> I see the above messages for data created and retrieved by the same
> EntityManagerFactory. But as soon as I close down the app, restart and
> hit the same data, it never loads from cache, no matter how many times
> we query for the data.
>
> e.g. querying mail messages for user 1. Each mail message has an
> embedded member User which should have value user1 for the results of
> the query. We see a query for user 1 being executed against the
> database for every mail message returned and this is killing
> performance. This is a general problem for all entities when using
> Spring JPA.
>
> Thanks for any insights you can give.
> I'm not saying this is a TopLink bug but it seems that the combination
> of TopLink and Spring JPA in a Java SE environment does not play nicely.
>
> Using Spring 2.0 and TopLink v2 b26.
> I've tried this with the latest build of TopLink (v2 b31) and get the
> same problem.
>
> Cheers
> Martin
>