Refreshing all instances of an Entity in Toplink Essentials

From: <glassfish_at_javadesktop.org>
Date: Tue, 03 Jun 2008 16:25:44 PDT

So, we've contrived a technique that can be used to force the JPA to reload all instances of a specific class.

I'd was hoping a TopLink expert would comment on the technique, so that perhaps it could be refined.

The premise started with the fact that you can execute EQL based update statements.

My thinking was that the JPA query builder had to be mildly clever, but not spectacularly so.

The challenge for the JPA was keeping the cached instances in sync with the DB after executing the update.

The naive implementation would simply invalidate the cache for the class affected by the Update statement. By invalidating the cache, it forces the JPA to reload the instances from the database. And the reason for doing this is that the JPA "doesn't know" what changes were made to the rows in the DB, just that there WERE changes.

However, with a little work, the JPA could be "smart enough" to not invalidate EVERY instance, but only those that were affected by the update statement, as given by the where clause on the EQL query.

For example, "update User set firstName='Fred' where userid = 1". The JPA could readily figure out the a) userid is the primary key, and b) we're changing the instance with a user id of '1'. It could use that information to invalidate that specific instance in its cache, rather than wholesale invalidation of everything in the cache for the entity User.

So, the trick was to find a) a query exotic enough for the JPA to "give up" and simply invalidate all cache instances, yet, b) have effectively no impact on the database.

What I came up with is:

update Entity set primaryKey = primaryKey where primaryKey is null

This seems to work for as, as a generic pattern.

On the database side, "where primaryKey is null" is ALMOST a free operation. Simply by the fact that most any table has their primary key flagged as not null (so there's not actual rows with a null primary key) and most any table with a primary key has the primary key indexed, so at most the DB should do a single index hit that returns 0 rows.

But, ideally, the JPA "doesn't know that". It somehow didn't make the leap that we're asking for null values on a not null column, it didn't conflate the "is null" operation similar to the "=" operation (even though it's pretty much the same thing, save that it operates on null).

At a minimum, with Toplink Essentials on GFv2, this had the effect of invalidating all of the cached instances.

There is no formal way to do this, hopefully the full boat Toplink for JPA 2 will give us this capability. I'm also curious if this works for other JPA providers.

And I'd enjoy hearing from one of the Toplink folks about how this actually works, and whether what I've done is a "bad idea" or a reasonable workaround to the problem.

The drive behind needing this, for us, was we had a Many to Many relationship, and were deleting one side of it. With cascades on the Database, the DB was deleting the rows from the join table. But within the JPA, the other side of the Many to Many "didn't know about the deletion", so it maintained the relationship for cached entities.

What we SHOULD have done is pull in the entities on the other side, and remove the other side of the relationship.

But with a large relationship, that's a really costly operation. Instead, we used this technique to invalidate the cache for any instances of the other side of the Many to Many, and then the relationship refreshed properly from the database, without us having to load up every row from the DB to delete the reference.

Anyway, FYI, thoughts appreciated.
[Message sent by forum member 'whartung' (whartung)]

http://forums.java.net/jive/thread.jspa?messageID=278143