[javaee-spec users] JPA 2.1: Enhance per-query and per-property control over fetch eagerness, fetch mode, fetch groups

From: Craig Ringer <craig_at_postnewspapers.com.au>
Date: Tue, 26 Jun 2012 10:45:04 +0800

To the JPA spec team and the broader EE working group:

I've been seeing increasing evidence on user-facing forums and mailing
lists that control over fetching via JPA is a real challenge for
developers. It's certainly been a huge one for me. I'm interested in
whether this can be improved for JPA 2.1 and Java EE 7, as in my view
the fetch issues are a big pain point.

I'm writing to raise this with the JPA 2.1 spec team, as I don't see any
enhancements regarding fetch strategies and modes in the latest draft
and didn't spot discussion of it on the list. I'd like to strike up a
discussion about what, if anything, can/should be done about this for
Java EE 7.

        What do apps need to do?

In JPA 2.0 there's solid control over lazy vs eager fetching on an
entity/property/relationship basis using the usual @...ToOne /
@...ToMany (fetch=FetchType.[LAZY|EAGER]) annotations and the orm.xml
equivalents. This works well, but is too simple for many projects.

From my admittedly rather limited experience, and from discussions I've
seen, it seems common to have widely referenced entities that you don't
want to eagerly load the relationships of most of the time, but *need*
them loaded in some situations. Commonly this is because you'll be using
them detached from an entity manager context and know you'll need access
to normally lazily loaded properties. Sometimes it's a performance issue
where you can't afford the expense of lots of little database hits as
proxied lazily loaded properties are loaded.
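
To make the "lots of little database hits" cost concrete, here is a minimal sketch in plain Java. It does not use JPA at all: QueryCounter, LazyRef and NPlusOneDemo are hypothetical names, and each "query" is only a counter increment standing in for a round trip to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Counts simulated database round trips; stands in for a real connection. */
final class QueryCounter {
    private int count;
    void hit() { count++; }
    int count() { return count; }
}

/** A toy lazy reference: the first get() costs one simulated query. */
final class LazyRef<T> {
    private final Supplier<T> loader;
    private final QueryCounter counter;
    private T value;
    private boolean loaded;

    LazyRef(Supplier<T> loader, QueryCounter counter) {
        this.loader = loader;
        this.counter = counter;
    }

    T get() {
        if (!loaded) {
            counter.hit();          // one extra round trip per proxy
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}

final class NPlusOneDemo {
    /** Loads n parents with one query, then touches each lazy property. */
    static int queriesFor(int n) {
        QueryCounter counter = new QueryCounter();
        counter.hit();              // the initial "select all parents" query
        List<LazyRef<String>> owners = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int id = i;
            owners.add(new LazyRef<String>(() -> "owner-" + id, counter));
        }
        for (LazyRef<String> ref : owners) {
            ref.get();              // first access triggers the lazy load
            ref.get();              // second access is served from memory
        }
        return counter.count();
    }

    public static void main(String[] args) {
        System.out.println(queriesFor(100)); // prints 101
    }
}
```

With 100 parents the sketch counts 101 simulated queries: one for the parents and one per lazily loaded property.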

        What's currently possible?

Right now, my understanding - and I don't claim it's a great one - is
that there's exactly one option for overriding normally lazy fetching
with standard JPA: use a left join fetch, either in JPQL or via the
Criteria API. That's OK much of the time.
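
For reference, a left join fetch in JPQL looks roughly like the query text below. This is only a sketch: Invoice, lines and customer are hypothetical entity and attribute names, and the class merely holds the query string so that it compiles without a JPA provider on the classpath.

```java
/** JPQL text only; Invoice, lines and customer are hypothetical names. */
final class FetchQueries {
    private FetchQueries() {}

    // Overrides a LAZY mapping of Invoice.lines for this one query.
    // DISTINCT removes the duplicate Invoice results the join would
    // otherwise return, one per fetched line.
    static final String INVOICES_WITH_LINES =
        "SELECT DISTINCT i FROM Invoice i "
      + "LEFT JOIN FETCH i.lines "
      + "WHERE i.customer.id = :customerId";

    public static void main(String[] args) {
        // With a JPA provider available, this string would be passed to
        // EntityManager.createQuery(INVOICES_WITH_LINES, Invoice.class).
        System.out.println(INVOICES_WITH_LINES);
    }
}
```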

        What's wrong with the current situation?

Being limited to a "left join fetch" can also be really problematic:

  * There's no way to ask the provider to use a different fetching
    strategy, such as a follow-up batched SELECT or subselect fetching.

  * A left join fetch is fine when you're eagerly fetching one or two
    lazily fetched entity relationships. It scales extremely poorly if
    you have several things to fetch and/or more than one level, e.g. "a.b.c".

  * Apps sometimes need to do extra JPQL / criteria queries and repeat
    work in order to load required entities into the persistence context
    without expensive multiple joins.
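
The scaling problem in the list above can be shown with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes every parent has the same number of children in each sibling collection, and JoinFetchCost and its methods are hypothetical names.

```java
final class JoinFetchCost {
    /** Rows returned when every collection is join-fetched in one SQL query. */
    static long singleQueryRows(int parents, int... childrenPerCollection) {
        long rowsPerParent = 1;
        for (int n : childrenPerCollection) {
            // a left join still returns one row for a parent with no children
            rowsPerParent *= Math.max(n, 1);
        }
        return parents * rowsPerParent;
    }

    /** Rows returned when each collection is loaded by its own follow-up query. */
    static long separateQueryRows(int parents, int... childrenPerCollection) {
        long rows = parents;
        for (int n : childrenPerCollection) {
            rows += (long) parents * n;
        }
        return rows;
    }

    public static void main(String[] args) {
        // 10 parents, three sibling collections of 20, 30 and 40 children each
        System.out.println(singleQueryRows(10, 20, 30, 40));   // prints 240000
        System.out.println(separateQueryRows(10, 20, 30, 40)); // prints 910
    }
}
```

Under these assumptions the single joined query returns the cross product of the sibling collections, while one query per collection returns only the sum.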

The key problem in my view is that the JPA API doesn't give the user any
way to ask for normally-lazy relationships to be eagerly fetched without
also forcing them to be fetched /in a single SQL query/. That can be
really sub-optimal, and it conflates joins (a matter of query logic)
with fetching (a matter of what's retrieved). You can't say "fetch x.y
in whatever way is optimal".

I've seen numerous recommendations, especially on the Vaadin lists and
around Swing apps, to use EclipseLink and allow it to lazily load
properties of detached entities using proxies. This is a /nasty/ thing
for people to be relying on, as (a) each load is a query, so it's the
ultimate in n+1 or worse with nested properties; and (b) those later
loads are generally in new transactions, breaking the DB's consistency
guarantees in ways optimistic locking often can't help with. That people
are having to rely on this is IMO of concern.

It doesn't help that the Root<T>.fetch(...) API is difficult to use
correctly and has been acknowledged to be poorly specified. It's easy to
land up doing a second unnecessary join, or to get a "query specified
join fetching, but the owner of the fetched association was not present
in the select list" error. This article used to talk about it:

http://blogs.sun.com/ldemichiel/entry/jpa_next_thinking_about_the#comment-1291653518000

but has since been devoured by the Oracle transition.

        What can be done via implementation-specific extensions?

Some JPA implementations offer fetch controls via extensions, but
there's nothing consistently available.

EclipseLink gives quite good fetching control via JPA query hints,
allowing default fetch modes to be overridden on a per-property basis
and allowing the specification of alternative fetch strategies. It also
supports lazy loading of properties in detached entities, which has
several problems as mentioned above.
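
As an illustration of the hint-based approach, the sketch below builds a map of hint names and values that would each be applied with the standard Query.setHint(name, value). The hint names "eclipselink.batch" and "eclipselink.batch.type" and the value "IN" are given as understood from EclipseLink's query hint documentation and should be treated as assumptions to check against the EclipseLink version in use. EclipseLinkFetchHints and the path "i.lines" are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EclipseLinkFetchHints {
    private EclipseLinkFetchHints() {}

    /**
     * Hints asking for the relationship at aliasPath (for example "i.lines")
     * to be loaded by a batched follow-up query instead of a join.
     * Hint names and values are assumptions; verify them before use.
     */
    static Map<String, Object> batchFetch(String aliasPath) {
        Map<String, Object> hints = new LinkedHashMap<>();
        hints.put("eclipselink.batch", aliasPath);
        hints.put("eclipselink.batch.type", "IN");
        return hints;
    }

    public static void main(String[] args) {
        // With a provider available, each entry would be applied as
        // query.setHint(entry.getKey(), entry.getValue()).
        for (Map.Entry<String, Object> e : batchFetch("i.lines").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```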

Hibernate, as far as I've been able to determine, doesn't expose
anything equivalent at the JPA level. It has setFetchMode(...) in its
own Criteria API, but as far as I've been able to find out it doesn't
expose that to JPA via hints or other mechanisms. I'm frequently told
that Hibernate is best suited for short-transaction stateless
applications because it doesn't lazy load on detached entities -
presumably because it's too hard to specify what you want eagerly loaded.

I'm not sufficiently familiar with other implementations to say what
they offer.

        What's needed?

In my view, the key thing JPA needs to do is provide fetch mode
and strategy controls at a per-query, per-relationship level without
requiring a left join fetch. I'd be interested in what your thoughts are.

          Per-query, per-property overrides for eager vs lazy fetching

Clients need to be able to specify to the ORM that a given property
should be eagerly or lazily fetched in a particular query. An API that
avoids the need for providers to have to parse free-form properties (and
is thus more checkable) would be good, so adding something like:

   CriteriaQuery.setFetchMode(String propertyName, FetchType fetchType)

would seem ideal to me, where "propertyName" can be a dot-path to
sub-properties, or of course a metamodel object/path.
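
To show how such per-query overrides might behave, here is a minimal sketch in plain Java. Nothing in it exists in JPA: FetchOverrides and its methods are hypothetical, the local FetchType enum is only a stand-in for javax.persistence.FetchType so the sketch compiles alone, and the rule that an EAGER override on "a.b.c" also makes "a" and "a.b" eager is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Stand-in for javax.persistence.FetchType so the sketch compiles alone. */
enum FetchType { LAZY, EAGER }

/** Hypothetical per-query fetch overrides keyed by dot-path. */
final class FetchOverrides {
    private final Map<String, FetchType> overrides = new HashMap<>();

    FetchOverrides setFetchMode(String propertyName, FetchType fetchType) {
        if (propertyName == null || propertyName.isEmpty()) {
            throw new IllegalArgumentException("propertyName must not be empty");
        }
        overrides.put(propertyName, Objects.requireNonNull(fetchType));
        return this;
    }

    /**
     * The override for this path if one was set. Otherwise EAGER when a
     * longer path beneath it is EAGER (assumption: "a.b.c" cannot be loaded
     * without "a.b"), and the mapping's default in all other cases.
     */
    FetchType effective(String path, FetchType mappedDefault) {
        FetchType direct = overrides.get(path);
        if (direct != null) {
            return direct;
        }
        String prefix = path + ".";
        for (Map.Entry<String, FetchType> e : overrides.entrySet()) {
            if (e.getValue() == FetchType.EAGER && e.getKey().startsWith(prefix)) {
                return FetchType.EAGER;
            }
        }
        return mappedDefault;
    }
}
```

A query that set "customer.address.country" to EAGER would then report EAGER for "customer.address" as well, while an unrelated path such as "lines" keeps its mapped default.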

Different fetch strategies are supported by different implementations,
and I don't think the JPA spec can really specify a complete set of
possible strategies, so the fetch mode type should probably be a simple
EAGER | LAZY enum, handily already provided by
javax.persistence.FetchType. The implementer should be free to choose
the most appropriate fetch method, so long as properties marked EAGER
are in fact attached to the persistence context when the query completes.

          Per-query, per-property control over fetch strategies

IMO if explicit specification of fetch strategy is provided through the
JPA API (which would be nice) it should be by string names for
strategies, or at least allow them. There's no predicting what fetch
strategies will be possible. For example, with PostgreSQL's new JSON
data type support it's possible to do an eager fetch of a relationship
using a join or subquery with query_to_json, using array_agg and
array_to_json, or using record_to_json. The ORM no longer needs to
de-duplicate a cross product. Standardizing this would be nuts, but a
way to ask an ORM that's aware of it to use it makes sense. I'd like to
see something like:

     CriteriaQuery.setFetchStrategy(String propertyName, String strategy)

... and maybe ...

     CriteriaQuery.setFetchStrategy(String propertyName, FetchStrategy strategy)

... with FetchStrategy being an enum { JOIN, SELECT, SUBSELECT, ANY },
as those are the widely recognised strategies plus one that lets the
implementation choose (default for FetchType.EAGER).
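
The two overloads could coexist along the lines of the sketch below. It is plain Java and entirely hypothetical: FetchStrategyOverrides, requested() and portable() are invented for illustration, "pg_json_agg" in the usage note is a made-up provider-specific name, and falling back to ANY for unrecognised names is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** The portable strategies named above; ANY lets the provider choose. */
enum FetchStrategy { JOIN, SELECT, SUBSELECT, ANY }

/** Hypothetical per-query strategy overrides keyed by dot-path. */
final class FetchStrategyOverrides {
    private final Map<String, String> byPath = new HashMap<>();

    /** Typed overload for the portable strategies. */
    FetchStrategyOverrides setFetchStrategy(String propertyName, FetchStrategy strategy) {
        return setFetchStrategy(propertyName, strategy.name());
    }

    /** String overload so provider-specific names can pass through. */
    FetchStrategyOverrides setFetchStrategy(String propertyName, String strategy) {
        byPath.put(propertyName, strategy.trim().toUpperCase(Locale.ROOT));
        return this;
    }

    /** Normalised name requested for a path; "ANY" when nothing was requested. */
    String requested(String propertyName) {
        return byPath.getOrDefault(propertyName, FetchStrategy.ANY.name());
    }

    /** Portable view: names outside the enum degrade to ANY. */
    FetchStrategy portable(String propertyName) {
        String name = requested(propertyName);
        for (FetchStrategy s : FetchStrategy.values()) {
            if (s.name().equals(name)) {
                return s;
            }
        }
        return FetchStrategy.ANY;
    }
}
```

A provider that understands a name such as "pg_json_agg" could act on requested(), while one that does not would see ANY from portable() and remain free to choose.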

          Fetch groups?

It may also be worth thinking about another often-sought-after facility,
fetch groups, but IMO control over fetch mode and strategy on a
per-query, per-relationship level is much more important.

BTW, I wrote a bit about this earlier here:
http://blog.ringerc.id.au/2012/06/jpa2-is-very-inflexible-with-eagerlazy.html

--
Craig Ringer
POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088     Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/