jsr338-experts@jpa-spec.java.net

[jsr338-experts] outer joins with ON conditions

From: Linda DeMichiel <linda.demichiel_at_oracle.com>
Date: Fri, 11 Mar 2011 13:30:27 -0800

Outer joins with ON conditions have been requested a number of times.

The issue is that the current JPQL and criteria API syntax do not
support capturing the semantics that SQL provides.

Chapter 4 of the JPA spec states that outer joins "enable the
retrieval of a set of entities where matching values in the join
condition may be absent."

The issue appears with the semantic difference between restrictions
applied in the WHERE clause and in the SQL outer join ON clause.

The SQL semantics for the outer join

SELECT A.*, B.*
FROM A LEFT JOIN B
ON C1 AND C2 ... Cn

are the following:

   SELECT A.*, B.*
   FROM A, B
   WHERE C1 AND C2 ... AND Cn

   UNION ALL

   SELECT A.*, NULL, NULL, ..., NULL
   FROM A
   WHERE A.* NOT IN (SELECT A.*
                     FROM A, B
                     WHERE C1 AND C2 ... AND Cn )


This means that if an A tuple ai does not join with any B tuple according to
the condition C1 AND C2 ... AND Cn, then the tuple [ai.*, NULL, ....,
NULL] *will* appear in the query result (even if some predicate condition Cj
applies only to the A columns of A and is false for ai).


To illustrate more concretely, the following example was posted
in a request on my blog (thanks Bernard, if you are reading!)


       Let's say you want a list of car dealers and how many new
        vehicles they're selling.

       In SQL you can say:

           SELECT d.name, count( v.id )
           FROM dealer d LEFT OUTER JOIN vehicle v
           ON v.dealer_id = d.dealer_id AND v.type = 'New'
           GROUP BY d.name

      In JPQL, you only have the WHERE clause which limits your results,
      making the outer join useless:

           SELECT d.name, count( v.id )
           FROM dealer d LEFT OUTER JOIN d.vehicleList v
           WHERE v.type = 'New'
           GROUP BY d.name


Were we to include support for the "ON" syntax and semantics,
the JPQL equivalents would be the following:

SELECT d.name, count(v.id)
FROM dealer d LEFT OUTER JOIN d.vehicle v
ON v.type = 'New'
GROUP BY d.name

and

SELECT d.name, count(v.id)
FROM dealer d LEFT OUTER JOIN d.vehicle v
WHERE v.type = 'New'
GROUP BY d.name"

In the first of these queries, all dealers are included in the result,
including dealers with no vehicles that qualify.

In the second, we're only looking at dealers that have at least one
new vehicle.


I propose that we should add support for the JPQL syntax

LEFT [OUTER] JOIN join_association_path_expression [AS] identification_variable
    [ON conditional_expression]

with the semantics that:

SELECT a, b.x1, b.x2, ... bxn
FROM A a LEFT JOIN a.b b
   ON c1...cn

is equivalent to the semantics of

SELECT a, b.x1, b.x2, ... bxn
A a JOIN a.b b
   WHERE c1...cn

unioned with

SELECT a, null1, null2, ... nulln
FROM A a
WHERE a NOT IN (SELECT a' FROM A a' JOIN B b'
                 WHERE c1...cn)


For the criteria API equivalent, we could add methods

      on(Expression<Boolean> restriction);
and

      on(Predicate... restriction);

to the Join interface.

I'm not however sure what it is best that these methods return.

To be complete, we'd also need the method

      Predicate getOn();

on the Join interface as well.


Finally, we should also discuss whether we should support the use of
ON with inner joins. SQL supports this, with the semantics that it is
equivalent to the use of WHERE.


Opinions??

thanks,

-Linda