users@jpa-spec.java.net

[jpa-spec users] Re: JPA_SPEC-99: Add ability to stream the result of a query execution

From: Oliver Gierke <ogierke_at_pivotal.io>
Date: Fri, 5 May 2017 08:58:55 +0200

Hi all,

I like Lukas suggestion as it falls back to standard list based access but allows providers to gradually add optimized, really streaming implementations (Hibernate already having an implementation in place). So I think it's a nice enablement feature, that prevents users from having to wait for quite some time to actually get that. Plus, it's very much in line with the overall Java 8 theme of the umbrella.

In Spring Data we've already had support for Streams for quite a while. Starting with our own clunky version of a CloseableIterator over provider specific APIs, we're now looking into gradually removing that as we now upgrade our Hibernate baseline to a version that already includes the calls on native APIs. Getting such API on JPA types would allow us to remove *a lot* of custom code.

Two concerns I'd like to raise though, effectively feedback we got from exposing those APIs:

- When people call a method returning a Stream they seem to expect a lot of magic happening, i.e. a real stream of result elements. If the default implementation then just reads everything at once and basically calls List.stream(), some users might feel tricked. What I am trying to say is: the API promises a Stream. In the context of databases users then come with certain expectations and might be disappointed to hear they're not met by default. However, I totally see the appeal of being able to write code that uses streams and then gradually get better performance when a provider adds that feature.

- It took us quite a bit of time and resources to educate users about the specialties that particular Stream instance needs to be handled with: required usage in try-with-resources, the need to keep the underlying EntityManager and DataSource connection open *while the stream is consumed*. Especially the latter is not a too obvious problem superficially and we ended up even adding some strict guards [0] into our execution code that interacts with Spring's TX infrastructure. I guess something like that is not an option in JavaEE.

- As to stream() VS. getResultStream() — I am torn on this. I definitely see Steve's point here but one could also argue that JPA developers familiar with getResultList() would at least consider getResultStream() consistent within that API.

Cheers,
Ollie

[0] https://jira.spring.io/browse/DATAJPA-1023

> Am 05.05.2017 um 07:57 schrieb Guillermo González de Agüero <z06.guillermo_at_gmail.com>:
>
> Hi,
>
> Linda: in Hibernate it acts as a ScrollableResult that fetches data "on-demand". I imagine, in practice, every implementation will use something like that to provide performance benefits. This is a very valuable addition.
>
> Lukas: I think that JavaDoc could add an example of the possible things a vendor may do, like "persistence provider may choose to override this method to provide additional capabilities, *like cursor-like fetching*". That would make the purpose clearer.
>
>
> Regards,
>
> Guillermo González de Agüero.
>
> On Thu, May 4, 2017 at 11:35 PM, Linda DeMichiel <linda.demichiel_at_oracle.com> wrote:
> Hi,
>
> If the value is mostly in the provider-specific add-ons, then I
> think it would be better to wait and standardize in a later
> release.
>
> What did you have in mind here?
>
> -Linda
>
>
> On 5/4/17, 1:01 PM, Lukas Jungmann wrote:
> Hi,
>
> another change I'd like to propose is an addition of
> getResultStream:Stream method to javax.persistence.Query/TypedQuery
> interfaces with following javadoc and provided with default implementation:
>
> javax.persistence.Query:
>
> / /**//
> // * Execute a SELECT query and return the query results//
> // * as an untyped <code>java.util.Stream</code>.//
> // * By default this method delegates to
> <code>getResultList().stream()</code>,//
> // * however persistence provider may choose to override this method//
> // * to provide additional capabilities.//
> // *//
> // * @return a stream of the results//
> // * @throws IllegalStateException if called for a Java//
> // * Persistence query language UPDATE or DELETE statement//
> // * @throws QueryTimeoutException if the query execution exceeds//
> // * the query timeout value set and only the statement is//
> // * rolled back//
> // * @throws TransactionRequiredException if a lock mode other than//
> // * <code>NONE</code> has been set and there is no
> transaction//
> // * or the persistence context has not been joined to the
> transaction//
> // * @throws PessimisticLockException if pessimistic locking//
> // * fails and the transaction is rolled back//
> // * @throws LockTimeoutException if pessimistic locking//
> // * fails and only the statement is rolled back//
> // * @throws PersistenceException if the query execution exceeds //
> // * the query timeout value set and the transaction //
> // * is rolled back //
> // * @see Stream//
> // * @see #getResultList()//
> // * @since 2.2//
> // *///
> // default Stream getResultStream() {//
> // return getResultList().stream();//
> // }/
>
> similarly to javax.persistence.TypedQuery:
>
> / /**//
> // * Execute a SELECT query and return the query results//
> // * as a typed <code>java.util.Stream</code>.//
> // * By default this method delegates to
> <code>getResultList().stream()</code>,//
> // * however persistence provider may choose to override this method//
> // * to provide additional capabilities.//
> // *//
> // * @return a stream of the results//
> // * @throws IllegalStateException if called for a Java//
> // * Persistence query language UPDATE or DELETE statement//
> // * @throws QueryTimeoutException if the query execution exceeds//
> // * the query timeout value set and only the statement is//
> // * rolled back//
> // * @throws TransactionRequiredException if a lock mode other than//
> // * <code>NONE</code> has been set and there is no
> transaction//
> // * or the persistence context has not been joined to the
> transaction//
> // * @throws PessimisticLockException if pessimistic locking//
> // * fails and the transaction is rolled back//
> // * @throws LockTimeoutException if pessimistic locking//
> // * fails and only the statement is rolled back//
> // * @throws PersistenceException if the query execution exceeds //
> // * the query timeout value set and the transaction //
> // * is rolled back //
> // * @see Stream//
> // * @see #getResultList()//
> // * @since 2.2//
> // *///
> // default Stream<X> getResultStream() {//
> // return getResultList().stream();//
> // }/
>
> Do you think this is OK or does it give providers to big freedom?
>
> Thank you,
> --lukas
>
>