[jsr338-experts] Re: ChangeSet proposal [was Re: Re: Standardized Access to ChangeSet]

From: Emmanuel Bernard <emmanuel.bernard_at_jboss.com>
Date: Fri, 29 Jul 2011 14:31:01 +0200

Hi all,
Some comments inline.

Emmanuel

> Proposed addition in Section 3.2.8
>
> Access to Managed Entities
> --------------------------
> An entity managed in a persistence context transitions through different life cycle states.
> For example, when the entity is loaded from the database, it begins in a clean state; if the
> application then updates any of its persistent properties, the entity transitions to a dirty state.
> An application may need to access the managed entities in a persistence context by their
> life cycle states. The following section describes these life cycle states and the API methods
> to access managed entities in a particular state.
>
> Life Cycle States of a Managed Entity
> --------------------------------------
>
> The life cycle states of an entity are enumerated in javax.persistence.LifeCycleState.
>
> package javax.persistence;
> public enum LifeCycleState {
> NEW, // newly persisted in the current transaction

Do we need to differentiate this state from DIRTY? In which situations?
Regardless, like others, I think a better name would help reduce confusion.

> CLEAN, // present in the current transaction but not modified
> DIRTY, // present in the current transaction and modified
> HOLLOW, // referred in the current transaction (see getReference()),
> // but its persistent state may or may not be populated

I'd leave this as CLEAN; if one needs to check an entity's load state, we already have APIs.

> REMOVED // marked for deletion
> }
>
> Definition of each life cycle state
> -----------------------------------
> NEW represents the state of an entity that has been instantiated via the Java new operator
> and added to the persistence context, either directly via the persist() method or indirectly, by being
> reachable via a cascaded relation from another instance that has been merged via the merge() method.

An instance reached indirectly by a persist() call (through cascading) would qualify too.

>
> CLEAN represents the state of an entity that has been retrieved from the database via a query or find()
> and has not been modified.
>
> DIRTY represents the state of an entity that has been retrieved from the database and subsequently
> modified. Modification means that either the value of any of its non-relational properties has changed
> or any of its relations has been changed to point to a new reference.

I'd rather say that the object is managed and its state is not synchronized with the database state.

> Modification does not include a change in the content of
> a many-valued relation. For example, if a parent object P has a many-valued relation to a collection
> C of child objects, then adding a new child object to the collection C does not dirty the parent P,
> but replacing the collection C itself with another collection does.

Can't we use the same rules as for 'mappedBy', i.e. a state change made on the non-owning side is not seen by the entity? Only the cascading rules apply.
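
A minimal sketch of the dirty rule as proposed, assuming a snapshot-based dirty check (Parent and ParentSnapshot are hypothetical names; providers may track changes differently, e.g. through bytecode enhancement). A basic property is compared by value, while the many-valued relation is compared only by reference, so adding a child leaves the parent clean but replacing the collection dirties it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical entity: one basic property and one many-valued relation.
class Parent {
    String name;
    List<String> children = new ArrayList<>();

    Parent(String name) { this.name = name; }
}

// Hypothetical snapshot taken when the entity is loaded (the CLEAN state).
class ParentSnapshot {
    private final String name;
    private final List<String> childrenRef; // identity of the collection, not its content

    ParentSnapshot(Parent p) {
        this.name = p.name;
        this.childrenRef = p.children;
    }

    // DIRTY under the proposed rule: a basic property changed, or the collection
    // reference was replaced. Adding or removing elements in the same collection
    // instance is deliberately ignored.
    boolean isDirty(Parent p) {
        return !Objects.equals(name, p.name) || childrenRef != p.children;
    }
}
```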

>
> HOLLOW represents the state of an entity that can be referenced in the persistence context but
> whose persistent state is not populated. This state occurs when an entity that was not present in the
> current context is obtained via the getReference() method.
>
> REMOVED represents the state of an entity that has been removed, either directly via the remove() operation
> or indirectly via a cascaded relation from another instance that has been removed.
>
> It is important to note that the states are not strictly mutually exclusive. An instance could be
> newly persisted (i.e. NEW state) as well as marked for removal (i.e. REMOVED state) in the same context.
> However, DIRTY and REMOVED are mutually exclusive.

I haven't thought too much about the problem, but is there some kind of natural order among the states, like REMOVED >> NEW? That would allow us to keep one state per entity at a given time.
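
A minimal sketch of the "one state per entity" idea: keep the set of states an entity satisfies (e.g. NEW and REMOVED together) in an EnumSet and collapse it by a precedence order. LifeCycleState here only mirrors the proposed enum, StatePrecedence is a hypothetical helper, and only REMOVED >> NEW comes from this thread; the rest of the order is an assumption.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Mirrors the proposed enum; it is not part of JPA.
enum LifeCycleState { NEW, CLEAN, DIRTY, HOLLOW, REMOVED }

final class StatePrecedence {
    // REMOVED >> NEW is from the discussion; the remaining order is assumed.
    static final List<LifeCycleState> ORDER = List.of(
            LifeCycleState.REMOVED, LifeCycleState.NEW, LifeCycleState.DIRTY,
            LifeCycleState.CLEAN, LifeCycleState.HOLLOW);

    private StatePrecedence() {}

    // Collapses the states an entity currently satisfies into the dominant one.
    static LifeCycleState dominant(Set<LifeCycleState> states) {
        for (LifeCycleState s : ORDER) {
            if (states.contains(s)) {
                return s;
            }
        }
        throw new IllegalArgumentException("entity has no life cycle state");
    }
}
```

With such an order, an instance that is both newly persisted and marked for removal would simply report REMOVED.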

>
> Access by Life Cycle State
> --------------------------
> The managed instances are queried by their life cycle states via the EntityManager interface.
> The resultant set contains the entities currently managed by the persistence context
> that satisfy the given conditions on their life cycle state and their state of synchronization
> with the database. The state of a member entity can change later in a way that invalidates the
> condition of its membership; for example, a CLEAN entity may be modified and thereby move to the
> DIRTY state. Hence the resultant set reflects the persistence context at the time of invocation.

True, but the entity state might not.
I can receive an entity marked as clean, then subsequently modify the entity state. The Set will contain a reference to the entity with its new data, not the data at the time of invocation.
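
A minimal sketch of that distinction using only the JDK (Customer and ManagedSetSnapshot are hypothetical names): Set.copyOf freezes the membership of the set, as the proposal describes, but the members are the same entity objects, so reading through the set shows state changed after the call.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical entity; equals/hashCode are identity-based (inherited from Object).
class Customer {
    String name;

    Customer(String name) { this.name = name; }
}

final class ManagedSetSnapshot {
    private ManagedSetSnapshot() {}

    // Set.copyOf returns an unmodifiable set: membership is fixed at call time,
    // but it holds the very same Customer instances, not copies of their state.
    static Set<Customer> freeze(Collection<Customer> currentlyClean) {
        return Set.copyOf(currentlyClean);
    }
}
```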

>
> The access API also uses a tertiary condition to designate the state of synchronization of
> an entity with the database, i.e. whether or not an entity has been flushed to the database.
> The flush operation does not change the life cycle state.

hum, do we really want that?
This forces us to keep more state than is usually necessary. Also, what happens after a clear()? The proposal should mention it.

>
> package javax.persistence;
> import java.util.Set;
> public interface EntityManager {
> /**
> * Get the set of entities managed by this persistence context that satisfy the given conditions.
> *
> * @param entityType the entity must be an instance of the given type or any of its subtypes.
> * null implies an entity of any type.

I'd pass Object.class but that's just me.

> * @param includeFlushed flags whether the resultant set based on life cycle states is further filtered
> * by the flushed state of the entities.
> * <tt>true</tt> implies the resultant set will include entities whose state
> * is currently synchronized to the database by flushing
> * <tt>false</tt> implies the resultant set will include entities whose state
> * is not currently synchronized to the database by flushing
> * null implies the resultant set will include both synchronized and
> * unsynchronized entities

I find the flush flag subtly complex to understand, especially for non-JPA experts.
Also, I might need to:
* get the closure of all managed entities
* know which are flushed and which are not
=> that requires calling this operation twice, and for some implementations this operation will be somewhat expensive.
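
If the context kept a flushed flag per entity, a single traversal could return both groups instead of two calls. A minimal sketch, assuming such a per-entity flag exists (ManagedEntry and FlushPartition are hypothetical names), using Collectors.partitioningBy:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical bookkeeping entry a persistence context might keep per managed entity.
class ManagedEntry {
    final Object entity;
    final boolean flushed;

    ManagedEntry(Object entity, boolean flushed) {
        this.entity = entity;
        this.flushed = flushed;
    }
}

final class FlushPartition {
    private FlushPartition() {}

    // One pass over the context: key true = flushed entities, key false = unflushed.
    // partitioningBy always yields both keys, even if a group is empty.
    static Map<Boolean, List<Object>> byFlushState(List<ManagedEntry> entries) {
        return entries.stream().collect(Collectors.partitioningBy(
                e -> e.flushed,
                Collectors.mapping(e -> e.entity, Collectors.toList())));
    }
}
```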

> * @param states the states to be interrogated. If multiple states are specified then each member
> * of the resultant set will satisfy at least one of the life cycle state conditions,
> * essentially the states are ORed.
> * Note that a null value implies any life cycle state.
> *

LifeCycleState.ALL instead?

> *
> * @return an immutable set of managed entities where each member is in at least one of the given life
> * cycle states, is an instance of the given entity type and is in the given flushed state.
> * The set contains the members as per their life cycle state at the point of invocation
> * of this method. If any of the members change their life cycle state later, this
> * resultant set is <em>not</em> updated, i.e. this set is <em>not</em> live.
> * The members of the set refer to the same entities managed by the persistence context.
> * Hence it is possible that a member may not satisfy the original condition
> * of set membership at a later point in time.
> */
> <T> Set<T> getManagedEntities(Class<T> entityType, Boolean includeFlushed, LifeCycleState... states);
> }
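
A minimal in-memory sketch of the filtering semantics in this Javadoc, to make the null conventions concrete. ContextEntry and ManagedEntityFilter are hypothetical, the context is modelled as a plain list rather than any provider's real internals, and treating an empty states list like null is an assumption not stated in the proposal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Mirrors the proposed enum; it is not part of JPA.
enum LifeCycleState { NEW, CLEAN, DIRTY, HOLLOW, REMOVED }

// Hypothetical per-entity bookkeeping.
class ContextEntry {
    final Object entity;
    final LifeCycleState state;
    final boolean flushed;

    ContextEntry(Object entity, LifeCycleState state, boolean flushed) {
        this.entity = entity;
        this.state = state;
        this.flushed = flushed;
    }
}

final class ManagedEntityFilter {
    private ManagedEntityFilter() {}

    // Type (null = any type), flushed state (null = either), states (ORed; null = any).
    @SuppressWarnings("unchecked")
    static <T> Set<T> getManagedEntities(List<ContextEntry> context, Class<T> entityType,
                                         Boolean includeFlushed, LifeCycleState... states) {
        Class<?> type = (entityType == null) ? Object.class : entityType;
        // Assumption: an empty varargs list is treated like null, i.e. any state.
        List<LifeCycleState> wanted =
                (states == null || states.length == 0) ? null : Arrays.asList(states);
        Set<T> result = new LinkedHashSet<>();
        for (ContextEntry e : context) {
            if (!type.isInstance(e.entity)) continue;
            if (includeFlushed != null && e.flushed != includeFlushed) continue;
            if (wanted != null && !wanted.contains(e.state)) continue;
            result.add((T) e.entity);
        }
        // Unmodifiable copy, not live: later state changes do not update membership.
        return Collections.unmodifiableSet(result);
    }
}
```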
>
> Example:
> EntityManager em = ...;
> Set<?> flushedDirtySet = em.getManagedEntities(null, true, LifeCycleState.DIRTY);

This API is not readable, I'd favor Enums over booleans and null use.
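
A sketch of what an enum-based alternative might look like; FlushFilter, ANY_STATE and ManagedEntityQueries are hypothetical names, not part of the proposal or of JPA. Object.class replaces the null type, a three-valued enum replaces the nullable Boolean, and an all-states constant plays the role of the suggested LifeCycleState.ALL.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Mirrors the proposed enum; it is not part of JPA.
enum LifeCycleState { NEW, CLEAN, DIRTY, HOLLOW, REMOVED }

// Hypothetical replacement for the nullable Boolean includeFlushed parameter.
enum FlushFilter {
    FLUSHED, UNFLUSHED, ANY;

    // True if an entity with the given flushed flag passes this filter.
    boolean matches(boolean flushed) {
        switch (this) {
            case FLUSHED:
                return flushed;
            case UNFLUSHED:
                return !flushed;
            default:
                return true;
        }
    }
}

// Hypothetical signature with no null or boolean arguments.
interface ManagedEntityQueries {
    // Plays the role of the suggested LifeCycleState.ALL.
    Set<LifeCycleState> ANY_STATE =
            Collections.unmodifiableSet(EnumSet.allOf(LifeCycleState.class));

    <T> Set<T> getManagedEntities(Class<T> entityType, FlushFilter flush,
                                  Set<LifeCycleState> states);
}
```

With this shape a call would read `queries.getManagedEntities(Object.class, FlushFilter.FLUSHED, EnumSet.of(LifeCycleState.DIRTY))`, which states its intent without decoding `null` and `true`.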

>
> will return the set of entities that are dirty but had been flushed to the database.
>
>
> EntityManager em = ...;
> Set<Customer> newCustomers = em.getManagedEntities(Customer.class, null, LifeCycleState.NEW, LifeCycleState.CLEAN);
>
> will return the set of Customer entities that are either newly persisted or fetched but not modified
> in this transaction irrespective of whether they had been flushed to the database.