Pinaki and Adam,
I'd like to thank you again for drafting this proposal.
Having something concrete to discuss really helps us focus.
Some comments and questions below.
> ------------------------------------------------------------------
>
> Proposed addition in Section 3.2.8
>
> Access to Managed Entities
> --------------------------
> An entity managed in a persistence context passes through different life cycle states.
> For example, when the entity is realized from a database it begins in a clean state, and
> updating any of its persistent properties by the application makes the entity transition
> to a dirty state. An application may need to access the managed entities in a persistence
> context by their life cycle states. The following section describes these life cycle states
> and the API methods to access managed entities in a particular state.
>
> Life Cycle States of a managed Entity
> --------------------------------------
>
> The life cycle states of an entity are enumerated in javax.persistence.LifeCycleState.
>
> package javax.persistence;
> public enum LifeCycleState {
>     NEW,     // newly persisted in the current transaction
>     CLEAN,   // present in the current transaction but not modified
>     DIRTY,   // present in the current transaction and modified
>     HOLLOW,  // referred to in the current transaction (see getReference()),
>              // but its persistent state may or may not be populated
>     REMOVED  // marked for deletion
> }
>
> Definition of each life cycle state
> -----------------------------------
> NEW represents the state of an entity which has been instantiated via the new operator of the Java
> language and added to the persistence context either directly via the persist() method or indirectly
> by being reachable via a cascaded relation from another instance which has been merged via merge().
>
> CLEAN represents the state of an entity which has been retrieved from the database via a query or find()
> and has not been modified.
>
> DIRTY represents the state of an entity which has been retrieved from the database and subsequently
> modified. Modification implies that either any of its non-relational property values has been changed
> or any relation has changed to a new reference. Modification does not imply change in content of
> a many-valued relation. For example, if a parent object P has a many-valued relation to a collection
> C of child objects, then adding a new child object in the collection C does not dirty the parent P.
> But replacing the collection C itself by another collection does.
>
> HOLLOW represents the state of an entity which can be referred to in the persistence context but
> whose persistent state is not populated. This state occurs when an entity that was not present in
> the current context is obtained via the getReference() method.
>
I don't understand the use case for distinguishing the HOLLOW state.
Is there any distinction between this state and the result of isLoaded
on a clean instance that would require a separate state?
> REMOVED represents the state of an entity which has been removed either via direct remove() operation
> or indirectly via cascaded relation to another instance which has been removed.
>
> It is important to note that the states are not strictly mutually exclusive. An instance could be
> newly persisted (i.e. NEW state) as well as marked for removal (i.e. REMOVED state) in the same context.
> However, DIRTY and REMOVED are mutually exclusive.
>
I'm also not clear on the definition for the REMOVED state. I would expect
that entities in this state would be those identified by the provider
in terms of a pending (and/or completed) DELETE operation on the database.
> Access by Life Cycle State
> --------------------------
> The managed instances are queried by their life cycle states via EntityManager interface.
> The resultant set contains the entities currently managed by the persistence context
> and satisfying the given conditions on their life cycle state and state of synchronization
> to the database. The state of a member entity can change later in such a way that the basic
> condition of its membership is no longer valid. For example, a CLEAN entity may be
> modified and so change to the DIRTY state. Hence the resultant set reflects the persistence
> context only at the time of invocation.
>
> The access API also uses a tertiary condition to designate the state of synchronization of
> an entity with the database i.e. whether an entity has been flushed or not to the database.
> Flush operation does not change the life cycle state.
>
> package javax.persistence;
> import java.util.Set;
> public interface EntityManager {
> /**
> * Get the set of entities managed by this persistence context that satisfy the given conditions.
> *
> * @param entityType the entity must be an instance of the given type or any of its subtypes.
> * null implies entity of any type.
Is there a reason we need null to indicate entities of all types (and
not Object.class)? Also, can the entityType argument be any supertype
(and not just an *entity* type)?
> * @param includeFlushed flags if the resultant set based on life cycle states is further filtered
> * by flushed state of the entities.
> * <tt>true</tt> implies the resultant set will include entities whose state
> * is currently synchronized to the database by flushing
> * <tt>false</tt> implies the resultant set will include entities whose state
> * is not currently synchronized to the database by flushing
> * null implies the resultant set will include both synchronized and
> * unsynchronized entities
I would prefer to see multiple methods or an enum instead of the
three-valued includeFlushed Boolean.
> * @param states the states to be interrogated. If multiple states are specified then each member
> * of the resultant set will satisfy at least one of the life cycle state conditions,
> * essentially the states are ORed.
> * Note that a null value implies any life cycle state.
> *
> *
> * @return an immutable set of managed entities where each member is in at least one of the given life
> * cycle states and an instance of the given entity type and in the given flushed state.
> * The set contains the members as per their life cycle state at the point of invocation
> * of this method. If any of the members change their life cycle state later, this
> * resultant set is <em>not</em> updated, i.e. this set is <em>not</em> live.
> * The members of the set refer to the same entities managed by the persistence context.
> * Hence it is possible that a member may not satisfy the original condition
> * of set membership at a later point in time.
> */
> <T> Set<T> getManagedEntities(Class<T> entityType, Boolean includeFlushed, LifeCycleState... states);
>
This method feels a bit like a Swiss army knife to me. If we could better
identify the expected use cases, it might make sense to factor it into
multiple methods to better reflect those. The invocation for what I would
expect to be a common case seems rather unfortunate to me: (null, null, LifeCycleState.XXX).
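To illustrate the kind of factoring I mean, here is a rough sketch, not a
concrete counter-proposal. The names FlushFilter, Managed, track and the class
ManagedEntitiesSketch are hypothetical, and a plain list stands in for the
persistence context. It assumes an enum in place of the three-valued Boolean,
Object.class in place of a null entity type, and a short overload for the
common case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: FlushFilter, Managed, track and this class are hypothetical
// names, and the list below stands in for a real persistence context.
final class ManagedEntitiesSketch {

    enum LifeCycleState { NEW, CLEAN, DIRTY, HOLLOW, REMOVED }

    // Hypothetical replacement for the three-valued Boolean includeFlushed.
    enum FlushFilter {
        FLUSHED, UNFLUSHED, ANY;

        boolean accepts(boolean flushed) {
            return this == ANY || (this == FLUSHED) == flushed;
        }
    }

    // Bookkeeping record for one managed instance.
    private static final class Managed {
        final Object entity;
        final LifeCycleState state;
        final boolean flushed;

        Managed(Object entity, LifeCycleState state, boolean flushed) {
            this.entity = entity;
            this.state = state;
            this.flushed = flushed;
        }
    }

    private final List<Managed> context = new ArrayList<>();

    void track(Object entity, LifeCycleState state, boolean flushed) {
        context.add(new Managed(entity, state, flushed));
    }

    // General form: no null arguments; Object.class matches every type.
    <T> Set<T> getManagedEntities(Class<T> type, FlushFilter filter, Set<LifeCycleState> states) {
        Set<T> result = new LinkedHashSet<>();
        for (Managed m : context) {
            if (type.isInstance(m.entity) && filter.accepts(m.flushed) && states.contains(m.state)) {
                result.add(type.cast(m.entity));
            }
        }
        // Snapshot, not live: later state changes are not reflected.
        return Collections.unmodifiableSet(result);
    }

    // Convenience form for the common case: any type, flushed or not.
    Set<Object> getManagedEntities(LifeCycleState first, LifeCycleState... rest) {
        return getManagedEntities(Object.class, FlushFilter.ANY, EnumSet.of(first, rest));
    }

    public static void main(String[] args) {
        ManagedEntitiesSketch em = new ManagedEntitiesSketch();
        em.track("customer-1", LifeCycleState.DIRTY, true);
        em.track(Integer.valueOf(42), LifeCycleState.DIRTY, false);
        em.track("customer-2", LifeCycleState.CLEAN, true);

        // Common case reads without any nulls.
        System.out.println(em.getManagedEntities(LifeCycleState.DIRTY));
        // Fully qualified case names each condition explicitly.
        System.out.println(em.getManagedEntities(
                String.class, FlushFilter.FLUSHED, EnumSet.of(LifeCycleState.DIRTY)));
    }
}
```

With something along these lines the common case becomes
getManagedEntities(LifeCycleState.XXX), and the flushed condition is spelled
out by name at the call site rather than encoded as true/false/null.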
> Example:
> EntityManager em = ...;
> Set<?> flushedDirtySet = em.getManagedEntities(null, true, LifeCycleState.DIRTY);
>
> will return the set of entities that are dirty but had been flushed to the database.
>
>
> EntityManager em = ...;
> Set<?> newCustomers = em.getManagedEntities(Customer.class, null, LifeCycleState.NEW, LifeCycleState.CLEAN);
>
> will return the set of Customer entities that are either newly persisted or fetched but not modified
> in this transaction irrespective of whether they had been flushed to the database.
The JPA spec has historically avoided defining lifecycle states in
order to provide better flexibility to implementations, so I am
concerned that we may be spinning our wheels in trying to be more
precise here. Would it meet the use cases you are trying to cover to
define these in terms of the operations that the persistence provider
issues to the database, i.e., INSERT, UPDATE, DELETE, ...?
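
To make the question concrete, an operation-based view might look roughly
like the sketch below. This is only an illustration under my own assumptions:
DatabaseOperation, schedule and getEntitiesPending are hypothetical names, and
a plain map stands in for whatever bookkeeping a provider actually keeps.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: every name here is hypothetical, not part of any spec.
final class PendingOperationSketch {

    // The operations the provider would issue to the database.
    enum DatabaseOperation { INSERT, UPDATE, DELETE }

    // Stand-in for provider bookkeeping: entity -> operation pending for it.
    private final Map<Object, DatabaseOperation> pending = new LinkedHashMap<>();

    void schedule(Object entity, DatabaseOperation operation) {
        pending.put(entity, operation);
    }

    // Returns a snapshot of the entities with any of the given operations pending.
    Set<Object> getEntitiesPending(DatabaseOperation first, DatabaseOperation... rest) {
        Set<DatabaseOperation> wanted = EnumSet.of(first, rest);
        Set<Object> result = new LinkedHashSet<>();
        for (Map.Entry<Object, DatabaseOperation> entry : pending.entrySet()) {
            if (wanted.contains(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return Collections.unmodifiableSet(result);
    }

    public static void main(String[] args) {
        PendingOperationSketch context = new PendingOperationSketch();
        context.schedule("order-1", DatabaseOperation.INSERT);
        context.schedule("order-2", DatabaseOperation.UPDATE);
        context.schedule("order-3", DatabaseOperation.DELETE);

        System.out.println(context.getEntitiesPending(DatabaseOperation.DELETE));
        System.out.println(context.getEntitiesPending(
                DatabaseOperation.INSERT, DatabaseOperation.UPDATE));
    }
}
```

Under this view an entity that is neither inserted, updated nor deleted simply
has no pending operation, so states such as CLEAN and HOLLOW would not need to
be defined at all.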
thanks again,
-Linda