[jsr338-experts] Re: ChangeSet proposal [was Re: Re: Standardized Access to ChangeSet]

From: Gordon Yorke <gordon.yorke_at_oracle.com>
Date: Mon, 18 Jul 2011 11:17:45 -0300

Hello All

Linda DeMichiel wrote:
> Pinaki and Adam,
>
> I'd like to thank you again for drafting this proposal.
> Having something concrete to discuss really helps us focus.
>
> Some comments and questions below....
>
>> ------------------------------------------------------------------
>>
>> Proposed addition in Section 3.2.8
>>
>> Access to Managed Entities
>> --------------------------
>> An entity managed in a persistence context transitions through
>> different life cycle states. For example, when the entity is loaded
>> from the database it begins in a clean state, and updating any of its
>> persistent properties by the application moves the entity to a dirty
>> state.
>> An application may need to access the managed entities in a
>> persistence context by their life cycle states. The following section
>> describes these life cycle states and the API methods to access
>> managed entities in a particular state.
>>
>> Life Cycle States of a managed Entity
>> --------------------------------------
>>
>> The life cycle states of an entity are enumerated in
>> javax.persistence.LifeCycleState.
>>
>> package javax.persistence;
>> public enum LifeCycleState {
>> NEW, // newly persisted in the current transaction
>> CLEAN, // present in the current transaction but not modified
>> DIRTY, // present in the current transaction and modified
>> HOLLOW, // referred in the current transaction (see getReference()),
>> // but its persistent state may or may not be populated
>> REMOVED // marked for deletion
>> }
>>
>> Definition of each life cycle state
>> -----------------------------------
>> NEW represents the state of an entity that has been instantiated via
>> the Java new operator and added to the persistence context, either
>> directly via the persist() method or indirectly by being reachable
>> via a cascaded relation from another instance that has been merged
>> via the merge() method.
>>
>> CLEAN represents the state of an entity that has been retrieved from
>> the database via a query or find() and has not been modified.
>>
>> DIRTY represents the state of an entity that has been retrieved from
>> the database and subsequently modified. Modification means that
>> either the value of any of its non-relational properties has changed
>> or any of its relations has changed to a new reference. Modification
>> does not include a change in the content of a many-valued relation.
>> For example, if a parent object P has a many-valued relation to a
>> collection C of child objects, then adding a new child object to the
>> collection C does not dirty the parent P, but replacing the
>> collection C itself with another collection does.
>>
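To pin down the DIRTY rule above, here is a minimal Java sketch that assumes a provider keeps a load-time snapshot of each entity and compares basic values with equals() but relations by reference. `Parent` and `DirtyCheck` are hypothetical names used only for illustration; actual providers may detect changes quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical entity used only to illustrate the proposed DIRTY rule.
class Parent {
    String name;                               // non-relational property
    List<Object> children = new ArrayList<>(); // many-valued relation C
}

// Hypothetical snapshot-based dirty check (not a real provider API).
final class DirtyCheck {
    private final String loadedName;
    // Remember the collection reference itself, not a copy of its contents.
    private final List<Object> loadedChildren;

    DirtyCheck(Parent p) {
        this.loadedName = p.name;
        this.loadedChildren = p.children;
    }

    boolean isDirty(Parent p) {
        // Any changed non-relational value dirties the entity.
        if (!Objects.equals(loadedName, p.name)) {
            return true;
        }
        // The many-valued relation dirties the entity only when the collection
        // is replaced; adding or removing elements leaves the reference intact.
        return loadedChildren != p.children;
    }
}
```

Because only the collection reference is remembered, `p.children.add(...)` leaves P clean while `p.children = new ArrayList<>()` makes it dirty, matching the P/C example in the proposal.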
>> HOLLOW represents the state of an entity that can be referred to in
>> the persistence context but whose persistent state is not populated.
>> This state occurs when an entity that was not present in the current
>> context is obtained via the getReference() method.
>>
>
> I don't understand the use case for distinguishing the HOLLOW state.
> Is there any distinction between this and the result of isLoaded on
> a clean instance which would require this?
I agree.
>
>> REMOVED represents the state of an entity that has been removed
>> either directly via the remove() operation or indirectly via a
>> cascaded relation from another instance that has been removed.
>>
>> It is important to note that the states are not strictly mutually
>> exclusive. An instance could be newly persisted (i.e. NEW state) as
>> well as marked for removal (i.e. REMOVED state) in the same context.
>> However, DIRTY and REMOVED are mutually exclusive.
There is no reason why DIRTY and REMOVED should be mutually exclusive.
There are real-world use cases where users want the pending updates to an
entity flushed and then have the row removed. This is usually found
when removes are actually only "logical" removes on the database.
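A minimal sketch of how DIRTY and REMOVED could coexist, assuming a provider records an entity's states as an EnumSet rather than a single value. `ManagedEntry` and `pendingStatements()` are hypothetical; the local `LifeCycleState` enum only mirrors the proposed type, which does not exist in javax.persistence today.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Local mirror of the proposed enum; not an existing javax.persistence type.
enum LifeCycleState { NEW, CLEAN, DIRTY, HOLLOW, REMOVED }

// Hypothetical per-entity bookkeeping where states form a set, not one value.
final class ManagedEntry {
    private final EnumSet<LifeCycleState> states = EnumSet.of(LifeCycleState.CLEAN);

    void modify() {
        states.remove(LifeCycleState.CLEAN);
        states.add(LifeCycleState.DIRTY);
    }

    void remove() {
        // Keep DIRTY so the pending update can still be flushed before the delete.
        states.add(LifeCycleState.REMOVED);
    }

    Set<LifeCycleState> states() {
        return Collections.unmodifiableSet(states);
    }

    // Statements a provider might issue at flush time, in order.
    List<String> pendingStatements() {
        List<String> ops = new ArrayList<>();
        if (states.contains(LifeCycleState.DIRTY)) {
            ops.add("UPDATE");
        }
        if (states.contains(LifeCycleState.REMOVED)) {
            ops.add("DELETE");
        }
        return ops;
    }
}
```

With this model the "logical remove" case is simply an entity whose state set contains both DIRTY and REMOVED, flushed as an UPDATE followed by a DELETE.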
>>
>
> I'm also not clear on the definition for the REMOVED state. I would
> expect that entities in this state would be those identified by the
> provider in terms of a pending (and/or completed) DELETE operation on
> the database.
>
>> Access by Life Cycle State
>> --------------------------
>> The managed instances are queried by their life cycle states via the
>> EntityManager interface. The resultant set contains the entities
>> currently managed by the persistence context that satisfy the given
>> conditions on their life cycle state and their state of
>> synchronization with the database. Because the state of a member
>> entity can later change in a way that invalidates the condition of
>> its membership (for example, a CLEAN entity may be modified and so
>> become DIRTY), the resultant set reflects the persistence context
>> only at the time of invocation.
>>
>> The access API also uses a third condition to designate the state of
>> synchronization of an entity with the database, i.e. whether or not
>> the entity has been flushed to the database. A flush operation does
>> not change the life cycle state.
>>
>> package javax.persistence;
>> import java.util.Set;
>> public interface EntityManager {
>> /**
>> * Get the set of entities managed by this persistence context that
>> * satisfy the given conditions.
>> *
>> * @param entityType the entity must be an instance of the given type
>> * or any of its subtypes.
>> * null implies an entity of any type.
>
> Is there a reason we need null to indicate entities of all types (and
> not Object.class)? Also, can the entityType argument be any supertype
> (and not just an *entity* type)?
I agree.
>
>> * @param includeFlushed flags whether the resultant set based on life
>> * cycle states is further filtered by the flushed state of the entities.
>> * <tt>true</tt> implies the resultant set will include entities whose
>> * state is currently synchronized to the database by flushing;
>> * <tt>false</tt> implies the resultant set will include entities
>> * whose state is not currently synchronized to the database by flushing;
>> * null implies the resultant set will include both synchronized and
>> * unsynchronized entities.
>
> I would prefer to see multiple methods or an enum instead of the
> three-valued includeFlushed Boolean.
>
I agree. It would be better if synchronized and unsynchronized changes
were returned separately. An entity may have been INSERTed earlier in this
transaction and then removed in the persistence context. It would be
difficult for users to tell what had happened if the entity was returned
for both the REMOVED and PERSISTED life cycle states.
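One possible shape for this, sketched in Java under the assumption that an enum replaces the nullable Boolean and that convenience methods return flushed and unflushed changes separately. `FlushState`, `ChangeView`, `getFlushedChanges()` and `getUnflushedChanges()` are hypothetical names, not part of the proposal or of JPA.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical replacement for the three-valued Boolean includeFlushed.
enum FlushState {
    FLUSHED,   // changes already written to the database
    UNFLUSHED, // changes not yet written to the database
    ANY;       // both; replaces the null case

    boolean matches(boolean flushed) {
        switch (this) {
            case FLUSHED:   return flushed;
            case UNFLUSHED: return !flushed;
            default:        return true;
        }
    }
}

// Hypothetical view that reports synchronized and unsynchronized changes apart.
final class ChangeView {
    // Entities are tracked by identity, as a persistence context would.
    private final Map<Object, Boolean> flushedByEntity = new IdentityHashMap<>();

    void track(Object entity, boolean flushed) {
        flushedByEntity.put(entity, flushed);
    }

    Set<Object> select(FlushState filter) {
        Set<Object> result = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Map.Entry<Object, Boolean> e : flushedByEntity.entrySet()) {
            if (filter.matches(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return Collections.unmodifiableSet(result);
    }

    Set<Object> getFlushedChanges()   { return select(FlushState.FLUSHED); }
    Set<Object> getUnflushedChanges() { return select(FlushState.UNFLUSHED); }
}
```

Separate methods keep the common calls free of null arguments, while the enum still covers the "both" case explicitly.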
>> * @param states the states to be interrogated. If multiple states are
>> * specified then each member of the resultant set will satisfy at
>> * least one of the life cycle state conditions, i.e. the states are ORed.
>> * Note that a null value implies any life cycle state.
>> *
>> * @return an immutable set of managed entities where each member is
>> * in at least one of the given life cycle states, is an instance of
>> * the given entity type, and is in the given flushed state.
>> * The set contains the members as per their life cycle state at the
>> * point of invocation of this method. If any of the members change
>> * their life cycle state later, this resultant set is <em>not</em>
>> * updated, i.e. this set is <em>not</em> live.
>> * The members of the set refer to the same entities managed by the
>> * persistence context. Hence it is possible that a member may not
>> * satisfy the original condition of set membership at a later point
>> * in time.
>> */
>> <T> Set<T> getManagedEntities(Class<T> entityType, Boolean includeFlushed,
>>         LifeCycleState... states);
>> }
>>
>
> This method feels a bit like a Swiss Army knife to me. If we could
> better identify the expected use cases, it might make sense to factor
> it into multiple methods to better reflect those. The invocation for
> what I would expect to be a common case seems rather unfortunate to me
> (null, null, LifeCycleState.XXX).
>
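To check that we all read the proposed Javadoc the same way, here is a minimal in-memory model of its filtering rules: a null type or flushed flag means "any", subtypes match, states are ORed, and the result is an immutable snapshot. `PersistenceContextModel` and `manage()` are hypothetical and exist only for this sketch; the local `LifeCycleState` enum mirrors the proposed type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Local mirror of the proposed enum (not an existing javax.persistence type).
enum LifeCycleState { NEW, CLEAN, DIRTY, HOLLOW, REMOVED }

// Hypothetical model of a persistence context, used only to pin down semantics.
final class PersistenceContextModel {
    private static final class Entry {
        final Object entity;
        final EnumSet<LifeCycleState> states;
        final boolean flushed;

        Entry(Object entity, EnumSet<LifeCycleState> states, boolean flushed) {
            this.entity = entity;
            this.states = states;
            this.flushed = flushed;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void manage(Object entity, boolean flushed, LifeCycleState first, LifeCycleState... rest) {
        entries.add(new Entry(entity, EnumSet.of(first, rest), flushed));
    }

    <T> Set<T> getManagedEntities(Class<T> entityType, Boolean includeFlushed,
                                  LifeCycleState... states) {
        // A null or empty states argument means "any life cycle state".
        Set<LifeCycleState> wanted = (states == null || states.length == 0)
                ? EnumSet.allOf(LifeCycleState.class)
                : EnumSet.copyOf(Arrays.asList(states));
        Set<T> result = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Entry e : entries) {
            // null entityType means any type; otherwise subtypes also match.
            if (entityType != null && !entityType.isInstance(e.entity)) continue;
            // null includeFlushed means both flushed and unflushed entities.
            if (includeFlushed != null && includeFlushed != e.flushed) continue;
            // States are ORed: one matching state is enough.
            if (Collections.disjoint(wanted, e.states)) continue;
            @SuppressWarnings("unchecked")
            T t = (T) e.entity;
            result.add(t);
        }
        // Immutable snapshot: later state changes are not reflected.
        return Collections.unmodifiableSet(result);
    }
}
```

Calling it with `Object.class` instead of null selects every managed entity, which suggests null is not needed for the "any type" case.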
>> Example:
>> EntityManager em = ...;
>> Set<?> flushedDirtySet = em.getManagedEntities(null, true,
>>         LifeCycleState.DIRTY);
>>
>> will return the set of entities that are dirty but have already been
>> flushed to the database.
>>
>>
>> EntityManager em = ...;
>> Set<Customer> newCustomers = em.getManagedEntities(Customer.class, null,
>>         LifeCycleState.NEW, LifeCycleState.CLEAN);
>>
>> will return the set of Customer entities that are either newly
>> persisted or fetched but not modified in this transaction,
>> irrespective of whether they have been flushed to the database.
>
> The JPA spec has historically avoided defining lifecycle states in
> order to provide better flexibility to implementations, so I am
> concerned that we may be spinning our wheels in trying to be more
> precise here. Would it meet the use cases you are trying to cover to
> define these in terms of the operations that the persistence provider
> issues to the database (i.e., INSERT, UPDATE, DELETE, ...)?
I agree, but it would be better to define the changes in terms of the
EntityManager operations: REMOVED, PERSISTED, CHANGED. That would be
far more intuitive for users.
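A rough sketch of the operation-based alternative, assuming the provider records which EntityManager operation (or its own dirty detection) touched each entity, and exposing one method per category so the common case needs no null arguments. `ChangeKind`, `ChangeSet` and its methods are hypothetical names, not existing JPA API.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical change categories named after EntityManager operations.
enum ChangeKind { PERSISTED, REMOVED, CHANGED }

// Hypothetical recorder; a provider would populate it from persist(), remove()
// and its own dirty detection. None of these names exist in JPA.
final class ChangeSet {
    // An entity may carry several kinds, e.g. CHANGED and then REMOVED.
    private final Map<Object, EnumSet<ChangeKind>> kinds = new IdentityHashMap<>();

    void record(Object entity, ChangeKind kind) {
        kinds.computeIfAbsent(entity, e -> EnumSet.noneOf(ChangeKind.class)).add(kind);
    }

    // One method per category keeps the common case free of null arguments.
    <T> Set<T> getPersisted(Class<T> type) { return select(type, ChangeKind.PERSISTED); }
    <T> Set<T> getRemoved(Class<T> type)   { return select(type, ChangeKind.REMOVED); }
    <T> Set<T> getChanged(Class<T> type)   { return select(type, ChangeKind.CHANGED); }

    private <T> Set<T> select(Class<T> type, ChangeKind kind) {
        Set<T> result = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Map.Entry<Object, EnumSet<ChangeKind>> e : kinds.entrySet()) {
            if (e.getValue().contains(kind) && type.isInstance(e.getKey())) {
                result.add(type.cast(e.getKey()));
            }
        }
        // Snapshot of the current changes; not a live view.
        return Collections.unmodifiableSet(result);
    }
}
```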

>
> thanks again,
>
> -Linda
>
>