persistence@glassfish.java.net

RE: ObjectBuilder.java

From: Gordon Yorke <gordon.yorke_at_oracle.com>
Date: Tue, 14 Mar 2006 11:35:10 -0500

Hello Scott,
        That's great. I am surprised that the .size() call here is the most pronounced bottleneck, but it's good to note. The mappings list is immutable post-login (when this code is called), so the size should probably be referenced via a variable.
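A minimal sketch of that change (the class and method names below are simplified for illustration; the real method is ObjectBuilder.iterate(DescriptorIterator)):

```java
import java.util.Vector;

// Hypothetical sketch: hoist the synchronized Vector.size() call out of the
// loop, since the mappings list is immutable once login has completed.
public class SizeHoistSketch {
    static int visitAll(Vector<Object> mappings) {
        // One synchronized size() call instead of one per iteration.
        int size = mappings.size();
        int visited = 0;
        for (int index = 0; index < size; index++) {
            Object mapping = mappings.get(index); // would be cast to DatabaseMapping
            visited++;
        }
        return visited;
    }
}
```

Since the list never changes after login, the size could equally be cached in a field at login time rather than a local variable.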
        As this code (and the related public API's) predates JDK 1.2 a Vector is used. We could migrate to more efficient data structures but that work would be quite involved and be high risk.
        The EJB 3.0 specification requires 'discovery' of new objects on which 'persist' was not called. That is what this code is doing. There is no 2-dimensional array; there is an object tree that must be recursively iterated over. This code can be optimized out by having the application read the 'read-only' data outside of a transaction on a transactional EM, or by clearing an Extended EM before the completion of the transaction. Otherwise special, non-spec APIs would be required to notify the EM that the transaction was read-only.
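Roughly, the 'discovery' pass amounts to a traversal of the object graph reachable from the registered objects, collecting anything that was never registered. The sketch below is purely illustrative (Node, discoverUnregistered, and the identity-set bookkeeping are assumptions, not TopLink's actual API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of discovering unregistered new objects: walk the
// object tree from the known roots and collect nodes not in the registered
// set. Identity-based sets avoid problems with user-defined equals().
public class DiscoverySketch {
    static class Node {
        List<Node> children = new ArrayList<>();
    }

    static Set<Node> discoverUnregistered(Collection<Node> roots, Set<Node> registered) {
        Set<Node> unregistered = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited; object graphs may contain cycles
            }
            if (!registered.contains(node)) {
                unregistered.add(node);
            }
            stack.addAll(node.children); // follow references to related objects
        }
        return unregistered;
    }
}
```

In a read-only transaction this walk still visits every reachable object only to find nothing, which is why reading outside a transaction, or clearing the EM first, sidesteps the cost entirely.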

        If by 'data structure' you are talking about the collection of mappings, it is simply a collection of the artifacts that have behaviour and information for accessing and interpreting Entity property values. It is a collection of sub-classes of the TopLink class oracle.toplink.essentials.mappings.DatabaseMapping. It has the same usage in the contexts mentioned.

--Gordon

-----Original Message-----
From: Scott.Oaks_at_Sun.COM [mailto:Scott.Oaks_at_Sun.COM] On Behalf Of Scott Oaks
Sent: Monday, March 13, 2006 6:48 PM
To: persistence_at_glassfish.dev.java.net
Subject: ObjectBuilder.java


I've been looking at some profiles of the performance test, and one of
the bottlenecks occurs because of this code in ObjectBuilder.java:

    public void iterate(DescriptorIterator iterator) {
        // PERF: Avoid synchronized enumerator as is concurrency bottleneck.
        Vector mappings = getDescriptor().getMappings();
        for (int index = 0; index < mappings.size(); index++) {
            ((DatabaseMapping)mappings.get(index)).iterate(iterator);
        }
    }

I'm not sure of the intent behind this code, but it is either
sub-optimal or semantically incorrect -- depending on the answers to
these questions:

1) Is the Database mapping vector immutable? If so, then at the very
least the mappings.size() can be pulled out of the loop -- but really, a
simple array would be a much more efficient data structure. If not, then
the next questions come into play:

2) Are items removed from the mapping (one thread removes while another
thread is iterating)? If so, this method will sometimes throw an
ArrayIndexOutOfBoundsException. [Which conceivably could happen also if
an item is added given alternate implementations of the Vector class,
and of course we can't depend on the implementation of that class.]
Additionally, it may skip items (which may or may not be a semantic
error).

3) Are items added in the middle of the mapping? If so, the iterator
will process some items twice (which may or may not be a semantic error
but probably isn't good for performance). [If items are appended to the
mapping, they may be skipped by this iterator -- again, possibly a
semantic error but not necessarily.]
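The skipping behaviour in question 2 is easy to demonstrate; the snippet below simulates the concurrent removal inside the loop itself for determinism (the class name is mine, not from the TopLink code):

```java
import java.util.Vector;

// Demonstrates how index-based iteration over a Vector silently skips an
// element when an earlier element is removed mid-loop: the remaining
// elements shift left, and one lands in a slot the cursor already passed.
public class VectorSkipDemo {
    static String visitWithRemoval() {
        Vector<String> mappings = new Vector<>();
        mappings.add("a");
        mappings.add("b");
        mappings.add("c");
        mappings.add("d");

        StringBuilder visited = new StringBuilder();
        for (int index = 0; index < mappings.size(); index++) {
            String current = mappings.get(index);
            visited.append(current);
            if (current.equals("b")) {
                // Simulates another thread removing the first element.
                mappings.remove(0); // "c" shifts into the slot just visited
            }
        }
        return visited.toString(); // "c" is never visited
    }

    public static void main(String[] args) {
        System.out.println(visitWithRemoval());
    }
}
```

Each get(index) and size() call is individually synchronized, so no exception is thrown here; the iteration is simply wrong. An ArrayIndexOutOfBoundsException arises only when the size check and the get race a removal across threads.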

Most interesting to me is that we got to this code because when the
current transaction ended, the TopLink code went through
collectAndPrepareForCommit() -> assignSequenceNumbers() ->
discoverAllUnregisteredNewObjects(). But as this is a read-only test,
searching a two-dimensional array for something that won't be there is
something that could be optimized. Though that's doubtless a long-term
thing to look at.

In the meanwhile, can someone explain to me the usage of the data
structure in this context (and, in fact, in similar code in
DeferredChangeDetectionPolicy.createObjectChangeSetThroughComparison(),
DatabaseAccessor.fetchRow(), ObjectBuilder.buildAttributesIntoObject(),
and AbstractRecord.getIndicatingNoEntry() -- calls to vector.size() from
these account for 6% of the time in the profile, primarily from
synchronization contention).

-Scott