users@glassfish.java.net

Re: CompositeEnumeration has a NPE in it

From: <glassfish_at_javadesktop.org>
Date: Tue, 24 Aug 2010 17:43:01 PDT

Thanks.

This turns out to be some sort of race condition inside Glassfish, specifically involving the WorkManager supplied to resource adapters at startup time.

My resource adapter follows the specification and attempts (as instructed) to offload heavyweight tasks onto another thread by using the WorkManager. It stores the results of those tasks in a Future.

The Future is shared with the user connection factory (in this case an implementation of org.drools.KnowledgeBase: https://hudson.jboss.org/hudson/job/drools/lastSuccessfulBuild/artifact/trunk/target/javadocs/stable/drools-api/org/drools/KnowledgeBase.html). The user connection factory is a simple pass-through wrapper that calls through to the Future via its get() method, retrieves the results (in this case the "real" KnowledgeBase implementation) and returns them.

The get() method is reporting that a NullPointerException was thrown earlier, during WorkManager execution, from deep in the bowels of some classloader's getResources() method. The exception was captured at that point, so a later call to Future#get() surfaces the already-created NullPointerException.
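
A minimal sketch of the failure path described above, assuming a plain java.util.concurrent.FutureTask and a bare Thread as stand-ins for the resource adapter's Future and the container's WorkManager. The class name FutureFailureSketch, the helper names, and the simulated exception message are all hypothetical. It only shows how an exception raised on the worker thread is held by the Future and reported later to whoever calls get(). Note that a standard FutureTask wraps the original exception in an ExecutionException, so the NullPointerException is found via getCause(); a custom Future may behave differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

final class FutureFailureSketch {

    // Hypothetical stand-in for the heavyweight loading task. It fails the way
    // the post describes: a NullPointerException raised on the worker thread.
    static Callable<String> failingLoader() {
        return () -> {
            throw new NullPointerException("simulated failure inside getResources()");
        };
    }

    // Runs the task on another thread (a plain Thread stands in for the
    // WorkManager) and returns the failure that Future#get() reports,
    // or null if the task completed normally.
    static Throwable runAndCollectFailure(Callable<String> task) throws InterruptedException {
        FutureTask<String> future = new FutureTask<>(task);
        Thread worker = new Thread(future, "worker");
        worker.start();
        try {
            future.get(); // blocks until the worker thread has finished the task
            return null;
        } catch (ExecutionException e) {
            // The exception was created on the worker thread and stored in the Future.
            return e.getCause();
        }
    }

    public static void main(String[] args) throws Exception {
        Throwable failure = runAndCollectFailure(failingLoader());
        System.out.println("Failure reported by get(): " + failure);
    }
}
```

The stack trace of the reported exception belongs to the worker thread, not to the thread calling get(), which is why the trace points into classloader code rather than into the connection factory.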

I have run this through btrace, which introduces just enough lag into everything that the bug goes away. :-(

Another interesting data point: my .ear file, which houses the resource adapter and therefore needs a follow-up set of asadmin commands after deployment, is usually deployed by a script that starts the domain and then immediately deploys the .ear file, like this:

[code]
asadmin start-domain
asadmin deploy /path/to/my/.ear/file
asadmin create-connector-connection-pool ...
asadmin create-connector-resource ...
[/code]

If I deploy like this, then I encounter the bug. If I then do this:

[code]
asadmin undeploy --cascade my-app-name
asadmin deploy /path/to/my/.ear/file
asadmin create-connector-connection-pool ...
asadmin create-connector-resource ...
[/code]

...everything works OK.

I've tried introducing delays between the start-domain call and the deploy call, and between the deploy call and the create-connector-connection-pool call to no effect. The only thing that seems to make the bug go away is a second deploy.

All of this suggests to me that the WorkManager lets my Callable fly before all the classloaders appropriate to the .ear are properly initialized. Something about the first deploy seems to fully initialize everything, even though it results in an unusable app and even though the app is then undeployed, and after that everything works properly.
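
One low-overhead way to gather evidence for a hypothesis like the one above is to record which context classloader is in effect at the moment the Callable actually runs. The following is only a sketch under stated assumptions: ClassLoaderProbeSketch and probing are hypothetical names, nothing here is Glassfish or WorkManager API, and whether the thread context classloader is the loader involved in the failing getResources() call is itself an assumption.

```java
import java.util.concurrent.Callable;

final class ClassLoaderProbeSketch {

    // Wraps a task so that the thread name and the context classloader in effect
    // when the task runs are appended to the log before the task body executes.
    static <T> Callable<T> probing(Callable<T> delegate, StringBuilder log) {
        return () -> {
            Thread current = Thread.currentThread();
            ClassLoader contextLoader = current.getContextClassLoader();
            synchronized (log) {
                log.append("thread=").append(current.getName())
                   .append(" contextClassLoader=").append(contextLoader)
                   .append('\n');
            }
            return delegate.call();
        };
    }

    public static void main(String[] args) throws Exception {
        StringBuilder log = new StringBuilder();
        // A trivial task stands in for the real loading work.
        String result = probing(() -> "loaded", log).call();
        System.out.print(log);
        System.out.println("result=" + result);
    }
}
```

Comparing the recorded classloader between the first deploy and the second deploy would show whether the task sees a different loader in the failing case.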

(On a humorous note, this is actually the first time I've hit something where it works just fine in JBoss and fails in Glassfish. :-) )

I hate it when the tools I use to look at the problem make the problem go away....

Best,
Laird
[Message sent by forum member 'ljnelson']

http://forums.java.net/jive/thread.jspa?messageID=480924