RAR7004 : MDB deployment is still happening. Cannot create end point now

From: Comerford, Sean <Sean.Comerford_at_espn.com>
Date: Tue, 13 Nov 2012 11:28:22 -0500

We're running into an annoying issue deploying our MDBs.

When we redeploy the ear, it sometimes takes a reallllllly long time – as in 20 minutes when normally it takes maybe a minute to fully redeploy.

When this delay crops up, the message being spammed to the logs is:

RAR7004 : MDB deployment is still happening. Cannot create end point now.

We're using GlassFish with the IBM WebSphere MQ resource adapter and see the issue in both GF V2.1 and V3.1.2. IBM (perhaps not surprisingly) points the finger at GlassFish for this issue.

Has anyone else seen this? Any ideas what I should be looking at in my EJB config that might be causing this?

Here's the analysis IBM provided us:

The trace shows that there are 7 endpoints that need to be activated and
the slowness starts to occur during the activation of the third
endpoint. The slowness is due to a) waiting for an error from the
glassfish JCA code and b) the use of a shared Hconn (connection)
between these endpoints.

The endpoints are activated serially on one thread by glassfish.

Below is a summary of what is occurring:

1) The first and second endpoints are activated and the connection
consumers are ready to receive messages to drive the MDBs.

2) First consumer/endpoint gets a message and obtains a ServerSession to
drive the MDB. It then calls createEndpoint on the glassfish JCA code,
this endpoint is the MDB that the onMessage will be called on.

3) The second endpoint also has a message but is now blocked waiting for
a serversession to become free. Only one serversession has been
configured for each connection so it waits for the thread at point 2 to
finish.

4) The third endpoint is called to be activated but is blocked, as point
3 currently holds a connection lock due to the same Hconn being used.

5) The endpoint at point 2 called createEndpoint but nothing is returned
for 1 minute, until the following exception is thrown:

EndpointFactory is currently not available
[javax.resource.spi.UnavailableException] at:
17:17:39.581.0H 0006
com.sun.enterprise.connectors.inbound.ConnectorMessageBeanClient.createEndpoint(ConnectorMessageBeanClient.java:407)
17:17:39.581.0H 0006
com.sun.enterprise.connectors.inbound.ConnectorMessageBeanClient.createEndpoint(ConnectorMessageBeanClient.java:366)
17:17:39.581.0H 0006
com.ibm.mq.connector.inbound.WorkImpl.run(WorkImpl.java:226)
17:17:39.581.0H 0006
com.sun.enterprise.connectors.work.OneWork.doWork(OneWork.java:92)
17:17:39.581.0H 0006
com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.performWork(ThreadPoolImpl.java:492)
17:17:39.581.0H 0006
com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:528)

6) The exception causes the serversession to be freed, which then allows
the connection lock to be freed. The activation of the third endpoint
can then continue, but it has been delayed by over a minute.
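
To make the chain in points 2 to 6 concrete, here is a minimal, self-contained simulation of it. It is only a sketch of the locking pattern IBM describes, not WMQ or GlassFish code: the class and method names are hypothetical, a Semaphore stands in for the single ServerSession, a ReentrantLock stands in for the Hconn lock, and a short sleep stands in for the blocked createEndpoint call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the blocking chain; none of these names come from WMQ or GlassFish.
class SharedHconnDemo {

    /** Returns how long (ms) the "third endpoint" waits for its connection lock. */
    static long thirdActivationWaitMillis(boolean sharedHconn, long blockMillis) {
        Semaphore serverSession = new Semaphore(1);   // one ServerSession per connection
        ReentrantLock hconn = new ReentrantLock();    // lock on the Hconn of endpoints 1 and 2
        ReentrantLock thirdHconn = sharedHconn ? hconn : new ReentrantLock();
        CountDownLatch sessionTaken = new CountDownLatch(1);
        CountDownLatch hconnLocked = new CountDownLatch(1);

        // Point 2: first endpoint takes the ServerSession; createEndpoint blocks, then fails.
        Thread first = new Thread(() -> {
            serverSession.acquireUninterruptibly();
            sessionTaken.countDown();
            try {
                Thread.sleep(blockMillis);            // stand-in for the 1 minute wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                serverSession.release();              // point 6: the failure frees the session
            }
        });

        // Point 3: second endpoint holds the Hconn lock while waiting for the ServerSession.
        Thread second = new Thread(() -> {
            awaitQuietly(sessionTaken);
            hconn.lock();
            try {
                hconnLocked.countDown();
                serverSession.acquireUninterruptibly();
                serverSession.release();
            } finally {
                hconn.unlock();
            }
        });

        first.start();
        second.start();
        awaitQuietly(hconnLocked);

        // Point 4: activating the third endpoint needs the lock on its Hconn.
        long start = System.nanoTime();
        thirdHconn.lock();
        thirdHconn.unlock();
        long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        joinQuietly(first);
        joinQuietly(second);
        return waited;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared Hconn, waited ms: " + thirdActivationWaitMillis(true, 400));
        System.out.println("own Hconn, waited ms:    " + thirdActivationWaitMillis(false, 400));
    }
}
```

With a shared lock the third activation waits roughly as long as the simulated createEndpoint call blocks; with its own lock it proceeds almost immediately, which is the effect the per-endpoint Hconn is meant to give.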

So the root cause of the delay is the exception from the glassfish
JCA code. For some reason the glassfish code waited for 1 minute before
throwing the above exception. Due to the configuration of the Resource
Adapter (connectionConcurrency) the endpoints will share the same Hconn,
so a delay in one endpoint has a knock-on effect on the others. We would
recommend that the customer changes the WMQ JCA Resource Adapter
property called connectionConcurrency to 1. This will allow the
endpoints to have their own Hconn to the WMQ Queue Manager and allow the
ServerSessionPool to be used per endpoint rather than across endpoints.
It would also limit the impact of this
"javax.resource.spi.UnavailableException" exception that is causing the
delay and any other bottlenecking that could occur.

The customer should also contact glassfish support to find out the
reason for the UnavailableException thrown from the glassfish JCA code.
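
One pattern that would produce exactly this symptom (a call that returns nothing for a fixed period and then fails) is a bounded wait on a "deployment finished" flag. The sketch below only illustrates that pattern; it is an assumption, not taken from the GlassFish source, and every name in it (BlockingEndpointFactory, markDeployed, EndpointUnavailableException) is hypothetical. A local exception class stands in for javax.resource.spi.UnavailableException so the example has no dependencies.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of a factory that blocks callers until deployment completes.
class BlockingEndpointFactory {

    /** Local stand-in for javax.resource.spi.UnavailableException. */
    static class EndpointUnavailableException extends Exception {
        EndpointUnavailableException(String message) {
            super(message);
        }
    }

    private final long waitMillis;
    private boolean deployed = false;

    BlockingEndpointFactory(long waitMillis) {
        this.waitMillis = waitMillis;
    }

    /** Called once deployment has finished; wakes any blocked callers. */
    synchronized void markDeployed() {
        deployed = true;
        notifyAll();
    }

    /** Waits up to waitMillis for deployment to finish, then gives up. */
    synchronized String createEndpoint() throws EndpointUnavailableException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(waitMillis);
        while (!deployed) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                throw new EndpointUnavailableException(
                        "EndpointFactory is currently not available");
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new EndpointUnavailableException("interrupted while waiting");
            }
        }
        return "endpoint";
    }

    /** Returns ms spent in createEndpoint before it failed, or -1 if it succeeded. */
    static long millisUntilFailure(BlockingEndpointFactory factory) {
        long start = System.nanoTime();
        try {
            factory.createEndpoint();
            return -1;
        } catch (EndpointUnavailableException e) {
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
    }

    public static void main(String[] args) {
        BlockingEndpointFactory factory = new BlockingEndpointFactory(200);
        System.out.println("failed after ms: " + millisUntilFailure(factory));
        factory.markDeployed();
        System.out.println("after deployment: " + millisUntilFailure(factory));
    }
}
```

If something like this is in play, the questions for glassfish support would be what the wait period is, whether it is configurable, and why deployment is still flagged as in progress when the resource adapter starts delivering messages.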

---
Sean Comerford
ESPN.com Architecture & Platforms