users@glassfish.java.net

Net XA Derby Client blocking the whole glassfish domain, cpu at 100%

From: Paul <paul_at_nosphere.org>
Date: Thu, 6 Nov 2008 16:58:07 +0100

Hello,

We have an issue with GlassFish hanging and consuming 100% CPU.

We were running v2-b58g; upgrading to v2ur2-b04 did not solve the issue.

Following advice given on #glassfish, I reproduced the problem and took
several thread dumps to see whether the BLOCKED thread was making any progress,
but all my thread dumps show the same state:

[code]
Thread "httpSSLWorkerThread-443-4" thread-id 3 361 thread-state BLOCKED Waiting on lock: java.util.Vector@ee76ea
         Owned by: httpSSLWorkerThread-443-3 Id: 3 360
         at: org.apache.derby.client.net.NetXAResource.removeXaresFromSameRMchain(Unknown Source)
         at: org.apache.derby.client.net.NetConnection.closeForReuse(Unknown Source)
         at: org.apache.derby.client.am.LogicalConnection.close(Unknown Source)
         at: com.sun.gjc.spi.ManagedConnection.transactionCompleted(ManagedConnection.java:507)
         at: com.sun.gjc.spi.XAResourceImpl.commit(XAResourceImpl.java:88)
         at: com.sun.jts.jtsxa.OTSResourceImpl.commit_one_phase(OTSResourceImpl.java:166)
         at: com.sun.jts.CosTransactions.RegisteredResources.commitOnePhase(RegisteredResources.java:1575)
         at: com.sun.jts.CosTransactions.TopCoordinator.commitOnePhase(TopCoordinator.java:2949)
         at: com.sun.jts.CosTransactions.CoordinatorTerm.commit(CoordinatorTerm.java:317)
         at: com.sun.jts.CosTransactions.TerminatorImpl.commit(TerminatorImpl.java:249)
         at: com.sun.jts.CosTransactions.CurrentImpl.commit(CurrentImpl.java:623)
         at: com.sun.jts.jta.TransactionManagerImpl.commit(TransactionManagerImpl.java:309)
         at: com.sun.enterprise.distributedtx.J2EETransactionManagerImpl.commit(J2EETransactionManagerImpl.java:1030)
         at: com.sun.enterprise.distributedtx.J2EETransactionManagerOpt.commit(J2EETransactionManagerOpt.java:397)
         at: com.sun.ejb.containers.BaseContainer.completeNewTx(BaseContainer.java:3792)
         at: com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:3571)
         at: com.sun.ejb.containers.WebServiceInvocationHandler.invoke(WebServiceInvocationHandler.java:200)
         at: $Proxy214.findUsersByGroupUUID(Unknown Source)
[/code]
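
For anyone who wants to compare successive dumps without reading them by hand, here is a minimal sketch of how the BLOCKED threads and their lock owners could be listed with the standard java.lang.management API. The class name BlockedThreadReport and the method describeBlockedThreads are hypothetical names chosen for illustration. The sketch only inspects the JVM it runs in, so it would have to be deployed inside the domain or adapted to a remote JMX connection, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GlassFish or Derby.
public class BlockedThreadReport {

    /** Returns one line per BLOCKED thread: the lock it waits on, the lock owner and its top frame. */
    public static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<String>();
        // false, false: skip locked-monitor and locked-synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no stack available)";
            lines.add("\"" + info.getThreadName() + "\" id=" + info.getThreadId()
                    + " waiting on " + info.getLockName()
                    + " owned by \"" + info.getLockOwnerName() + "\" id=" + info.getLockOwnerId()
                    + " at " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print the current report once; an empty report means no thread is BLOCKED.
        for (String line : describeBlockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If the same owner thread id and the same top frame show up in reports taken several seconds apart, that suggests the owner is not releasing the lock, which is what the dumps above appear to show.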

This happens quite often, and asadmin stop-domain does not work; I have to
kill -9 (ugh!) the domain instance.

I can provide the thread dumps, but building a sample project to reproduce the
problem will be difficult, as the live project is huge.

I couldn't find any related issue in the GF tracker; maybe I just did not see
the right ones.

Any help appreciated.

Best regards.

Paul