dev@glassfish.java.net

[gf-dev] Problem with transaction recovery between glassfish and other app servers

From: Michael Musgrove <mmusgrov_at_redhat.com>
Date: Fri, 8 Apr 2016 17:56:13 +0100

I have hit an issue that stops glassfish from recovering in-doubt
transactions. I traced the problem to what looks like a bug in
com.sun.jts.CosTransactions.GlobalTID.hashCode().

I have a test that does the following:

1. start a (JTS) transaction on WildFly and invoke an EJB deployed to
GlassFish Server Open Source Edition 4.1.1 (build 1);
2. the EJB on glassfish enlists an XAResource which calls
Runtime.getRuntime().halt(1) during commit (in order to generate a recovery
record);
3. when glassfish is restarted I can see the
com.sun.jts.CosTransactions.RecoveryManager.proceedWithXARecovery() getting
the Xid from my resource (via XAResource.recover()) but the Xid never
matches up with any of the coordinators known to the recovery manager
(namely the Hashtable
com.sun.jts.CosTransactions.RecoveryManager.coordsByGlobalTID)

The reason why the Xid returned by the XAResource recover method does not
match any of the known coordinators is that the implementation of
GlobalTID.hashCode() incorrectly includes the bqual length of the otid_t
in the hash (hashCode += realTID.formatID + realTID.bqual_length;). The
equals method correctly ignores bqual_length, but java.util.Hashtable only
calls equals() on entries whose hashCode() already matches the key's, so
the lookup never finds the coordinator.
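
To illustrate the mismatch, here is a minimal, self-contained sketch. It
does not use the GlassFish classes: SketchTid is a hypothetical stand-in
for GlobalTID, its equals() is my assumption of "compare formatID and the
gtrid bytes only", and the formatID and byte values are arbitrary. Only
the "+ bqual_length" term in the hash is taken from the real code.

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical, simplified stand-in for GlobalTID (not the GlassFish class).
class SketchTid {
    final int formatID;
    final int bqualLength;
    final byte[] tid;                 // gtrid bytes followed by bqual bytes
    final boolean hashIncludesBqual;  // true mimics the reported behaviour

    SketchTid(int formatID, int bqualLength, byte[] tid, boolean hashIncludesBqual) {
        this.formatID = formatID;
        this.bqualLength = bqualLength;
        this.tid = tid;
        this.hashIncludesBqual = hashIncludesBqual;
    }

    private byte[] gtrid() {
        return Arrays.copyOf(tid, tid.length - bqualLength);
    }

    // Assumed equality: formatID and gtrid only, bqual is ignored.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchTid)) {
            return false;
        }
        SketchTid other = (SketchTid) o;
        return formatID == other.formatID && Arrays.equals(gtrid(), other.gtrid());
    }

    @Override
    public int hashCode() {
        int h = 0;
        for (byte b : gtrid()) {
            h += b;
        }
        h += formatID;
        if (hashIncludesBqual) {
            h += bqualLength;         // the term that breaks the lookup
        }
        return h;
    }
}

class HashMismatchDemo {
    // Stores a tid with a bqual, then looks it up with a tid built from the gtrid only.
    static String lookup(boolean hashIncludesBqual) {
        byte[] gtridOnly = {1, 2, 3, 4};
        byte[] gtridPlusBqual = {1, 2, 3, 4, 9, 9};
        SketchTid propagated = new SketchTid(1234, 2, gtridPlusBqual, hashIncludesBqual);
        SketchTid recovered = new SketchTid(1234, 0, gtridOnly, hashIncludesBqual);

        Hashtable<SketchTid, String> coordsByTid = new Hashtable<>();
        coordsByTid.put(propagated, "coordinator");
        return coordsByTid.get(recovered);
    }

    public static void main(String[] args) {
        System.out.println("hash includes bqual_length: " + lookup(true));
        System.out.println("hash ignores bqual_length:  " + lookup(false));
    }
}
```

With the bqual length in the hash, the two keys are equal according to
equals() but the Hashtable lookup still returns null; dropping that term
from the hash makes the lookup succeed.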

Note that the two GlobalTIDs used here can have a different bqual_length
because:

- the one that goes into the RecoveryManager.coordsByGlobalTID Hashtable
can come from a propagated transaction context
via TransactionFactoryImpl.recreate(PropagationContext context). This route
constructs the tid using new GlobalTID(context.current.otid) and in the
WildFly case the otid contains a non-zero value for the bqual_length (which
is required by the OMG OTS specification);
- the one that comes from calling XAResource.recover is obtained by calling
OTSResourceImpl.getGlobalTID() which is built from
xid.getGlobalTransactionId() only, so the bqual_length is always zero.

Should I raise a JIRA for this issue?

Regards,
Mike


-- 
Michael Musgrove
Transactions Team
e: mmusgrov_at_redhat.com
t: +44 191 243 0870
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham (US), Paul Hickey (Ireland), Matt Parson
(US), Charles Peters (US)