users@glassfish.java.net

Re: Vague CORBA issue causing unexplainable problems

From: Amy Kang <amy.kang_at_oracle.com>
Date: Fri, 14 Jan 2011 12:01:34 -0800

There has been an internal discussion regarding caching JMS connections
in Stateless EJB instances; please see
http://java.net/jira/browse/GLASSFISH-15579

amy

On 11-01-04 07:43 PM, Amy Kang wrote:
> On 11-01-03 01:49 PM, Paul Giblock wrote:
>> Amy -
>>
>> I am now opening JMS connections on demand using a method-scoped
>> Connection. This seems to 'fix' the problem. However, I suspect it is
>> only 'fixed' because there are not as many JMS connections open.
>
> not necessarily; it could be due to the elimination of the cached
> connection in your Stateless Session Bean instance. For EJBs, JMSRA
> (the resource adapter for GlassFish MQ) associates a transaction at
> the JMS connection level. Therefore, if the connection is cached in a
> Stateless Session Bean instance, and that bean instance or the cached
> connection is concurrently used in multiple transactions (either
> because of a bug or because of an application component that allows
> it), the problem can occur.
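> To make the pattern concrete, a minimal sketch of the caching approach
> being discussed might look like the following (class, resource, and
> method names here are hypothetical, not taken from your application):
>
>     import javax.annotation.PostConstruct;
>     import javax.annotation.PreDestroy;
>     import javax.annotation.Resource;
>     import javax.ejb.EJBException;
>     import javax.ejb.Stateless;
>     import javax.jms.Connection;
>     import javax.jms.ConnectionFactory;
>     import javax.jms.JMSException;
>
>     @Stateless
>     public class CachedConnectionBean {
>
>         @Resource(mappedName = "jms/ConnectionFactory")
>         private ConnectionFactory connectionFactory;
>
>         // cached for the lifetime of the bean instance
>         private Connection connection;
>
>         @PostConstruct
>         void makeConnection() {
>             try {
>                 connection = connectionFactory.createConnection();
>             } catch (JMSException e) {
>                 throw new EJBException(e);
>             }
>         }
>
>         @PreDestroy
>         void closeConnection() {
>             try {
>                 connection.close();
>             } catch (JMSException e) {
>                 // ignore on shutdown
>             }
>         }
>
>         // Business methods send through the cached 'connection'. Because
>         // JMSRA associates the transaction with the connection, this one
>         // connection must never end up enlisted in two transactions at
>         // the same time.
>     }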
>
>> I am
>> afraid that we may encounter some usage pattern in the future that
>> causes many JMS connections to be opened simultaneously, triggering
>> the error again when we don't expect it.
>>
>> Regardless, this solution is obviously more efficient as it takes
>> advantage of GF's connection pooling. It does surprise me, though, that
>> Sun/Oracle approves of the "Cache your connection at @PostConstruct"
>> paradigm if it can cause errors. Is there a way I can keep track of
>> "# of opened JMS connections / max pool size" so I can be alerted as we
>> start to approach our maximum?
>
> Please check the GlassFish Administration Guide, or the Monitoring
> section of the GlassFish Administration Console. On the MQ broker side,
> you can see the number of open client connections using 'imqcmd list cxn'.
>
> amy
>
>> I haven't tested on the 3.1 snapshot yet, but I have tested on 3.0.1
>> and the same errors occur as on our production servers (CORBA-ish
>> exception in EMBEDDED or LOCAL mode).
>>
>> Anyway, thank you for the help so far. I will continue inspecting this.
>>
>> -Paul G
>>
>> On Thu, Dec 30, 2010 at 3:28 PM, Amy Kang<amy.kang_at_oracle.com> wrote:
>>> and
>>>
>>> On 12/30/2010 12:15 PM, Amy Kang wrote:
>>>> Paul,
>>>>
>>>> So far my comments on this have focused on the JMS side, under the
>>>> assumption that everything else in the GlassFish server you are using
>>>> works as expected, and based on the piece-by-piece info about your
>>>> application. For example, some other factors to consider:
>>>>
>>>> . Concurrent use of a Stateless session bean instance
>>>> - Since the JMS connections are pooled, you can move the
>>>> make/closeConnection calls into the announceNewQuestion() method to
>>>> see if you still see the problem.
>>> when you try the above, change the instance variable 'connection' to a
>>> method-local variable (see the sketch below). -amy
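>>> For illustration only, a method-local version of the send could look
>>> roughly like this (just a sketch, assuming the usual javax.jms imports,
>>> that connectionFactory and questionTopic are injected resources, and
>>> that Question is Serializable; adapt the names and message type to your
>>> actual code):
>>>
>>>     public void announceNewQuestion(Question question) {
>>>         Connection connection = null;  // method local, not an instance field
>>>         try {
>>>             connection = connectionFactory.createConnection();
>>>             Session session =
>>>                 connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
>>>             MessageProducer producer = session.createProducer(questionTopic);
>>>             producer.send(session.createObjectMessage(question));
>>>         } catch (JMSException e) {
>>>             throw new EJBException(e);
>>>         } finally {
>>>             if (connection != null) {
>>>                 try {
>>>                     // returns the connection to the resource adapter's pool
>>>                     connection.close();
>>>                 } catch (JMSException ignore) {
>>>                 }
>>>             }
>>>         }
>>>     }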
>>>
>>>> . Any potential bugs in GlassFish 3 that have been fixed in 3.0.1
>>>> and 3.1 and that could be related to this (?)
>>>> - If you cannot try the latest 3.1 promoted build, you should at
>>>> least try 3.0.1 to see if you can reproduce the same problem
>>>> . If necessary, a non-public (engineer's) property that you may be
>>>> able to try in order to rule out one area (when the colleague who
>>>> works in this area returns from vacation next week)
>>>> . Any other exceptions seen in the server log (?)
>>>>
>>>> You can also enable FINE debug logging for the relevant components of
>>>> the GlassFish server (ejb, jts/jta, jca, jms, corba, ...), which you
>>>> can set in the GlassFish Administration Console, as well as for some
>>>> additional JMSRA logger names (as seen in the source code):
>>>> com.sun.messaging.jmq.jmsclient.XAResourceForMC
>>>> javax.resourceadapter.mqjmsra.outbound.connection
>>>> javax.resourceadapter.mqjmsra.xa
>>>> com.sun.messaging.jms.ra.DirectXAResource
>>>> javax.resourceadapter.mqjmsra
>>>> com.sun.messaging.jms.ra.ResourceAdapter
>>>>
>>>> Enabling transaction protocol debugging on the MQ broker side can
>>>> also be helpful, by setting the following broker properties
>>>>
>>>> imq.debug.com.sun.messaging.jmq.jmsserver.data.handlers.TransactionHandler=true
>>>>
>>>> imq.debug.com.sun.messaging.jmq.jmsserver.data.protocol.ProtocolImpl=true
>>>>
>>>>
>>>> or by running
>>>> imqcmd debug class -n com.sun.messaging.jmq.jmsserver.data.handlers.TransactionHandler -debug
>>>> imqcmd debug class -n com.sun.messaging.jmq.jmsserver.data.protocol.ProtocolImpl -debug
>>>>
>>>> When you file a JIRA issue (if you are not sure of the component,
>>>> file it under 'other'), please attach the complete server/broker logs.
>>>>
>>>> amy
>>>>
>>>> On 12/29/2010 07:06 AM, Paul Giblock wrote:
>>>>> Amy,
>>>>>
>>>>>> @PostConstruct is to create the JMS connection; what do
>>>>>> @PreConstruct and QuestionManagerBean.announceQuestion() do (not
>>>>>> shown in your code snippet below)? Or did you actually mean
>>>>>> @PreDestroy, which is to close the JMS connection,
>>>>> Right, I meant @PreDestroy. There are, in fact, no other annotated
>>>>> lifecycle methods on this class.
>>>>>
>>>>>> and QuestionManagerBean.announceNewQuestion, which is to send a JMS
>>>>>> message?
>>>>>>
>>>>> Yes, the announceNewQuestion method is the only one that sends a JMS
>>>>> message in either the QuestionManagerBean or WidgetHelperBean
>>>>> class. This method, as well as the methods in JmsUtils and
>>>>> VHMStringUtils, is complete and unadulterated.
>>>>>
>>>>>> If the latter, the problem looks like a JMS-related issue, given
>>>>>> that the problem does not occur without calling these methods. It's
>>>>>> possible the problem is triggered by create/closeConnection, which
>>>>>> could indicate a JMS-related bug (GlassFish+JMSRA) in the area of
>>>>>> "recycling" JMS connections. You can try to set the bean pool
>>>>>> configuration of QuestionManagerBean to avoid bean destruction,
>>>>>> e.g. no idle timeout and a max pool size large enough for your
>>>>>> highest possible load, and do the same for the JMS connector pool.
>>>>>> However, it's necessary to find the root cause of the problem in
>>>>>> order to give you the right advice on how to avoid it (if possible),
>>>>>> and most importantly to ensure the problem is fixed in a later
>>>>>> release of GlassFish.
>>>>>>
>>>>> Your explanation seems to match what I am observing. I agree, we
>>>>> don't want this to be a long-standing bug in GF, for everyone's sake.
>>>>> The workaround you mention is not optimal, and I am not confident I
>>>>> would know what our upper bound for load/traffic is.
>>>>>
>>>>>> How is QuestionManagerBean.announceNewQuestion invoked ?
>>>>>>
>>>>> From a servlet, which does:
>>>>>
>>>>>     Context ctx = new InitialContext();
>>>>>     QuestionManager questionMgr =
>>>>>         (QuestionManager) ctx.lookup("java:comp/env/ejb/QuestionManager");
>>>>>
>>>>>     // Load values from servlet request
>>>>>
>>>>>     try {
>>>>>         AskTuple at = questionMgr.ask(e, i, t, m, a);
>>>>>         // Prepare response ...
>>>>>     }
>>>>>     catch (/* All our application exceptions */) ...
>>>>>
>>>>> The QuestionManager.ask method is simple:
>>>>>
>>>>>     @Override
>>>>>     public AskTuple ask(long eventId, final String ip, String userAlias,
>>>>>                         String html, MediaTuple media)
>>>>>             throws EventAskingClosedException, EventExpiredException,
>>>>>                    EventNotStartedException, OffensiveException,
>>>>>                    EventMediaNotAllowedException, EventMediaRequiredException,
>>>>>                    StringMaxLengthException {
>>>>>         // Basic input validation
>>>>>         // Several JPA loads
>>>>>         // Prepare new JPA Entity
>>>>>         // A JPA persist
>>>>>
>>>>>         if (needToAnnounce) {
>>>>>             // we are calling this method with a JPA entity as a param
>>>>>             announceNewQuestion(tuple.getQuestion());
>>>>>         }
>>>>>         return tuple;
>>>>>     }
>>>>>
>>>>>
>>>>>> Could you please file a JIRA issue for this with as much
>>>>>> information as possible in order to reproduce it (preferably with a
>>>>>> reproducible test case, and be sure to include the GlassFish
>>>>>> version/build #)?
>>>>>>
>>>>> I will try. Any hints on how best to file it (category, important
>>>>> keywords, etc.)? The big issue for me is that I cannot recreate this
>>>>> error on my own. I've only experienced it on both of our production
>>>>> systems, apparently due to the higher traffic loads. I'll have to
>>>>> figure out some way to generate enough traffic to cause the problem
>>>>> in a vacuum. Any ideas on how to make the problem appear sooner or
>>>>> with less traffic? Possibly lowering the max-pool-size of the
>>>>> ConnectionFactory to some very low (how low?) value...
>>>>>
>>>>> Thank you for your continued support,
>>>>> Paul G
>>>
>