users@glassfish.java.net

Re: Vague CORBA issue causing unexplainable problems

From: Jagadish Prasath Ramu <Jagadish.Ramu_at_Sun.COM>
Date: Sun, 09 Jan 2011 09:43:35 +0530

Paul,

On Mon, 2011-01-03 at 16:49 -0500, Paul Giblock wrote:
> Amy -
>
> I am now opening JMS connections on demand using a method-scoped
> Connection. This seems to 'fix' the problem. However, I believe it is
> 'fixed' simply because there are no longer as many JMS connections. I am
> afraid that we may encounter some usage pattern in the future which
> may cause many JMS connections to be opened simultaneously, causing
> the error again sometime when we don't expect it.
>
> Regardless, this solution is obviously more efficient as it takes
> advantage of GF's connection pooling.
> Still, it surprises me that Sun/Oracle approves of the
> "Cache your connection at @PostConstruct" paradigm if it causes errors.
The latest Connector specification (1.6) recommends the "acquire on-demand
and release immediately once the work is completed" pattern for connections
acquired from the container (e.g. the server's connection pool).
Section 6.4.3 of the Connector 1.6 specification, "Application
Programming Model > Guidelines", explains this pattern.
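For illustration, here is a minimal sketch of that pattern in a stateless
session bean; the JNDI names jms/MyConnectionFactory and jms/MyQueue and the
String payload are placeholders, not taken from your application:

    import javax.annotation.Resource;
    import javax.ejb.Stateless;
    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    @Stateless
    public class QuestionManagerBean {

        // Pooled factory injected by the container; no Connection is
        // cached in the bean instance.
        @Resource(mappedName = "jms/MyConnectionFactory")  // placeholder name
        private ConnectionFactory connectionFactory;

        @Resource(mappedName = "jms/MyQueue")              // placeholder name
        private Queue queue;

        public void announceNewQuestion(String questionText) throws JMSException {
            Connection connection = null;
            try {
                // Acquire a connection from the pool only when it is needed ...
                connection = connectionFactory.createConnection();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage(questionText));
            } finally {
                // ... and release it back to the pool as soon as the work is done.
                if (connection != null) {
                    connection.close();
                }
            }
        }
    }

Since close() on a connection obtained from the container only returns it to
the pool, creating and closing the connection on every call is cheap.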
> Is there a way I can keep track of "#
> opened JMS connections / max pool size" so I can be alerted as we
> start to approach our maximum?
Yes, you can enable connector-connection-pool monitoring in GlassFish to
see the connection usage.
http://docs.sun.com/app/docs/doc/820-7692/gipzv?l=en&a=view


http://blogs.sun.com/JagadishPrasath/entry/monitoring_jdbc_connection_pool_glassfish
Though the above blog post covers jdbc-connection-pool, the procedure is
similar for connector-connection-pool. You can follow the same GUI/CLI
instructions, substituting connector-connection-pool for
jdbc-connection-pool.
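For example, from the CLI, something along these lines should work (the pool
name is a placeholder; use your pool's actual name, and verify the exact
dotted names on your installation, e.g. with
asadmin list --monitor "server.resources.*"):

    asadmin set server.monitoring-service.module-monitoring-levels.connector-connection-pool=HIGH
    asadmin get --monitor "server.resources.<your-connector-pool-name>.*"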

Thanks,
-Jagadish

>
> I haven't tested on the 3.1 snapshot yet, but I have tested on 3.0.1
> and the same errors occur as on our production servers (CORBA-ish
> exception in EMBEDDED or LOCAL mode).
>
> Anyways, Thank you for the help so far, I will continue inspecting this.
>
> -Paul G
>
> On Thu, Dec 30, 2010 at 3:28 PM, Amy Kang <amy.kang_at_oracle.com> wrote:
> > and
> >
> > On 12/30/2010 12:15 PM, Amy Kang wrote:
> >>
> >> Paul,
> >>
> >> So far my comments on this have been focusing on the JMS side, with the
> >> assumption that everything else in the GlassFish server that you are using
> >> works as expected, and based on piece-by-piece info on your application. For
> >> example, some other factors to consider:
> >>
> >> . Concurrent use of a Stateless session bean instance
> >> - You can move the make/closeConnection calls into the announceNewQuestion()
> >> method (since the JMS connections are pooled) to see if you still see the
> >> problem.
> >
> > when you try the above, change the instance variable 'connection' to method
> > local. -amy
> >
> >> . Any potential bugs in GlassFish 3 that have been fixed in 3.0.1 and 3.1
> >> and that could be related to this (?)
> >> - If you cannot try the latest 3.1 promoted build, you should at least try
> >> 3.0.1 to see if you can reproduce the same problem
> >> . If necessary, a non-public (engineer's) property that you may be able to
> >> try in order to rule out one area (when the colleague who works in this
> >> area returns from vacation next week)
> >> . Any other exceptions seen in the server log (?)
> >>
> >> You can also enable FINE debug logging for the relevant components of the
> >> GlassFish server (ejb, jts/jta, jca, jms, corba, ...), which you can set in
> >> the GlassFish Administration Console, as well as for some additional JMSRA
> >> logger names (as seen in the source code):
> >> com.sun.messaging.jmq.jmsclient.XAResourceForMC
> >> javax.resourceadapter.mqjmsra.outbound.connection
> >> javax.resourceadapter.mqjmsra.xa
> >> com.sun.messaging.jms.ra.DirectXAResource
> >> javax.resourceadapter.mqjmsra
> >> com.sun.messaging.jms.ra.ResourceAdapter
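
For example, assuming the standard java.util.logging properties format (as
used in the domain's logging.properties, where the Admin Console log levels
are also stored), a couple of the entries above would look like:

    com.sun.messaging.jms.ra.DirectXAResource.level=FINE
    javax.resourceadapter.mqjmsra.level=FINE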
> >>
> >> and enabling MQ broker-side transaction protocol debugging can also be
> >> helpful, by setting the following broker properties:
> >>
> >> imq.debug.com.sun.messaging.jmq.jmsserver.data.handlers.TransactionHandler=true
> >> imq.debug.com.sun.messaging.jmq.jmsserver.data.protocol.ProtocolImpl=true
> >>
> >> or by running
> >> imqcmd debug class -n com.sun.messaging.jmq.jmsserver.data.handlers.TransactionHandler -debug
> >> imqcmd debug class -n com.sun.messaging.jmq.jmsserver.data.protocol.ProtocolImpl -debug
> >>
> >> When you file a JIRA issue (if you are not sure of the component, file it
> >> under 'other'), please attach the complete server/broker logs.
> >>
> >> amy
> >>
> >> On 12/29/2010 07:06 AM, Paul Giblock wrote:
> >>>
> >>> Amy,
> >>>
> >>>> @PostConstruct is to create the JMS connection; what do @PreConstruct
> >>>> and QuestionManagerBean.announceQuestion() do (not shown in your code
> >>>> snippet below)? Or did you actually mean @PreDestroy, which is to close the
> >>>> JMS connection
> >>>
> >>> Right, I meant @PreDestroy. There are, in fact, no other annotated
> >>> lifecycle methods on this class.
> >>>
> >>>> and QuestionManagerBean.announceNewQuestion, which is to send a JMS
> >>>> message?
> >>>>
> >>> Yes, the announceNewQuestion method is the only one that sends a JMS
> >>> message in both the QuestionManagerBean and WidgetHelperBean
> >>> classes. This method, as well as the methods in JmsUtils and
> >>> VHMStringUtils, is complete and unadulterated.
> >>>
> >>>> If the latter, the problem looks like a JMS-related issue if the problem
> >>>> does not occur without calling these methods. It's possible the problem is
> >>>> triggered by create/closeConnection, which could indicate a JMS-related bug
> >>>> (GlassFish+JMSRA) in the area of "recycling" JMS connections. You can try to
> >>>> set the bean pool configuration of QuestionManagerBean to avoid bean
> >>>> destruction, e.g. no idle timeout and a max pool size large enough for your
> >>>> possible highest load, and do the same for the JMS connector pool.
> >>>> However, it's necessary to find out the root cause of the problem in order
> >>>> to give you the right advice to avoid (if possible) the problem, and most
> >>>> importantly to ensure the problem is fixed in a later release of GlassFish.
> >>>>
> >>> Your explanation seems to match what I am observing. I agree, we don't
> >>> want this to be a long-term standing bug in GF for everyone's sake.
> >>> The workaround you mention is not optimal, and I am not confident I
> >>> would know what our upper bound for load/traffic is.
> >>>
> >>>> How is QuestionManagerBean.announceNewQuestion invoked ?
> >>>>
> >>> From a servlet, which does:
> >>>
> >>>     Context ctx = new InitialContext();
> >>>     QuestionManager questionMgr =
> >>>         (QuestionManager) ctx.lookup("java:comp/env/ejb/QuestionManager");
> >>>
> >>>     // Load values from servlet request
> >>>
> >>>     try {
> >>>         AskTuple at = questionMgr.ask(e, i, t, m, a);
> >>>         // Prepare response ...
> >>>     }
> >>>     catch (/* All our application exceptions */) ...
> >>>
> >>> The QuestionManager.ask method is simple:
> >>>
> >>>     @Override
> >>>     public AskTuple ask(long eventId, final String ip, String userAlias,
> >>>                         String html, MediaTuple media)
> >>>             throws EventAskingClosedException, EventExpiredException,
> >>>                    EventNotStartedException, OffensiveException,
> >>>                    EventMediaNotAllowedException, EventMediaRequiredException,
> >>>                    StringMaxLengthException {
> >>>         // Basic input validation
> >>>         // Several JPA loads
> >>>         // Prepare new JPA Entity
> >>>         // A JPA persist
> >>>
> >>>         if (needToAnnounce) {
> >>>             // we are calling this method with a JPA entity as a param
> >>>             announceNewQuestion(tuple.getQuestion());
> >>>         }
> >>>         return tuple;
> >>>     }
> >>>
> >>>
> >>>> Could you please file a JIRA issue for this with as much information as
> >>>> possible in order to reproduce it (preferably with a reproducible test
> >>>> case, and be sure to include the GlassFish version/build #)?
> >>>>
> >>> I will try. Any hints on how best to file it (category, important
> >>> keywords, etc.)? The big issue for me is that I cannot recreate this error
> >>> on my own. I've only experienced it on both of our production
> >>> systems, apparently due to the higher traffic loads. I'll have to
> >>> figure out some way to generate enough traffic to cause the problem in
> >>> a vacuum. Any ideas on how to make the problem appear sooner or with
> >>> less traffic? Possibly lowering the max-pool size of the
> >>> ConnectionFactory to some very low (how low?) value...
> >>>
> >>> Thank you for your continued support,
> >>> Paul G
> >>
> >
> >