users@glassfish.java.net

DAS and node instance disappear under moderate load

From: <glassfish_at_javadesktop.org>
Date: Tue, 26 May 2009 03:08:44 PDT

Hello,

I have a problem with a clustered domain that serves WebService calls.
Under what seems to be a moderate load (10 WS requests/s, CPU usage < 10%, normal response times), we have noticed on several occasions that the DAS process vanishes for no identifiable reason, followed by an instance (the one running on the same host as the DAS).
Although the other instance handles subsequent requests and the crashed instance eventually restarts, this obviously degrades the QoS, and remote administration is impossible until the DAS has been restarted manually.

I haven't found anything similar in the archives of this forum. This is also my first support request here. Finally, the team is still relatively new to GlassFish (12 months), so I may not be looking at the right logs.

Thanks in advance if you can point me in the right direction.

    J.

Here are some logs (from the second instance's [i]server.log[/i]):

[#|2009-05-22T13:40:59.644+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread;|GMS View Change Received for group mycluster : Members in view for (before change analysis) are :
1: MemberId: server, MemberType: SPECTATOR, Address: urn:jxta:uuid-59616261646162614A787461503250332DE658F932AB436995B78E0CB3E080DA03 <= IT IS THE DAS
2: MemberId: instance1, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250337987FC1134E54090AB24B0C9E01AD7DF03
3: MemberId: instance2, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033C441652F131349569476E023E060DFA903
...
|#]
[#|2009-05-22T13:40:59.645+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread;IN_DOUBT_EVENT;|Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT|#]
[#|2009-05-22T13:40:59.645+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread;server;|gms.failureSuspectedEventReceived|#]
[#|2009-05-22T13:40:59.649+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=34;_ThreadName=com.sun.enterprise.ee.cms.impl.common.Router Thread;server;|Sending FailureSuspectedSignals to registered Actions. Member:server...|#]
[#|2009-05-22T13:41:01.728+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread;|GMS View Change Received for group mycluster : Members in view for (before change analysis) are :
<= NOTICE THE DAS HAS VANISHED
1: MemberId: instance1, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250337987FC1134E54090AB24B0C9E01AD7DF03
2: MemberId: instance2, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033C441652F131349569476E023E060DFA903
|#]
...

A few seconds later:
[#|2009-05-22T13:41:07.853+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread;IN_DOUBT_EVENT;|Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT|#]
[#|2009-05-22T13:41:07.853+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread;instance1;|gms.failureSuspectedEventReceived|#]
[#|2009-05-22T13:41:07.854+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=34;_ThreadName=com.sun.enterprise.ee.cms.impl.common.Router Thread;instance1;|Sending FailureSuspectedSignals to registered Actions. Member:instance1...|#]
[#|2009-05-22T13:41:09.870+0200|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread;|GMS View Change Received for group m2m-cluster : Members in view for (before change analysis) are :
<= Now the INSTANCE has vanished
1: MemberId: instance2, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033C441652F131349569476E023E060DFA903
|#]
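Sequences like the one above are easy to lose in a full server.log. As a diagnostic aid, here is a minimal Java sketch (not a GlassFish or Shoal utility; the class name GmsViewDiff is hypothetical) that assumes the multi-line record layout shown in the excerpt: a line containing "GMS View Change Received", one "N: MemberId: ..., MemberType: ..." line per member, and a closing "|#]" line. It prints which members dropped out between consecutive views.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; assumes the GMS view-change record layout seen in the log excerpt.
public class GmsViewDiff {

    // Member lines look like "1: MemberId: server, MemberType: SPECTATOR, Address: urn:jxta:..."
    private static final Pattern MEMBER =
            Pattern.compile("^\\s*\\d+: MemberId: ([^,]+), MemberType: \\w+");

    // Returns one member set per "GMS View Change Received" record, in log order.
    static List<Set<String>> parseViews(List<String> lines) {
        List<Set<String>> views = new ArrayList<>();
        Set<String> current = null; // non-null while inside a view-change record
        for (String line : lines) {
            if (line.contains("GMS View Change Received")) {
                current = new LinkedHashSet<>();
                views.add(current);
            } else if (current != null && line.startsWith("|#]")) {
                current = null; // "|#]" closes the multi-line log record
            } else if (current != null) {
                Matcher m = MEMBER.matcher(line);
                if (m.find()) {
                    current.add(m.group(1).trim());
                }
            }
        }
        return views;
    }

    // For each consecutive pair of views, returns the members present before but absent after.
    static List<Set<String>> departures(List<Set<String>> views) {
        List<Set<String>> result = new ArrayList<>();
        for (int i = 1; i < views.size(); i++) {
            Set<String> gone = new LinkedHashSet<>(views.get(i - 1));
            gone.removeAll(views.get(i));
            result.add(gone);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java GmsViewDiff <path/to/server.log>");
            return;
        }
        // ISO-8859-1 maps every byte, so unexpected characters in the log cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        List<Set<String>> gone = departures(parseViews(lines));
        for (int i = 0; i < gone.size(); i++) {
            if (!gone.get(i).isEmpty()) {
                System.out.println("view " + (i + 1) + " -> " + (i + 2) + ": left " + gone.get(i));
            }
        }
    }
}
```

Run it as `java GmsViewDiff server.log`. On the excerpt above it would report `server` (the DAS) leaving first, then `instance1`. It does not explain why the processes died; it only makes the order of departures easier to spot across a long log.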
[Message sent by forum member 'jduprez' (jduprez)]

http://forums.java.net/jive/thread.jspa?messageID=347649