Okay, I tested shutting down a non-groupleader. I do see suspect and
failure notifications.
However, you might not like this: I also see something that is very
strange and disturbing.
I start SERVER-1 (GROUPLEADER), SERVER-2, and SERVER-3.
When I shut down SERVER-3, I get the correct messages in SERVER-1 and
mostly in SERVER-2, but I also get a FailureSuspected for SERVER-1 in
the SERVER-2 window.
This might be okay if I got a notification that the node was back, but I
don't, and it is still running. I started SERVER-3 again and see SERVER-1
in the list, and it gets notifications as well.
I tried again, shutting down the newly running SERVER-3, and I get the
same results, so it seems fully reproducible.
Here is the output for SERVER-2:
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view
for (before change analysis) are :
1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A1617
79845B03
2: MemberId: SERVER-3, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD54C54AB0D7A640E493A5C6CE42
7A3CE203
3: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC
8BFEC603
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:SERVER-3...
30-Jun-2008 02:16:57 PM DEBUG [pool-1-thread-4]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3
>> FailureSuspectedSignalImpl @ 30/06/08 2:00 PM - [RCS_CLUSTER-false]:
(Hashtable:[(String:server.name)<-->(String:SERVER-3),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])
MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89,
mwana0061/10.6.2.89])
30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor
isConnected
INFO: Checking for machine status for network interface :
tcp://10.6.2.89:9701
30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor
isConnected
INFO: Checking for machine status for network interface :
tcp://192.168.111.1:9701
30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor
isConnected
INFO: Checking for machine status for network interface :
tcp://192.168.138.1:9701
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view
for (before change analysis) are :
1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A1617
79845B03
2: MemberId: SERVER-3, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD54C54AB0D7A640E493A5C6CE42
7A3CE203
3: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC
8BFEC603
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:SERVER-1...
30-Jun-2008 02:17:27 PM DEBUG [pool-1-thread-4]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-1
>> FailureSuspectedSignalImpl @ 30/06/08 1:59 PM - [RCS_CLUSTER-false]:
(Hashtable:[(String:server.name)<-->(String:SERVER-1),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])
MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89,
mwana0061/10.6.2.89])
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view
for (before change analysis) are :
1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A1617
79845B03
2: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC
8BFEC603
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
FAILURE_EVENT
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addFailureSignals
INFO: The following member has failed: SERVER-3
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureNotificationAction
INFO: Sending FailureNotificationSignals to registered Actions. Member:
SERVER-3...
30-Jun-2008 02:17:30 PM DEBUG [pool-1-thread-4]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3
>> FailureNotificationSignalImpl @ 30/06/08 2:00 PM -
[RCS_CLUSTER-false]:
(Hashtable:[(String:server.name)<-->(String:SERVER-3),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])SERVER-3
MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89])
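For longer logs, a small script could pick out members that were
suspected but never followed by a failure notification, as with SERVER-1
above. The following is only a rough sketch and not part of Shoal: the
function name unresolved_suspects is made up for illustration, and it
assumes the Router messages keep the wording shown in this log. It does
not look for a "member is back" message, since none appears in the
output above.

```python
import re

# Patterns assume the Router log wording seen in the SERVER-2 output.
# \s* tolerates the line wrapping introduced by the mail client.
SUSPECTED = re.compile(
    r"Sending FailureSuspectedSignals to registered Actions\.\s*Member:\s*([\w-]+)"
)
FAILED = re.compile(
    r"Sending FailureNotificationSignals to registered Actions\.\s*Member:\s*([\w-]+)"
)


def unresolved_suspects(log_text):
    """Return members that were suspected but never reported as failed.

    Hypothetical helper: order of events is ignored, so this is a coarse
    check, not a replay of the GMS state machine.
    """
    # Collapse all whitespace so wrapped log lines match as one line.
    text = " ".join(log_text.split())

    suspected = []
    for match in SUSPECTED.finditer(text):
        member = match.group(1)
        if member not in suspected:
            suspected.append(member)

    failed = set(FAILED.findall(text))
    return [member for member in suspected if member not in failed]


# Excerpt of the Router lines from the log above, wrapping preserved.
SAMPLE = """INFO: Sending FailureSuspectedSignals to registered Actions.
Member:SERVER-3...
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:SERVER-1...
INFO: Sending FailureNotificationSignals to registered Actions. Member:
SERVER-3...
"""

if __name__ == "__main__":
    print(unresolved_suspects(SAMPLE))  # ['SERVER-1']
```

Run against the full SERVER-2 output, this would flag SERVER-1, which
matches what I see: it was suspected at 2:17:27 but is still in the view
at 2:17:30 and no later message clears it.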
________________________________
From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 2:07 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working
That's correct. Yes, I should not mix up the provider terminology with
GMS terminology.
Thanks
Shreedhar
Mike Wannamaker wrote:
When you say a non-master, do you mean when a server that is not the
groupleader is shut down?
________________________________
From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 1:47 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working
Hi Mike
This is a recent known issue that occurs on master failure. I don't see
a Shoal issue on this yet, but our QE has filed an internal issue on this
behavior. I will post an issue in the Shoal tracker later today with
your details.
Can you confirm whether the behavior is okay when a non-master member
fails?
Thanks
Shreedhar