Sorry for the late reply; yesterday was Canada Day and thus a
holiday.
Yes, all instances are running on a single machine.
Machine is Windows XP 64 bit.
JVM is Sun JVM 1.5.0_15
I am running this through IntelliJ IDEA.
Windows Firewall is turned off.
I do have Symantec Antivirus. Do you mean disable Auto-Protect, or
do you mean shut down the whole antivirus?
10.6.2.89 is my network card.
192.168.111.1 is VMware VMNet8
192.168.138.1 is VMware VMNet1
I am NOT running these instances inside a VMware instance; they are
running on my main machine.
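For anyone comparing setups, here is a minimal sketch of how to list the
addresses the JVM itself reports on a multihomed box like this one. It
uses only java.net.NetworkInterface from the JDK; the class and method
names (ListInterfaces, describeInterfaces) are made up for illustration
and are not part of Shoal.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Shoal or JXTA.
class ListInterfaces {

    /** Returns one "name (display name) -> address" line per local address. */
    static List<String> describeInterfaces() {
        List<String> lines = new ArrayList<String>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return lines; // no interfaces reported by the platform
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    lines.add(nic.getName() + " (" + nic.getDisplayName() + ") -> "
                            + addr.getHostAddress());
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException("Could not enumerate interfaces", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeInterfaces()) {
            System.out.println(line);
        }
    }
}
```

On my machine I would expect this to show the physical card plus the two
VMware adapters, but the exact names and ordering depend on the OS.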
I'm not sure what you mean by "are all started concurrently". I run
these as applications from within IntelliJ, which launches each one
like this:
D:\JDKS\jdk1.5.0_15\bin\java -Didea.launcher.port=7546 "-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA 7.0.3\bin" -Dfile.encoding=windows-1252 -classpath "D:\JDKS\jdk1.5.0_15\jre\lib\charsets.jar;D:\JDKS\jdk1.5.0_15\jre\lib\deploy.jar;D:\JDKS\jdk1.5.0_15\jre\lib\javaws.jar;D:\JDKS\jdk1.5.0_15\jre\lib\jce.jar;D:\JDKS\jdk1.5.0_15\jre\lib\jsse.jar;D:\JDKS\jdk1.5.0_15\jre\lib\plugin.jar;D:\JDKS\jdk1.5.0_15\jre\lib\rt.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\dnsns.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\localedata.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\sunjce_provider.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\sunpkcs11.jar;D:\Development\shoaltest\out\production\SMessage;D:\Development\shoaltest\libs\appia\appia-3.2.4.jar;D:\Development\shoaltest\libs\jgroups\jgroups-all.jar;D:\Development\shoaltest\libs\log4j\log4j.jar;D:\Development\shoaltest\libs\shoal\shoal-gms.jar;D:\Development\shoaltest\libs\shoal\jxta.jar;C:\Program Files (x86)\JetBrains\IntelliJ IDEA 7.0.3\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain com.opentext.shoal.SendMessageSample SERVER-3
I DO see the failure suspect for SERVER-3 in the log snippet:
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:SERVER-3...
30-Jun-2008 02:16:57 PM DEBUG [pool-1-thread-4]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3
>> FailureSuspectedSignalImpl @ 30/06/08 2:00 PM - [RCS_CLUSTER-false]:
(Hashtable:[(String:server.name)<-->(String:SERVER-3),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])
MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89,
mwana0061/10.6.2.89])
What is weird is the suspect event for SERVER-1, which was not shut
down and is still running.
________________________________
From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 3:00 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working
Hi Mike
Yes, this is indeed a new problem. I hope these are not separate
snippets but one continuous log. What seems strange in this pasted
output is that there is no failure suspected signal (in-doubt event) for
Server-3. Is this what you see? There is the suspect event for
server-1.
Some questions: Are all instances on the same machine? The interface
addresses don't all seem to be in the same subnet, and/or they appear to
be on different networks in a multihomed machine environment (I see
10.6.2.89, 192.168.111.1 and 192.168.138.1).
Are all instances started concurrently?
Do you have any antivirus or firewalls running on your machine(s)? If
yes, can you disable them and see if communications and events happen
correctly?
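As a quick way to check the subnet question, here is a minimal sketch
that compares two dotted IPv4 addresses under a given prefix length. The
prefix lengths are an assumption (the log does not show netmasks), and
SubnetCheck is a hypothetical helper, not a Shoal or JXTA class.

```java
// Hypothetical helper for comparing IPv4 addresses; not part of Shoal.
class SubnetCheck {

    /** Parses a dotted-quad IPv4 string such as "10.6.2.89" into a 32-bit int. */
    static int toInt(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dotted);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Bad octet in: " + dotted);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if both addresses share the same leading prefixLen bits. */
    static boolean sameSubnet(String a, String b, int prefixLen) {
        if (prefixLen < 0 || prefixLen > 32) {
            throw new IllegalArgumentException("Prefix length must be 0..32");
        }
        // Java masks int shift counts to 5 bits, so /0 needs its own case.
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return (toInt(a) & mask) == (toInt(b) & mask);
    }

    public static void main(String[] args) {
        // Assumed /24 masks, purely for illustration.
        System.out.println(sameSubnet("10.6.2.89", "192.168.111.1", 24));
        System.out.println(sameSubnet("192.168.111.1", "192.168.138.1", 24));
    }
}
```

With a /24 mask assumed, none of the three addresses in the log share a
subnet with each other.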
Thanks
Shreedhar
Mike Wannamaker wrote:
Okay, I tested shutting down a non-groupleader. I do see suspect and
failure notifications.
However, you might not like this; I also see something that is very
strange and disturbing.
I start SERVER-1 (GROUPLEADER), SERVER-2, and SERVER-3.
I shut down SERVER-3 and get the correct messages in SERVER-1 and
mostly in SERVER-2, but I also get a FailureSuspect for SERVER-1 in the
SERVER-2 window.
This might be okay if I got a notification that the node was back, but I
don't, and it is still running. I started SERVER-3 again and see
SERVER-1 in the list, and it gets notifications as well.
I tried again, shutting down the newly running SERVER-3, and I get the
same results, so it seems fully reproducible.
Here is the output for SERVER-2:
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view
for (before change analysis) are :
1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A1617
79845B03
2: MemberId: SERVER-3, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD54C54AB0D7A640E493A5C6CE42
7A3CE203
3: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC
8BFEC603
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:SERVER-3...
30-Jun-2008 02:16:57 PM DEBUG [pool-1-thread-4]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3
>> FailureSuspectedSignalImpl @ 30/06/08 2:00 PM - [RCS_CLUSTER-false]:
(Hashtable:[(String:server.name)<-->(String:SERVER-3),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])
MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89,
mwana0061/10.6.2.89])
30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor
isConnected
INFO: Checking for machine status for network interface :
tcp://10.6.2.89:9701
30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor
isConnected
INFO: Checking for machine status for network interface :
tcp://192.168.111.1:9701
30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor
isConnected
INFO: Checking for machine status for network interface :
tcp://192.168.138.1:9701
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view
for (before change analysis) are :
1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A1617
79845B03
2: MemberId: SERVER-3, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD54C54AB0D7A640E493A5C6CE42
7A3CE203
3: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC
8BFEC603
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
IN_DOUBT_EVENT
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions.
Member:SERVER-1...
30-Jun-2008 02:17:27 PM DEBUG [pool-1-thread-4]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-1
>> FailureSuspectedSignalImpl @ 30/06/08 1:59 PM - [RCS_CLUSTER-false]:
(Hashtable:[(String:server.name)<-->(String:SERVER-1),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])
MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89,
mwana0061/10.6.2.89])
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view
for (before change analysis) are :
1: MemberId: SERVER-2, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A1617
79845B03
2: MemberId: SERVER-1, MemberType: CORE, Address:
urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC
8BFEC603
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
newViewObserved
INFO: Analyzing new membership snapshot received as part of event :
FAILURE_EVENT
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow
addFailureSignals
INFO: The following member has failed: SERVER-3
30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.common.Router
notifyFailureNotificationAction
INFO: Sending FailureNotificationSignals to registered Actions. Member:
SERVER-3...
30-Jun-2008 02:17:30 PM DEBUG [pool-1-thread-4]
com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3
>> FailureNotificationSignalImpl @ 30/06/08 2:00 PM -
[RCS_CLUSTER-false]:
(Hashtable:[(String:server.name)<-->(String:SERVER-3),
(String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])SERVER-3
MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89])
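To make the odd case easier to spot in longer logs, here is a rough
sketch that scans log text for the Router "Sending ...Signals" lines and
reports members that were suspected but never followed by a failure
notification. The regular expression is an assumption based only on the
wording of the lines pasted above, and SuspectScan is a made-up class
name, not part of Shoal.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner; the pattern is inferred from the pasted output only.
class SuspectScan {

    // \s* after "Actions." and "Member:" tolerates the mail client's line wrapping.
    private static final Pattern SIGNAL = Pattern.compile(
            "Sending (FailureSuspected|FailureNotification)Signals to registered\\s+"
            + "Actions\\.\\s*Member:\\s*(\\S+?)\\.\\.\\.");

    /** Returns members with a suspect signal and no later failure notification. */
    static Set<String> suspectedButNotFailed(String log) {
        Set<String> pending = new LinkedHashSet<String>();
        Matcher m = SIGNAL.matcher(log);
        while (m.find()) {
            String member = m.group(2);
            if ("FailureSuspected".equals(m.group(1))) {
                pending.add(member);      // in doubt, outcome not yet known
            } else {
                pending.remove(member);   // failure confirmed, no longer pending
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        String log =
                "INFO: Sending FailureSuspectedSignals to registered Actions.\n"
              + "Member:SERVER-3...\n"
              + "INFO: Sending FailureSuspectedSignals to registered Actions.\n"
              + "Member:SERVER-1...\n"
              + "INFO: Sending FailureNotificationSignals to registered Actions. Member:\n"
              + "SERVER-3...\n";
        System.out.println(suspectedButNotFailed(log));
    }
}
```

Run against the three Router lines from the SERVER-2 output above, this
leaves only SERVER-1 pending, which matches what I am seeing: a suspect
signal for a member that never fails and is never reported as back.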
________________________________
From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 2:07 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working
That's correct. Yes, I should not mix up the provider terminology with
GMS terminology.
Thanks
Shreedhar
Mike Wannamaker wrote:
When you say a non-master, do you mean when a server that is not the
groupleader is shut down?
________________________________
From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 1:47 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working
Hi Mike
This is a recent known issue that occurs when the master fails. I
don't see a Shoal issue on this yet, but our QE has filed an internal
issue on this behavior. I will post an issue in the Shoal tracker later
today with your details.
Can you confirm if behavior is okay when a non-master member fails?
Thanks
Shreedhar