users@shoal.java.net

RE: [Shoal-Users] Still not sure it's working

From: Mike Wannamaker <mwannama_at_opentext.com>
Date: Wed, 2 Jul 2008 11:53:22 -0400

Sorry for the late reply; yesterday was Canada Day, a holiday.

Yes, all instances are running on a single machine.

The machine runs Windows XP 64-bit.

The JVM is Sun JVM 1.5.0_15.

I am running this through IntelliJ IDEA.

Windows Firewall is turned off.

I do have Symantec Antivirus. Do you mean disabling Auto-Protect, or
shutting down the antivirus entirely?

10.6.2.89 is my network card.

192.168.111.1 is VMware VMNet8

192.168.138.1 is VMware VMNet1

I am NOT running these instances inside a VMware instance; they are
running on my main machine.

I'm not sure what you mean by "are all started concurrently". I run these
as applications from within IntelliJ, which launches each one like this:

D:\JDKS\jdk1.5.0_15\bin\java -Didea.launcher.port=7546 ^
  "-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA 7.0.3\bin" ^
  -Dfile.encoding=windows-1252 ^
  -classpath "D:\JDKS\jdk1.5.0_15\jre\lib\charsets.jar;D:\JDKS\jdk1.5.0_15\jre\lib\deploy.jar;D:\JDKS\jdk1.5.0_15\jre\lib\javaws.jar;D:\JDKS\jdk1.5.0_15\jre\lib\jce.jar;D:\JDKS\jdk1.5.0_15\jre\lib\jsse.jar;D:\JDKS\jdk1.5.0_15\jre\lib\plugin.jar;D:\JDKS\jdk1.5.0_15\jre\lib\rt.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\dnsns.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\localedata.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\sunjce_provider.jar;D:\JDKS\jdk1.5.0_15\jre\lib\ext\sunpkcs11.jar;D:\Development\shoaltest\out\production\SMessage;D:\Development\shoaltest\libs\appia\appia-3.2.4.jar;D:\Development\shoaltest\libs\jgroups\jgroups-all.jar;D:\Development\shoaltest\libs\log4j\log4j.jar;D:\Development\shoaltest\libs\shoal\shoal-gms.jar;D:\Development\shoaltest\libs\shoal\jxta.jar;C:\Program Files (x86)\JetBrains\IntelliJ IDEA 7.0.3\lib\idea_rt.jar" ^
  com.intellij.rt.execution.application.AppMain ^
  com.opentext.shoal.SendMessageSample SERVER-3

I DO see the failure-suspected signal for SERVER-3 in the log snippet:

30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT

30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived

30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:SERVER-3...

30-Jun-2008 02:16:57 PM DEBUG [pool-1-thread-4] com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3 >> FailureSuspectedSignalImpl @ 30/06/08 2:00 PM - [RCS_CLUSTER-false]: (Hashtable:[(String:server.name)<-->(String:SERVER-3), (String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])

MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89, mwana0061/10.6.2.89])

What is weird is the one for SERVER-1, which was not shut down and is
still running.

________________________________

From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 3:00 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working

Hi Mike
Yes, this is indeed a new problem. I hope these are not separate
snippets but one continuous log. What seems strange in this pasted
output is that there is no failure-suspected signal (in-doubt event) for
SERVER-3. Is this what you see? There is a suspect event for SERVER-1.

Some questions: Are all instances on the same machine? The interface
addresses don't seem to all be in the same subnet, and/or they appear to
be on different networks of a multihomed machine (I see 10.6.2.89,
192.168.111.1 and 192.168.138.1).
Are all instances started concurrently?
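
The subnet observation can be checked mechanically by grouping IPv4 addresses by network prefix; more than one group means the host is multihomed across subnets. This is a minimal plain-Java sketch, not Shoal code: `SubnetCheck`, `networkOf`, `groupBySubnet` and `localIpv4Addresses` are hypothetical names, and it assumes a /24 mask on every interface because the real netmasks are not stated in this thread.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SubnetCheck {
    // Network part of an IPv4 literal for a prefix length (24 means 255.255.255.0).
    static int networkOf(String ipv4, int prefixLen) {
        if (prefixLen < 0 || prefixLen > 32) {
            throw new IllegalArgumentException("prefix must be 0..32: " + prefixLen);
        }
        byte[] b;
        try {
            // For a literal address, getByName only parses it; no DNS lookup is made.
            b = InetAddress.getByName(ipv4).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + ipv4, e);
        }
        if (b.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        int ip = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        // Java masks shift distances to 5 bits (-1 << 32 == -1), so /0 is handled explicitly.
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return ip & mask;
    }

    static String dotted(int ip) {
        return (ip >>> 24) + "." + ((ip >>> 16) & 0xFF) + "." + ((ip >>> 8) & 0xFF) + "." + (ip & 0xFF);
    }

    // Groups addresses by subnet, keeping first-seen order; keys look like "10.6.2.0/24".
    static Map<String, List<String>> groupBySubnet(List<String> addrs, int prefixLen) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String a : addrs) {
            String key = dotted(networkOf(a, prefixLen)) + "/" + prefixLen;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(a);
        }
        return groups;
    }

    // IPv4 addresses of every interface that is currently up on this machine.
    static List<String> localIpv4Addresses() throws SocketException {
        List<String> out = new ArrayList<>();
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        if (all == null) {
            return out; // some older JDKs return null when no interfaces are found
        }
        for (NetworkInterface ni : Collections.list(all)) {
            if (!ni.isUp()) {
                continue;
            }
            for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    out.add(addr.getHostAddress());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // The three addresses reported in this thread, assuming /24 masks.
        List<String> reported = Arrays.asList("10.6.2.89", "192.168.111.1", "192.168.138.1");
        System.out.println(groupBySubnet(reported, 24));
        // The same check against whatever interfaces this JVM can see.
        System.out.println(groupBySubnet(localIpv4Addresses(), 24));
    }
}
```

Under the /24 assumption the three reported addresses fall into three separate groups (10.6.2.0/24, 192.168.111.0/24 and 192.168.138.0/24), which matches the three interfaces the HealthMonitor probes in the SERVER-2 output.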

Do you have any antivirus software or firewalls running on your
machine(s)? If so, can you disable them and see whether communications
and events happen correctly?

Thanks
Shreedhar



Mike Wannamaker wrote:

Okay, I tested shutting down a member that is not the group leader. I do
see suspect and failure notifications.

However, you might not like this: I also see something very strange and
disturbing.

I start SERVER-1 (GROUPLEADER), SERVER-2, and SERVER-3.

I shut down SERVER-3 and get the correct messages in SERVER-1 and mostly
correct ones in SERVER-2, but in the SERVER-2 window I also get a
FailureSuspected signal for SERVER-1.

This might be okay if I then got a notification that the node was back,
but I don't, and SERVER-1 is still running. When I start SERVER-3 again,
I see SERVER-1 in the list, and it gets notifications as well.

I tried again, shutting down the newly started SERVER-3, and got the same
results, so it seems fully reproducible.
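
The symptom described above can be stated as a check: a member is suspected and afterwards is neither confirmed failed nor reported as back. A minimal plain-Java sketch of that check follows. `SuspectTracker`, `Kind` and `Event` are hypothetical names that model what an application callback observes; they are not Shoal types, and `ALIVE` stands for whatever later signal would show the member is back (the thread only says no such notification arrived).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SuspectTracker {
    // Hypothetical event kinds, not Shoal signal classes.
    enum Kind { SUSPECTED, FAILED, ALIVE }

    static final class Event {
        final String member;
        final Kind kind;

        Event(String member, Kind kind) {
            this.member = member;
            this.kind = kind;
        }
    }

    // Replays events in arrival order and returns the members whose most recent
    // event is SUSPECTED: suspicions never confirmed by a failure nor cleared.
    static List<String> danglingSuspects(List<Event> events) {
        Set<String> open = new LinkedHashSet<>();
        for (Event e : events) {
            if (e.kind == Kind.SUSPECTED) {
                open.add(e.member);
            } else {
                open.remove(e.member); // FAILED or ALIVE resolves the suspicion
            }
        }
        return new ArrayList<>(open);
    }

    public static void main(String[] args) {
        // The order in which SERVER-2 logged signals after SERVER-3 was shut down.
        List<Event> seen = Arrays.asList(
                new Event("SERVER-3", Kind.SUSPECTED),
                new Event("SERVER-1", Kind.SUSPECTED),
                new Event("SERVER-3", Kind.FAILED));
        System.out.println(danglingSuspects(seen)); // prints [SERVER-1]
    }
}
```

Fed that sequence, the check reports SERVER-1 as the only unresolved suspicion, which is exactly the behavior being questioned.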

 

 

 

Here is the output for SERVER-2:

30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for (before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03
2: MemberId: SERVER-3, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD54C54AB0D7A640E493A5C6CE427A3CE203
3: MemberId: SERVER-1, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT

30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived

30-Jun-2008 2:16:57 PM com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:SERVER-3...

30-Jun-2008 02:16:57 PM DEBUG [pool-1-thread-4] com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3 >> FailureSuspectedSignalImpl @ 30/06/08 2:00 PM - [RCS_CLUSTER-false]: (Hashtable:[(String:server.name)<-->(String:SERVER-3), (String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])

MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89, mwana0061/10.6.2.89])

30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor isConnected
INFO: Checking for machine status for network interface : tcp://10.6.2.89:9701

30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor isConnected
INFO: Checking for machine status for network interface : tcp://192.168.111.1:9701

30-Jun-2008 2:16:57 PM com.sun.enterprise.jxtamgmt.HealthMonitor isConnected
INFO: Checking for machine status for network interface : tcp://192.168.138.1:9701

30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for (before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03
2: MemberId: SERVER-3, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD54C54AB0D7A640E493A5C6CE427A3CE203
3: MemberId: SERVER-1, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT

30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived

30-Jun-2008 2:17:27 PM com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:SERVER-1...

30-Jun-2008 02:17:27 PM DEBUG [pool-1-thread-4] com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-1 >> FailureSuspectedSignalImpl @ 30/06/08 1:59 PM - [RCS_CLUSTER-false]: (Hashtable:[(String:server.name)<-->(String:SERVER-1), (String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])

MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89, mwana0061/10.6.2.89])

30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group RCS_CLUSTER : Members in view for (before change analysis) are :

1: MemberId: SERVER-2, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FD0D4B867250FF460C9B539A161779845B03
2: MemberId: SERVER-1, MemberType: CORE, Address: urn:jxta:uuid-2F39FF376B6A43E3905DAFC81B7D02FDB946A28335F0413BBF73B77CCC8BFEC603

30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : FAILURE_EVENT

30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
INFO: The following member has failed: SERVER-3

30-Jun-2008 2:17:30 PM com.sun.enterprise.ee.cms.impl.common.Router notifyFailureNotificationAction
INFO: Sending FailureNotificationSignals to registered Actions. Member: SERVER-3...

30-Jun-2008 02:17:30 PM DEBUG [pool-1-thread-4] com.opentext.ecm.services.smessage.impl.shoal.SignalLogger - - SERVER-3 >> FailureNotificationSignalImpl @ 30/06/08 2:00 PM - [RCS_CLUSTER-false]: (Hashtable:[(String:server.name)<-->(String:SERVER-3), (String:local.host)<-->(Inet4Address:mwana0061/10.6.2.89)])SERVER-3

MEMBERS: (ArrayList:[mwana0061/10.6.2.89, mwana0061/10.6.2.89])
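
To see at a glance which member each signal in a log like the one above refers to, the SignalLogger records can be extracted with a regular expression. This is a minimal sketch assuming the `SignalLogger - - <member> >> <SignalClass>` layout shown here; that layout comes from the poster's own SignalLogger, not from Shoal, so the pattern is an assumption. `SignalLogScan` and `signals` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SignalLogScan {
    // Assumed record layout: "SignalLogger - - <member> >> <Something>SignalImpl".
    // \s* around ">>" also tolerates the line wrap a mail client may insert there.
    private static final Pattern SIGNAL =
            Pattern.compile("SignalLogger - - (\\S+)\\s*>>\\s*(\\w+SignalImpl)");

    // Returns "<member> <signal class>" entries in the order they occur in the text.
    static List<String> signals(String log) {
        List<String> out = new ArrayList<>();
        Matcher m = SIGNAL.matcher(log);
        while (m.find()) {
            out.add(m.group(1) + " " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String log;
        if (args.length > 0) {
            // Path to a saved console log, e.g. the SERVER-2 output.
            log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        } else {
            // Trimmed excerpt of the SERVER-2 records, including one wrapped record.
            log = "SignalLogger - - SERVER-3\n>> FailureSuspectedSignalImpl @ 30/06/08 2:00 PM\n"
                    + "SignalLogger - - SERVER-1 >> FailureSuspectedSignalImpl @ 30/06/08 1:59 PM\n"
                    + "SignalLogger - - SERVER-3 >> FailureNotificationSignalImpl @ 30/06/08 2:00 PM\n";
        }
        for (String s : signals(log)) {
            System.out.println(s);
        }
    }
}
```

On the SERVER-2 records it yields, in order, `SERVER-3 FailureSuspectedSignalImpl`, `SERVER-1 FailureSuspectedSignalImpl` and `SERVER-3 FailureNotificationSignalImpl`, making the unexpected SERVER-1 suspicion easy to spot.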

 

________________________________

From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 2:07 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working

That's correct. Yes, I should not mix up the provider terminology and
the GMS terminology.
Thanks
Shreedhar

Mike Wannamaker wrote:

When you say a non-master, do you mean when a server that is not the
group leader is shut down?

________________________________

From: Shreedhar.Ganapathy_at_Sun.COM [mailto:Shreedhar.Ganapathy_at_Sun.COM]
Sent: June 30, 2008 1:47 PM
To: users_at_shoal.dev.java.net
Subject: Re: [Shoal-Users] Still not sure it's working

Hi Mike
This is a recently found, known issue that occurs when the master fails.
I don't see a Shoal issue for it yet, but our QE team has filed an
internal issue on this behavior. I will post an issue in the Shoal
tracker later today with your details.

Can you confirm whether the behavior is okay when a non-master member fails?

Thanks
Shreedhar