I have a 5-node cluster (2.11) and today the whole thing went down (this has
happened several times before). On two of the nodes and the DAS I see the
message below. The other nodes appear to have quit logging. If I had to
guess, I suffered some sort of network failure, because it seems weird to me
that the GMS on 2 nodes and the DAS would suddenly have trouble
communicating with each other. However, that thought is predicated on a very
weak understanding of the GMS. I found the following blog post very helpful
(http://blogs.oracle.com/varunrupela/entry/notes_on_sailfin_cluster_failure),
but would also like any other thoughts or ideas you might have.
Thanks in advance.
[#|2012-01-19T09:35:21.180-0700|INFO|sun-appserver2.1|ShoalLogger|_ThreadID=13;_ThreadName=ViewWindowThread:ProdA;|GMS
View Change Received for group ProdA : Members in view for
IN_DOUBT_EVENT(before change analysis) are :
1: MemberId: ProdAInstance0, MemberType: CORE, Address: urn:jxta:uuid-**
2: MemberId: server, MemberType: SPECTATOR, Address: urn:jxta:uuid-**
3: MemberId: ProdAInstance2, MemberType: CORE, Address: urn:jxta:uuid-**
4: MemberId: ProdAInstance1, MemberType: CORE, Address: urn:jxta:uuid-**
5: MemberId: ProdAInstance3, MemberType: CORE, Address: urn:jxta:uuid-**
6: MemberId: ProdAInstance4, MemberType: CORE, Address: urn:jxta:uuid-**
--
[Message sent by forum member 'preston001']
View Post: http://forums.java.net/node/882842