users@glassfish.java.net

Re: Cluster sometimes stops without any exception

From: Fialli Joe <joe.fialli_at_oracle.com>
Date: Wed, 28 Nov 2012 10:42:35 -0500

I am uncertain what you are trying to illustrate with this email;
however, I do have a simple solution for you to use if your development
setup runs all clustered instances for a cluster on the same machine.

Use the following command to limit the multicast to the local machine
when all clustered instances for a cluster are on one machine:

% asadmin create-cluster --properties "GMS_MULTICAST_TIME_TO_LIVE=0" .... <cluster-name>

The above results in a cluster whose multicast traffic can only be seen
on that single machine.
See asadmin create-cluster --help for a description of this property.
It limits the UDP multicast broadcast performed by Shoal GMS to the
local machine only.
This mode is sufficient for development/testing where all clustered
instances for a cluster are on the same machine.
It should definitely not be used in a deployment or production
environment where the clustered instances are spread across multiple
machines.
(But one would not expect a production environment to have hundreds of
clusters with the same name.)
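The effect of GMS_MULTICAST_TIME_TO_LIVE=0 can be illustrated at the
socket level. This is a minimal Python sketch of the underlying IP
mechanism, not GlassFish code: a multicast datagram sent with TTL 0 is
never forwarded off the local host, so only processes on the same
machine can receive it.

```python
import socket

# TTL=0 confines multicast datagrams to the local host: the kernel will
# not put them on the wire, so only sockets on this same machine (e.g.
# all clustered instances in a single-machine dev setup) can see them.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 0)

# Read the option back to confirm the scope restriction is in place.
ttl = sock.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL)
sock.close()
```

Shoal GMS sets the equivalent option on its own multicast socket when
the cluster property above is configured.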

Limiting the scope of Shoal GMS UDP multicast broadcast works around
the following constraint not being maintained in your environment:
the tuple

cluster name, multicast group address, multicast port

must be unique within the subnet for each distinct cluster.

It is not a bug in the system if this constraint is violated and you
observe instances from different machines seeing each other.
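To make the constraint concrete, here is a hypothetical sketch (not
Shoal's actual data structure) of the identity that GMS effectively
keys on. Two clusters on the same subnet that share all three values
will receive each other's multicast traffic and merge their membership
views, which is exactly what produces duplicate MemberIds with
different IP addresses in the GMS1092 log quoted below.

```python
# Hypothetical model of the tuple GMS uses to scope a cluster's
# multicast traffic: (cluster name, multicast group, multicast port).
def gms_identity(cluster_name, mcast_group, mcast_port):
    return (cluster_name, mcast_group, mcast_port)

# Two clusters on two different machines, configured identically:
cluster_a = gms_identity("POD_Processing_Cl01", "228.9.103.196", 16084)
cluster_b = gms_identity("POD_Processing_Cl01", "228.9.103.196", 16084)

# Identical tuples mean the two clusters are indistinguishable on the
# subnet, so instances from both machines appear in a single view.
colliding = cluster_a == cluster_b
```

Changing any one element of the tuple (name, group address, or port)
is enough to separate the two clusters.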

-Joe


On 11/28/12 3:03 AM, forums_at_java.net wrote:
> I've tried to reproduce this behavior manually. I've got my hands on two
> machines that have glassfish installed and configured the same way. I've
> manually changed the GMS address and port on one of the clusters. Then
> I've
> restarted the clusters a couple of times. I couldn't make the other
> cluster
> stop (or start). When stopping them, I would get a normal log.
> However, when
> starting one of them, I would get the following, expected log:
> [#|2012-11-28T00:51:16.879-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1092:
>
> GMS View Change Received for group: POD_Processing_Cl01 : Members in
> view for
> JOINED_AND_READY_EVENT(before change analysis) are : 1: MemberId:
> POD_Processing_Cl01_ins01, MemberType: CORE, Address:
> 10.220.20.118:9095:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
>
> 2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address:
> 10.220.20.194:9137:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
>
> 3: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address:
> 10.220.20.118:9104:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
>
> 4: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address:
> 10.220.20.194:9106:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
>
> 5: MemberId: server, MemberType: SPECTATOR, Address:
> 10.220.20.194:9143:228.9.103.196:16084:POD_Processing_Cl01:server |#]
> [#|2012-11-28T00:51:16.879-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1016:
>
> Analyzing new membership snapshot received as part of event:
> JOINED_AND_READY_EVENT for member: POD_Processing_Cl01_ins01 of group:
> POD_Processing_Cl01|#]
> [#|2012-11-28T00:51:16.879-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1025:
>
> Adding Joined And Ready member: POD_Processing_Cl01_ins01 group:
> POD_Processing_Cl01 StartupState: INSTANCE_STARTUP |#]
> [#|2012-11-28T00:51:32.847-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1092:
>
> GMS View Change Received for group: POD_Processing_Cl01 : Members in
> view for
> JOINED_AND_READY_EVENT(before change analysis) are : 1: MemberId:
> POD_Processing_Cl01_ins01, MemberType: CORE, Address:
> 10.220.20.118:9095:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
>
> 2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address:
> 10.220.20.194:9137:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
>
> 3: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address:
> 10.220.20.118:9104:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
>
> 4: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address:
> 10.220.20.194:9106:228.9.103.196:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
>
> 5: MemberId: server, MemberType: SPECTATOR, Address:
> 10.220.20.194:9143:228.9.103.196:16084:POD_Processing_Cl01:server |#]
> [#|2012-11-28T00:51:32.847-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1016:
>
> Analyzing new membership snapshot received as part of event:
> JOINED_AND_READY_EVENT for member: POD_Processing_Cl01_ins02 of group:
> POD_Processing_Cl01|#]
> [#|2012-11-28T00:51:32.847-0800|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1025:
>
> Adding Joined And Ready member: POD_Processing_Cl01_ins02 group:
> POD_Processing_Cl01 StartupState: INSTANCE_STARTUP |#] Still, only one
> cluster is up. Not both of them. Are there any other configurations that
> could have been done in the first case, when both clusters came down?
> Thank
> you for your time and patience.
>
> --
>
> [Message sent by forum member 'sebigavril']
>
> View Post: http://forums.java.net/node/892700
>
>