users@shoal.java.net

Re: Merging group issue

From: Tim <tim.shiu_at_ssc-ltd.com>
Date: Mon, 28 Nov 2011 10:05:09 +0800

Dear Joe,

Thanks for your explanation. Let me explain how I use Shoal in my system.

In my use case, I store the states of the machines in the DSC to
handle failover.

For example, when a running machine goes down, a backup machine is
notified and takes over the tasks that the failed machine was
performing. The machine states (running, failed, recovering, etc.) are
stored in the DSC so that they are shared among the nodes in the group.
The group contains m + n machines, where m is the number of machines
running tasks and n is the number of machines on standby as backups.

There are two pieces of information I would like to get from the DSC:
1) the states of all machines in the group, and 2) which backup machine
is taking over which running machine.
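The two pieces of information above map naturally onto two tables keyed by member token. This is a minimal sketch, assuming an in-memory ConcurrentHashMap stands in for the DSC. It is not Shoal's DistributedStateCache API, and MachineState and FailoverTable are hypothetical names:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical machine states, following the states named in the message.
enum MachineState { RUNNING, STANDBY, FAILED, RECOVERING }

// Hypothetical stand-in for the data kept in the DSC. A ConcurrentHashMap
// replaces Shoal's DistributedStateCache purely for illustration.
class FailoverTable {
    // 1) State of every machine in the group, keyed by member token.
    private final Map<String, MachineState> states = new ConcurrentHashMap<>();
    // 2) Which backup took over which machine: failed token -> backup token.
    private final Map<String, String> takeovers = new ConcurrentHashMap<>();

    void setState(String member, MachineState state) {
        states.put(member, state);
    }

    Optional<MachineState> stateOf(String member) {
        return Optional.ofNullable(states.get(member));
    }

    void recordTakeover(String failedMember, String backupMember) {
        takeovers.put(failedMember, backupMember);
    }

    Optional<String> backupFor(String failedMember) {
        return Optional.ofNullable(takeovers.get(failedMember));
    }

    // Read-only copy of all states, e.g. for display or the later resume step.
    Map<String, MachineState> allStates() {
        return Collections.unmodifiableMap(new HashMap<>(states));
    }
}
```

Keeping the takeover relation in its own table makes question 2) a single lookup rather than a scan over every machine's state.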

So, when running machine A fails, a failure recovery notification is
sent to one of the backup machines, B, which takes over the tasks. When
the leader receives the failure notification, it stores the states of
the failed machine and the recovering machine in the DSC. These states
are used by the subsequent resume action once the failed machine
returns to normal.
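The leader-side flow above might look roughly like this. It is a sketch under stated assumptions: LeaderFailoverHandler and onFailure are hypothetical names, the two maps stand in for DSC entries, and choosing the first standby machine in token order is an arbitrary rule picked only to make the choice deterministic:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical leader-side handling of a failure notification.
// The maps stand in for entries the leader would write to the DSC.
class LeaderFailoverHandler {
    enum State { RUNNING, STANDBY, FAILED, RECOVERING }

    // member token -> state (TreeMap so backup selection is deterministic)
    final Map<String, State> states = new TreeMap<>();
    // failed member token -> backup member token that took over
    final Map<String, String> takeovers = new HashMap<>();

    /** Handles a failure of {@code failed}; returns the chosen backup, if any. */
    Optional<String> onFailure(String failed) {
        if (states.get(failed) != State.RUNNING) {
            return Optional.empty(); // only a running machine needs a takeover
        }
        // Assumed rule: pick the first standby machine in token order.
        Optional<String> backup = states.entrySet().stream()
                .filter(e -> e.getValue() == State.STANDBY)
                .map(Map.Entry::getKey)
                .findFirst();
        states.put(failed, State.FAILED);
        backup.ifPresent(b -> {
            states.put(b, State.RECOVERING); // backup is taking over the tasks
            takeovers.put(failed, b);
        });
        return backup;
    }
}
```

These writes are only safe while exactly one member acts as leader, which is the assumption the network-partition case breaks.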

It's great that Shoal can natively handle this failure case, even when
the failed machine is the leader node.

Unfortunately, if the failure is caused by a network problem, machines
A and B each act as leader, each receives a failure notification, and
each marks the other machine as failed in the DSC. When the network
recovers, the groups merge and the DSC is overwritten by the copy held
by one of the nodes in the group. (This assumes there are only 2
machines in the group and neither of them is a backup machine.)
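The conflict can be reproduced without Shoal. In this hypothetical simulation (SplitBrainDemo is an illustrative name), each side records its own partition view, and a whole-cache overwrite, assumed here to keep A's copy, leaves the merged cache claiming that B has failed even though both machines are alive:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of the split-brain case: during the partition,
// A and B each lead a one-member group and mark the other as failed in
// their own copy of the cache. On merge, one copy replaces the other.
class SplitBrainDemo {
    static Map<String, String> partitionView(String self, String other) {
        Map<String, String> view = new HashMap<>();
        view.put(self, "RUNNING");
        view.put(other, "FAILED"); // each side believes the other has failed
        return view;
    }

    /** Whole-cache overwrite: the master's copy wins, the other is discarded. */
    static Map<String, String> mergeByOverwrite(Map<String, String> master,
                                                Map<String, String> other) {
        return new HashMap<>(master);
    }
}
```

Which copy survives depends on how master selection plays out after the merge; either way, a whole-cache replacement cannot keep both sides' writes.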

So, I would like to ask:

1) Is any specific notification sent after the groups are merged? (I
can see that a join notification is sent, but is it possible to
distinguish a normal join from a merge?)

2) Can the native merging of the DSC be disabled programmatically so
that my own custom merging logic handles it instead?

3) Is there any exposed notification for DSC updates where I can add
custom logic to decide whether an update may proceed?
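For questions 1) and 2), one application-level option, independent of whatever Shoal itself provides, is to version every entry the application writes and reconcile after a merge. The sketch below is hypothetical (PartitionReconciler, Entry and the version counter are not Shoal API). It also shows why versioning alone is not enough in the two-machine case: each side's FAILED mark is its newest write, so both survive, and a second check against the current membership is what flags them. A joining member that is still marked FAILED locally is also a hint, though not a guarantee, that the join is really a merge:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical application-level reconciliation to run once the groups have
// merged. It does not hook into Shoal's DSC; it only compares the two
// partitions' copies of the application's own state entries.
class PartitionReconciler {
    // A state value tagged with a version the writer increments on each update.
    static final class Entry {
        final String state;
        final long version;
        Entry(String state, long version) { this.state = state; this.version = version; }
    }

    /** For each member, keep the entry with the higher version (ties keep a's entry). */
    static Map<String, Entry> mergeByVersion(Map<String, Entry> a, Map<String, Entry> b) {
        Map<String, Entry> merged = new HashMap<>(a);
        b.forEach((member, entry) ->
                merged.merge(member, entry, (x, y) -> y.version > x.version ? y : x));
        return merged;
    }

    /**
     * Members present in the current (merged) membership but still marked
     * FAILED. Such marks were most likely written during the partition and
     * need application-specific handling.
     */
    static Set<String> staleFailures(Map<String, Entry> merged, Set<String> currentMembers) {
        Set<String> stale = new TreeSet<>();
        for (String member : currentMembers) {
            Entry e = merged.get(member);
            if (e != null && "FAILED".equals(e.state)) {
                stale.add(member);
            }
        }
        return stale;
    }
}
```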

Sorry for the long message. Thanks a lot.

Regards,
Tim.Shiu

On 23/11/2011 0:10, Joseph Fialli wrote:
>
>
> On 11/22/11 9:28 AM, Tim Shiu wrote:
>> Dear Joe,
>>
>> Thanks a lot.
>> I checked out the project from SVN and confirmed that the two groups
>> now merge together.
>>
>> I have one more question.
>> Is there any mechanism to merge the DSC between these nodes so that
>> the data in the DSC is kept up to date?
> The master of the GMS group synchronizes its distributed state cache
> (DSC) with the other members.
> In your case, where the instances start out isolated, both instances
> are initially masters of their own one-member groups.
> When the isolated instances find each other, a master collision
> resolution algorithm decides which one will be master. The instance
> that is not the master should synchronize its DSC with the master,
> and the master will then distribute its latest DSC to all other
> members.
>
> -Joe
>> Or will it just pick the DSC from one node and distribute it to the
>> other nodes?
>>
>> Thank you.
>>
>> Regards,
>> Tim.Shiu
>>
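Joe's description of master collision resolution can be modelled in a few lines. The rule used here to pick the winner (lexicographically lowest member token) is purely an assumption for illustration; the thread does not say which rule Shoal uses. Synchronization is modelled as the non-master replacing its whole cache with the master's copy, which matches the overwrite behavior Tim describes. MasterCollisionModel and Node are hypothetical names:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of master collision resolution followed by DSC sync.
// ASSUMPTION: "lowest member token wins" is invented for illustration only.
class MasterCollisionModel {
    static final class Node {
        final String token;
        final Map<String, String> dsc = new HashMap<>();
        Node(String token) { this.token = token; }
    }

    /** Picks one master from the colliding masters (hypothetical rule). */
    static Node resolveMaster(List<Node> collidingMasters) {
        Node master = collidingMasters.get(0);
        for (Node n : collidingMasters) {
            if (n.token.compareTo(master.token) < 0) {
                master = n;
            }
        }
        return master;
    }

    /** Every non-master replaces its cache with a copy of the master's cache. */
    static Node resolveAndSync(List<Node> collidingMasters) {
        Node master = resolveMaster(collidingMasters);
        for (Node n : collidingMasters) {
            if (n != master) {
                n.dsc.clear();
                n.dsc.putAll(master.dsc);
            }
        }
        return master;
    }
}
```

In the two-machine case, whatever the losing master wrote during the partition is simply gone after the sync.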
>> Quoting Joseph Fialli <joe.fialli_at_oracle.com>:
>>
>> > Tim,
>> >
>> > Bug fixes for instances rejoining a cluster were checked in last
>> > Thursday.
>> > Using the gms-transport-module branch (the latest Shoal branch), I
>> > confirmed that two instances on different machines, started up on an
>> > isolated network, found each other once the network was reconnected.
>> > (I simulated the loss of network connectivity by running
>> > "ifconfig <networkinterface> down", followed 80 seconds later by
>> > "ifconfig <networkinterface> up" on the same network interface.)
>> >
>> > The same fix was checked into the shoal trunk last Thursday.
>> >
>> > -Joe Fialli, Oracle Corp.
>> >
>> > P.S.
>> > In case you do not know how to check out and build a Shoal
>> > workspace, instructions for checking out the trunk or a branch and
>> > building it are at http://shoal.java.net/HowToBuildSource.html.
>> >
>> > On 11/17/11 8:42 PM, Tim wrote:
>> >> Dear Joe,
>> >>
>> >> Thanks for your reply.
>> >>
>> >> I have already configured both machines with the same multicast
>> >> address and port (via the properties passed to
>> >> GMSFactory.startGMSModule), and they are on the same subnet.
>> >> Unfortunately, they still cannot detect each other after the network
>> >> is connected. Am I missing a setting?
>> >>
>> >> The following is the code fragment that joins the group.
>> >>
>> >> // All members must use the same group name, multicast address and port.
>> >> Properties props = new Properties();
>> >> props.put(ServiceProviderConfigurationKeys.LOOPBACK.toString(), "true");
>> >> props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_TIMEOUT.toString(), "500");
>> >> props.put(ServiceProviderConfigurationKeys.FAILURE_VERIFICATION_TIMEOUT.toString(), "500");
>> >> props.put(ServiceProviderConfigurationKeys.FAILURE_DETECTION_RETRIES.toString(), "2");
>> >> props.put(ServiceProviderConfigurationKeys.MULTICASTADDRESS.toString(), "228.0.0.1");
>> >> props.put(ServiceProviderConfigurationKeys.MULTICASTPORT.toString(), "9800");
>> >>
>> >> // Start the GMS module as CORE member "MACHINEA" of group "TESTGROUP", then join.
>> >> GroupManagementService gms = (GroupManagementService)
>> >>         GMSFactory.startGMSModule("MACHINEA", "TESTGROUP",
>> >>                 MemberType.CORE, props);
>> >> gms.join();
>> >>
>> >> Thanks for your help.
>> >>
>> >> Regards,
>> >> Tim.Shiu
>> >>
>> >> On 18/11/2011 3:22, Joseph Fialli wrote:
>> >>> Tim,
>> >>>
>> >>> In addition to using the same group name, the GMS clients must also
>> >>> use the same multicast group address and multicast port.
>> >>> Lastly, machines A and B must be on the same subnet, and UDP
>> >>> multicast must be enabled on the network and on any switches or
>> >>> routers in between.
>> >>>
>> >>> They would find each other over UDP multicast and form a group when
>> >>> network connectivity returns.
>> >>>
>> >>> -Joe Fialli
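Joe's checklist (same group name, multicast address and port) can be verified before starting GMS. This is a hypothetical pre-flight check (GmsConfigCheck is an illustrative name) using only java.util.Properties and java.net.InetAddress. It assumes the property keys are the enum constant names returned by ServiceProviderConfigurationKeys.MULTICASTADDRESS.toString() and MULTICASTPORT.toString(), and it cannot detect subnet or switch/router problems:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check comparing two members' GMS settings.
// ASSUMPTION: these keys equal the toString() of the corresponding
// ServiceProviderConfigurationKeys constants (taken to be the constant names).
class GmsConfigCheck {
    static final String ADDRESS_KEY = "MULTICASTADDRESS";
    static final String PORT_KEY = "MULTICASTPORT";

    /** Returns the mismatches found; an empty list means none were detected. */
    static List<String> compare(String groupA, Properties a,
                                String groupB, Properties b) {
        List<String> problems = new ArrayList<>();
        if (!groupA.equals(groupB)) {
            problems.add("group names differ: " + groupA + " vs " + groupB);
        }
        for (String key : new String[] {ADDRESS_KEY, PORT_KEY}) {
            String va = a.getProperty(key);
            String vb = b.getProperty(key);
            if (va == null || vb == null) {
                problems.add(key + " is not set on both members");
            } else if (!va.equals(vb)) {
                problems.add(key + " differs: " + va + " vs " + vb);
            }
        }
        String address = a.getProperty(ADDRESS_KEY);
        if (address != null) {
            try {
                // Checks only the address itself (224.0.0.0/4 for IPv4),
                // not subnet membership or switch/router configuration.
                if (!InetAddress.getByName(address).isMulticastAddress()) {
                    problems.add(address + " is not a multicast address");
                }
            } catch (UnknownHostException e) {
                problems.add("cannot resolve " + address);
            }
        }
        return problems;
    }
}
```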
>> >>>
>> >>> On 11/17/11 5:46 AM, tim.shiu_at_ssc-ltd.com wrote:
>> >>>> Dear All,
>> >>>>
>> >>>> I would like to ask whether Shoal has any mechanism that can merge
>> >>>> two separate groups (with the same group name) into one after they
>> >>>> join the same network.
>> >>>>
>> >>>> e.g.
>> >>>> Machines A and B join the group separately without being connected
>> >>>> to the network. After each has created its own group with the same
>> >>>> name, we plug in the network cable and connect them. Will they merge
>> >>>> into the same group?
>> >>>>
>> >>>> Thanks.
>> >>>>
>> >>>> Regards,
>> >>>> Tim.Shiu
>> >>>
>> >>>
>> >
>>
>
>