dev@shoal.java.net

Re: [Shoal-Dev] About sailfin issue #484

From: Joseph Fialli <Joseph.Fialli_at_Sun.COM>
Date: Thu, 04 Jun 2009 15:40:46 -0400

Bongjae Chang wrote:
> Hi Joe,
>
>
>
<deleted resolved issue>
> I agree. So now I understand why the changes give the master special treatment. And I also understood why WATCH_DOG was needed after I saw glassfish issue #8308. :-)
>
> About the glassfish issue #8308, I have a question.
>
> If the server which uses Shoal doesn't have a node agent that supports WATCH_DOG, the FAILURE event could be lost in this case, couldn't it?
>
The FAILURE event is never sent because the instance is restarted in a
shorter period of time than GMS heartbeat failure detection needs to
detect the failure.
> Then the new master ought to notify the old master's FAILURE. Is that right?
>
Here is the dilemma.

If an instance fails and restarts, the Shoal system detects the
RESTARTED instance independently of WATCHDOG.
At the time one detects that an instance has RESTARTED, it seems
confusing to send out the fact that a previous instantiation of the
instance had failed in the past.

Once an instance has restarted, one could easily confuse the FAILURE
event with the restarted instance.
The Shoal internals differentiate between the two instantiations by
consulting the start time of the instance. (Each heartbeat carries a
START TIME that records when the instance joined the group.)
However, that is not part of the typical Shoal protocol for users of the
Shoal API. Thus it is better to miss the FAILURE event and just log that
the FAILURE event was missed (as documented under glassfish issue 8308).
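
To make the start-time comparison concrete, here is a minimal sketch of
that bookkeeping (this is not the actual Shoal code; the class and method
names are made up for illustration):

    // Hypothetical sketch: telling two instantiations of the same member
    // apart by the START TIME carried in each heartbeat.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class RestartDetector {

        // Last recorded start time per member name, captured when the member joined.
        private final Map<String, Long> recordedStartTimes = new ConcurrentHashMap<>();

        /**
         * Returns true when a heartbeat comes from a different instantiation of a
         * member we already knew about, i.e. the member failed and was restarted
         * before heartbeat failure detection could report the FAILURE.
         */
        boolean isRestartedInstantiation(String memberName, long heartbeatStartTime) {
            Long previous = recordedStartTimes.put(memberName, heartbeatStartTime);
            return previous != null && previous != heartbeatStartTime;
        }
    }

When such a check returns true, the only unambiguous action is to log
that the FAILURE was missed, exactly as the warning messages quoted
further below do.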

Here is the sequence that is occurring.

                                Instance A (started at time XX)
                                Instance A (fails at time YY)
                                Instance A (restarted at time ZZ)

     
Timeline
---------+---------------------------+-------------------------------+--------
         XX                          YY                              ZZ


It is not ambiguous to send a FAILURE notification between times YY and ZZ.
However, once one hits time ZZ, it is ambiguous whether the FAILURE
applies to Instance A started at XX or Instance A started at ZZ.
Also, what benefit is there in knowing that Instance A started at XX
failed if Instance A has restarted at time ZZ?
The above occurs any time that ZZ - YY is less than the amount of time
GMS heartbeat failure detection needs to detect the failure of an
instance. As of this writing, only the Glassfish NodeAgent is known to
restart a cluster instance in a shorter period of time than the
glassfish default GMS heartbeat failure detection time.
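
As a rough illustration of that timing condition, here is a minimal
sketch (hypothetical names; the 7.5 second figure is the Glassfish
default mentioned further down in this thread) of the comparison that
decides whether a FAILURE event is ever observed:

    // Minimal sketch (not Shoal code): a FAILURE notification is only
    // reported if the instance stays down longer than the heartbeat
    // failure detection window, i.e. ZZ - YY >= detection window.
    public class FailureWindowCheck {

        // Glassfish default GMS heartbeat failure detection time (~7.5 seconds).
        static final long DETECTION_WINDOW_MS = 7_500;

        static boolean failureWouldBeReported(long failedAtMs, long restartedAtMs) {
            long downTimeMs = restartedAtMs - failedAtMs;  // ZZ - YY on the timeline above
            return downTimeMs >= DETECTION_WINDOW_MS;
        }

        public static void main(String[] args) {
            // NodeAgent restarts the instance in ~3 seconds: no FAILURE event,
            // only the "restarted / no Failure notification" warnings below.
            System.out.println(failureWouldBeReported(0, 3_000));   // false
            // Instance stays down for 20 seconds: failure detection fires first.
            System.out.println(failureWouldBeReported(0, 20_000));  // true
        }
    }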

Here are the log messages recording that the FAILURE event was never
sent due to the fast restart.
These log messages are recorded on the MasterNode (typically the DAS)
when it receives the GMS heartbeat STARTING from Instance A started at
time ZZ and the system realizes the heartbeat is from a different
Instance A than the last recorded one (which had started at time XX).


[#|...|WARNING|sun-glassfish-comms-server1.5|ShoalLogger|...;
Instance n2c1m4 was restarted at 4:13:19 PM PST on Feb 4, 2009.|#]

[#|...|WARNING|sun-glassfish-comms-server1.5|ShoalLogger|...;
Note that there was no Failure notification sent out for this instance
that was
previously started at 4:11:31 PM PST on Feb 4, 2009|#]

Complete description at
https://glassfish.dev.java.net/issues/show_bug.cgi?id=8308

Hope this explains why late FAILURE notifications are only logged, and
that only the WATCHDOG capability is able to send a timely FAILURE event
for an instance that dies and is quickly restarted by an external agent
(in glassfish's case, the NodeAgent).

-Joe









> Thanks in advance.
>
> PS) Didn't you attend the JavaOne events with Shreedhar? Unfortunately, I couldn't be there this year. But I hope to attend the next JavaOne and meet you and many of Shoal's users and devs next year!
>
> --
> Bongjae Chang
>
>
> ----- Original Message -----
> From: "Joseph Fialli" <Joseph.Fialli_at_Sun.COM>
> To: <dev_at_shoal.dev.java.net>
> Sent: Tuesday, June 02, 2009 6:03 AM
> Subject: Re: [Shoal-Dev] About sailfin issue #484
>
>
>
>> Bongjae,
>>
>> See my comments inline below.
>>
>>
>> Bongjae Chang wrote:
>>
>>> Hi,
>>> I have a question about sailfin issue #484 relating to
>>> MasterNode#processMasterNodeQuery()'s changes.
>>> I tried to test the master's failure.
>>> This test is like sailfin issue #484.
>>> i.e. the master dies and comes back up quickly.
>>> It seems that the policy and behavior regarding the failed master have been
>>> changed by sailfin issue #484.
>>> The changes select a new master and send a join notification about
>>> the old master only in the new master.
>>> This result was not my expectation because the old master didn't have a
>>> failure state at the other members.
>>>
>> Please see the following glassfish issue concerning fast restart of a
>> failed instance.
>> https://glassfish.dev.java.net/issues/show_bug.cgi?id=8308
>>
>> To summarize, GMS heartbeat detection (default of 7.5 seconds in
>> Glassfish) is not able to detect and report a FAILURE event when the
>> glassfish NodeAgent automatically restarts an instance in less than
>> 7.5 seconds. The instance has truly failed regardless of whether it is
>> reported by a GMS failure event.
>> It is not possible to send out a GMS FAILURE event once the instance has
>> already restarted.
>> That is discussed in much detail in glassfish issue 8308, along with the
>> ability to augment GMS failure detection when an external agent is
>> restarting failed instances faster than GMS heartbeat detection.
>>
>> The restarted instance is missing all the state that the previous Master
>> instance had. It was a bug in sailfin 484 that the failure went
>> undetected.
>> It was not a policy change but a bug fix.
>>
>> Here is how GMS failure detection works at a high level.
>> - The MasterNode monitors all other instance heartbeats in a cluster for
>> failure.
>> - All other instances in the cluster monitor the MasterNode heartbeats
>> to check if it failed.
>>
>> Once the MasterNode is killed and comes back up quickly, ALL other
>> instances in the cluster
>> (not just the master node) will see a MasterNodeQuery. ALL OTHER
>> INSTANCES recognize the
>> former master node has restarted and that there is a need to recalculate
>> who is the new Master from the surviving cluster instances since the
>> newly restarted former master is missing all state
>> (which instances make up the cluster).
>> Only the surviving instances of the cluster have been keeping that
>> information and are qualified to be the new Master.
>> All instances will agree on the new Master, since whichever instance is
>> made the new Master is chosen by an algorithm that every instance applies
>> to its own list of the instances making up the cluster.
>>
>> Only the newly elected Master sends out the join notification of the
>> restarted old Master instance. That was the fix that
>> was checked in for sailfin 484. All other instances of the cluster will
>> receive this join notification.
>>
>> I hope this explains the motivation behind the fix for sailfin 484.
>> It was not intended to be a policy change.
>>
>> -Joe
>>
>>
>>> I thought that the old master should keep the master's role if the old
>>> master came back up quickly before others were aware of the old
>>> master's failure.
>>> And the changes only send the old master's join notification
>>> from the new master.
>>> Assume that A, B and C are members and A is the master.
>>> When A dies and comes back quickly, B becomes the new master and B
>>> receives A's join notification. Maybe C doesn't receive A's join
>>> notification because A is not only a failure member but also an in-doubt
>>> member. I think that C's behavior is right.
>>> Assume that A, B and C are members and A is the master again.
>>> When B dies and comes back quickly, neither A nor C receives a join
>>> notification because B is not an in-doubt member, nor a failure
>>> member. I think that this behavior is also right.
>>> When the old master dies and rejoins the group quickly, the old master
>>> perhaps tries to discover the group's master. But the group doesn't have a
>>> master because the old master itself had been the group master. Then
>>> the old master which rejoins the group will wait for the discovery time.
>>> During the discovery time, maybe all members can't receive the group's
>>> events adequately.
>>> So is the new master selected instead of the old master in order to save
>>> discovery time?
>>> And should we give the old master's join notification special
>>> treatment when the old master dies and comes back?
>>> What do you think?
>>> Thanks!
>>> --
>>> Bongjae Chang
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe_at_shoal.dev.java.net
>> For additional commands, e-mail: dev-help_at_shoal.dev.java.net
>>
>>
>>
>>
>>