Re: [Shoal-Dev] About sailfin issue #484

Hi Joe,

I understand it now.

Thank you very much!

Bongjae Chang

> Bongjae Chang wrote:
>> Hi Joe,
> <deleted resolved issue>
>> I agree. Now I understand why the changes give the master special treatment. And after reading glassfish issue #8308, I also understand why WATCH_DOG was needed. :-)
>> About glassfish issue #8308, I have a question.
>> If the server that uses Shoal doesn't have a node agent that supports WATCH_DOG, the FAILURE event could be lost in this case, couldn't it?
> The FAILURE event is never sent because the instance is restarted in a
> shorter period of time than GMS heartbeat failure detection can detect
> the failure.
>> Then the new master ought to send the old master's FAILURE notification. Is that right?
> Here is the dilemma.
> If an instance fails and restarts, the Shoal system is detecting the
> RESTARTED instance independent of WATCHDOG.
> At the time one detects that an instance has RESTARTED, it seems
> confusing to send out the fact that
> a previous instantiation of the instance had failed in the past.
> Once an instance has restarted, one could easily confuse the FAILURE
> event with the restarted instance.
> Shoal internally differentiates between the two
> instantiations by consulting the
> start time of the instance. (Each heartbeat has a START TIME that
> records the time the instance joined the group.)
> However, that is not part of the typical Shoal protocol for users of the Shoal API.
> Thus it is better to miss the FAILURE event and just log that the
> FAILURE event was missed.
> (as documented under glassfish issue 8308.)
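To illustrate the start-time check described above, here is a minimal Java sketch. The class, enum and method names are hypothetical and are not Shoal's internal API; the only assumption is that each heartbeat carries the member name plus the START TIME of the instantiation that sent it.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: hypothetical names, not Shoal's internal classes.
class StartTimeTracker {
    enum Kind { FIRST_SEEN, SAME_INSTANTIATION, RESTARTED }

    // Member name -> START TIME of the most recent instantiation heard from.
    private final Map<String, Long> lastStartTime = new HashMap<>();

    /** Classifies a heartbeat by comparing its START TIME with the last one recorded. */
    Kind onHeartbeat(String member, long startTimeMillis) {
        // Assumes heartbeats from an older instantiation stop arriving once
        // the member has restarted, so the newest start time simply replaces it.
        Long previous = lastStartTime.put(member, startTimeMillis);
        if (previous == null) {
            return Kind.FIRST_SEEN;
        }
        if (previous == startTimeMillis) {
            return Kind.SAME_INSTANTIATION;
        }
        // Same name, different start time: the old instantiation died without
        // a FAILURE being detected. Reporting it now would be ambiguous, so a
        // caller would only log that the FAILURE notification was missed.
        return Kind.RESTARTED;
    }

    public static void main(String[] args) {
        StartTimeTracker tracker = new StartTimeTracker();
        System.out.println(tracker.onHeartbeat("instanceA", 1_000L)); // FIRST_SEEN
        System.out.println(tracker.onHeartbeat("instanceA", 1_000L)); // SAME_INSTANTIATION
        System.out.println(tracker.onHeartbeat("instanceA", 5_000L)); // RESTARTED
    }
}
```

Keying on (name, start time) rather than name alone is what lets the two instantiations of "instance A" be told apart at all.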
> Here is the sequence that is occurring.
> Instance A (started at time XX)
> Instance A (fails at time YY)
> Instance A (restarted at time ZZ)
> Timeline
> ---------+---------------------------+-------------------------------+--------
>          XX                          YY                              ZZ
> It is not ambiguous to send a FAILURE notification between times YY and ZZ.
> However, once one hits time ZZ, it is ambiguous whether the FAILURE
> applies to Instance A started at XX
> or Instance A started at ZZ. Also, what benefit is there in knowing that
> Instance A started at XX failed if Instance A
> has already restarted at time ZZ? This occurs whenever the time
> between YY and ZZ is less than the amount
> of time GMS heartbeat failure detection needs to detect failure of an
> instance. As of this writing, only the Glassfish
> Nodeagent is known to restart an instance of a cluster in a shorter period
> of time than the glassfish
> default GMS heartbeat failure detection time.
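The race above reduces to a single comparison. This is a simplified sketch that assumes one fixed detection time measured from the moment of failure (the 7.5 second glassfish default is quoted elsewhere in this thread); `FastRestartCheck` and `failureEventMissed` are hypothetical names.

```java
// Hypothetical helper illustrating the timing condition, not a Shoal API.
class FastRestartCheck {
    /**
     * Returns true when the instance is restarted (at ZZ) before heartbeat
     * failure detection can fire for the failure at YY, i.e. when
     * ZZ - YY < detection time, so no FAILURE event is ever sent.
     */
    static boolean failureEventMissed(long failedAtMillis, long restartedAtMillis,
                                      long detectionMillis) {
        return restartedAtMillis - failedAtMillis < detectionMillis;
    }

    public static void main(String[] args) {
        long detection = 7_500L; // assumed: the 7.5 s glassfish default
        System.out.println(failureEventMissed(0L, 4_000L, detection));  // true: restart wins the race
        System.out.println(failureEventMissed(0L, 10_000L, detection)); // false: FAILURE reported first
    }
}
```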
> Here are the log messages detecting that the FAILURE event was never
> sent due to fast restart.
> These log messages are recorded in the MasterNode (typically the DAS) when it
> receives GMS heartbeat STARTING from Instance
> A started at time ZZ and the system realizes the heartbeat is from a
> different instance A than the
> last recorded one (that had started at time XX).
> [#|...|WARNING|sun-glassfish-comms-server1.5|ShoalLogger|...;
> Instance n2c1m4 was restarted at 4:13:19 PM PST on Feb 4, 2009.|#]
> [#|...|WARNING|sun-glassfish-comms-server1.5|ShoalLogger|...;
> Note that there was no Failure notification sent out for this instance
> that was
> previously started at 4:11:31 PM PST on Feb 4, 2009|#]
> Complete description at
> Hope this explains why a late FAILURE notification is only logged,
> and why only the
> WATCHDOG capability is able to send a timely FAILURE event for an
> instance that dies
> and is quickly restarted by an external agent (in glassfish's case, the
> NodeAgent).
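A rough sketch of why a watchdog closes the gap: the agent that restarts the instance reports the failure before restarting it, so the report always refers to the instantiation that just died and can never arrive late. `FailureReporter` and `Restarter` are hypothetical interfaces standing in for whatever the NodeAgent and Shoal actually use; this is not the glassfish or Shoal API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the WATCHDOG idea with hypothetical interfaces.
class WatchdogSketch {
    interface FailureReporter { void reportFailure(String instance); }
    interface Restarter { void restart(String instance); }

    private final FailureReporter reporter;
    private final Restarter restarter;

    WatchdogSketch(FailureReporter reporter, Restarter restarter) {
        this.reporter = reporter;
        this.restarter = restarter;
    }

    /** Called when the agent observes that the instance process has exited. */
    void onProcessExit(String instance) {
        // Report first: the FAILURE unambiguously refers to the old
        // instantiation, because the new one has not started yet.
        reporter.reportFailure(instance);
        restarter.restart(instance);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        WatchdogSketch watchdog = new WatchdogSketch(
                name -> events.add("FAILURE " + name),
                name -> events.add("RESTART " + name));
        watchdog.onProcessExit("n2c1m4");
        System.out.println(events); // [FAILURE n2c1m4, RESTART n2c1m4]
    }
}
```

The ordering is the whole point: heartbeat detection has to wait out a timeout, while the agent already knows the process is gone.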
> -Joe
>> Thanks in advance.
>> PS) Did you attend the JavaOne events with Shreedhar? Unfortunately, I couldn't attend this year. But I hope to attend next year's JavaOne and meet you and many Shoal users and developers there!
>> --
>> Bongjae Chang
>> Subject: Re: [Shoal-Dev] About sailfin issue #484
>>> Bongjae,
>>> See my comments inline below.
>>> Bongjae Chang wrote:
>>>> Hi,
>>>> I have a question about sailfin issue #484, regarding the changes to
>>>> MasterNode#processMasterNodeQuery().
>>>> I tested the master's failure.
>>>> The test is similar to sailfin issue #484,
>>>> i.e. the master dies and comes back up quickly.
>>>> It seems that the policy and behavior for a failed master have been
>>>> changed by the fix for sailfin issue #484.
>>>> The changes select a new master, and only the new master sends a join
>>>> notification about the old master.
>>>> This was not what I expected, because the old master was never in a
>>>> failure state on the other members.
>>> Please see the following glassfish issue concerning fast restart of a
>>> failed instance.
>>> To summarize, GMS heartbeat detection (default of 7.5 seconds in
>>> Glassfish) is not able to detect
>>> and report FAILURE event when the glassfish NodeAgent automatically
>>> restarts an instance in less than
>>> 7.5 seconds. The instance has truly failed, regardless of whether that is
>>> reported by a GMS FAILURE event.
>>> It is not possible to send out a GMS FAILURE event once the instance has
>>> already restarted.
>>> That is discussed in detail in glassfish issue 8308, along with the
>>> ability to augment GMS failure
>>> detection when an external agent restarts failed instances faster
>>> than GMS heartbeat detection can detect the failure.
>>> The restarted instance is missing all the state that the previous Master
>>> instance had. It was a bug in sailfin 484 that the failure went
>>> undetected.
>>> It was not a policy change but a bug fix.
>>> Here is how GMS failure detection works at a high level.
>>> - The MasterNode monitors all other instance heartbeats in a cluster for
>>> failure.
>>> - All other instances in the cluster monitor the MasterNode heartbeats
>>> to check if it failed.
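Both monitoring roles above can be pictured with one generic, timeout-based heartbeat monitor. This is a hedged illustration only: Shoal's real detector is more involved, and the single timeout value and the `HeartbeatMonitor` class are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic timeout-based heartbeat monitor (hypothetical, not Shoal's detector).
class HeartbeatMonitor {
    private final long timeoutMillis;
    // Member name -> time its last heartbeat was received.
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    void onHeartbeat(String member, long nowMillis) { lastSeen.put(member, nowMillis); }

    /** Members whose last heartbeat is older than the timeout. */
    List<String> suspected(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // On the MasterNode: watch every other instance.
        HeartbeatMonitor master = new HeartbeatMonitor(7_500L);
        master.onHeartbeat("B", 0L);
        master.onHeartbeat("C", 6_000L);
        System.out.println(master.suspected(9_000L)); // [B]
        // On B or C, the same class would watch only the MasterNode's heartbeats.
    }
}
```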
>>> Once the MasterNode is killed and comes back up quickly, ALL other
>>> instances in the cluster
>>> (not just the master node) will see a MasterNodeQuery. ALL OTHER
>>> INSTANCES recognize the
>>> former master node has restarted and that there is a need to recalculate
>>> who is the new Master from the surviving cluster instances since the
>>> newly restarted former master is missing all state
>>> (which instances make up the cluster).
>>> Only the surviving instances of the cluster have been keeping that
>>> information, so only they are qualified to be the new Master.
>>> Whichever instance is made the new Master (based on an algorithm that
>>> all instances apply to their list of instances making up the cluster),
>>> all instances will agree on the new Master.
>>> Only the newly elected Master sends out the join notification of the
>>> restarted old Master instance. That was the fix that
>>> was checked in for sailfin 484. All other instances of the cluster will
>>> receive this join notification.
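The re-election and the "only the new Master announces the join" rule can be sketched as follows. Assumption: every surviving instance applies the same deterministic rule to the same member list; here the rule is "lexicographically smallest surviving member name", which is a stand-in and not necessarily the ordering Shoal uses. `MasterReelection` and its methods are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hedged sketch of master re-election after the old master restarts.
class MasterReelection {
    /** Elects the new master from the survivors, excluding the restarted old master. */
    static String electNewMaster(Collection<String> knownMembers, String restartedOldMaster) {
        TreeSet<String> survivors = new TreeSet<>(knownMembers);
        // The restarted former master lost its view of the cluster,
        // so it is not a candidate.
        survivors.remove(restartedOldMaster);
        if (survivors.isEmpty()) {
            throw new IllegalStateException("no surviving member to elect");
        }
        return survivors.first(); // same input + same rule => same answer everywhere
    }

    /** Only the newly elected master sends the restarted instance's JOIN notification. */
    static boolean shouldSendJoinNotification(String self, String newMaster) {
        return self.equals(newMaster);
    }

    public static void main(String[] args) {
        List<String> view = List.of("A", "B", "C"); // A was the master and has restarted
        for (String self : List.of("B", "C")) {
            String newMaster = electNewMaster(view, "A");
            System.out.println(self + " elects " + newMaster
                    + ", sends JOIN for A: " + shouldSendJoinNotification(self, newMaster));
        }
    }
}
```

Because B and C run the same rule on the same list, both pick B without exchanging votes, and only B emits the join notification that C then receives.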
>>> I hope this explains the motivation behind the fix for sailfin 484.
>>> It was not intended to be a policy change.
>>> -Joe
>>>> I thought that the old master should keep the master role if the old
>>>> master came back up quickly, before the others were aware of the old
>>>> master's failure.
>>>> And with the changes, the old master's join notification is delivered
>>>> only in the new master.
>>>> Assume that A, B and C are members and A is the master.
>>>> When A dies and comes back quickly, B becomes the new master and B
>>>> receives A's join notification. C probably doesn't receive A's join
>>>> notification because A was never marked as either a failed or an
>>>> in-doubt member. I think that C's behavior is right.
>>>> Assume again that A, B and C are members and A is the master.
>>>> When B dies and comes back quickly, neither A nor C receives a join
>>>> notification because B was never marked as either an in-doubt or a
>>>> failed member. I think that this behavior is also right.
>>>> When the old master dies and quickly rejoins the group, it presumably
>>>> tries to discover the group's master. But the group has no master,
>>>> because the old master itself was the group master. So the rejoining
>>>> old master will wait for the discovery timeout. During that time, the
>>>> members may not receive group events properly.
>>>> So is a new master selected instead of the old master in order to save
>>>> discovery time?
>>>> And should we give the old master's join notification special
>>>> treatment when the old master dies and comes back?
>>>> What do you think?
>>>> Thanks!
>>>> --
>>>> Bongjae Chang
