dev@shoal.java.net

Re: [Shoal-Dev] When group leader failed, any member couldn't receive FailureRecovery notification

From: Bongjae Chang <carryel_at_korea.com>
Date: Thu, 13 Nov 2008 10:28:15 +0900

Hi Shreedhar.

Yes, I am using the latest cvs sources. :-)

--
Bongjae Chang


  ----- Original Message -----
  From: Shreedhar Ganapathy
  To: dev_at_shoal.dev.java.net
  Sent: Thursday, November 13, 2008 2:32 AM
  Subject: Re: [Shoal-Dev] When group leader failed, any member couldn't receive FailureRecovery notification


  Hi Bongjae
  I thought this was fixed recently.
  Are you using the latest CVS sources?

  Thanks
  Shreedhar

  Bongjae Chang wrote:
    Hi.
    I found another issue.
    When the group leader fails, no member receives the FailureRecovery notification,
    even though the members registered FailureRecoveryActionFactoryImpl and their callbacks with GMS.
    However, when the failed member is not the group leader, the other member receives the FailureRecovery notification successfully.

    Here are two logs.
    --------------------
    case 1) The failed member is the group leader.

    2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

    2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
    2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
    2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

    2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
    2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
    2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

    2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
    2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
    2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

    2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT
    2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
    INFO: gms.failureSuspectedEventReceived
    2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
    INFO: Sending FailureSuspectedSignals to registered Actions. Member:b6663a51-9b79-43e2-92dd-41899c907383...
    2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

    2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
    2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

    2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : FAILURE_EVENT
    2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
    INFO: The following member has failed: b6663a51-9b79-43e2-92dd-41899c907383



    case 2) The failed member is not the group leader.

    2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

    2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
    2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
    2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

    2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
    2008. 11. 12 PM 9:40:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
    2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

    2008. 11. 12 PM 9:40:49 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT
    2008. 11. 12 PM 9:41:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
    INFO: gms.failureSuspectedEventReceived
    2008. 11. 12 PM 9:41:12 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
    INFO: Sending FailureSuspectedSignals to registered Actions. Member:b77af0d3-581c-4392-89cf-6a06d736c90f...
    2008. 11. 12 PM 9:41:29 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
    INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
    1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

    2008. 11. 12 PM 9:41:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
    INFO: Analyzing new membership snapshot received as part of event : FAILURE_EVENT
    2008. 11. 12 PM 9:41:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
    INFO: The following member has failed: b77af0d3-581c-4392-89cf-6a06d736c90f
    2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector setRecoverySelectionState
    INFO: Appointed Recovery Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup
    2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureRecoveryAction
    INFO: Sending FailureRecoveryNotification to component service
    --------------------

    In case 1 (the abnormal case):
    group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a new master was selected) -> FAILURE_EVENT

    In case 2 (the normal case):
    member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT

    Before the FailureRecovery notification can be delivered, a recovery target must be resolved. The selection algorithm for the recovery target uses the members' previous view.

    Assume that "A" and "B" are members of the same group and "A" is the group leader.

    [case 1: "B"'s view history]
    ... --> (A, B) --> A failed -> B became the new master with a master change event -> (B)[previous view] -> failure event -> (B)[current view]

    [case 2: "A"'s view history]
    ... --> (A, B)[previous view] --> B failed -> failure event -> (A)[current view]


    In other words:
    in case 1, the previous view does not contain "A" (the failed member), so the default algorithm (SimpleSelectionAlgorithm) cannot find a proper recovery target.
    In case 2, the previous view contains "B" (the failed member), so the default algorithm can select "A" as the recovery target.
    (I assume that you already know SimpleSelectionAlgorithm.)
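
    The difference between the two cases can be reproduced with a small
    standalone model. This is only a sketch under assumptions, not the Shoal
    source: the class PreviousViewSelection and its method are hypothetical,
    and the rule "take the member that follows the failed member in the
    previous view" is a simplification of whatever SimpleSelectionAlgorithm
    really does. It only illustrates why a previous view that no longer
    contains the failed member yields no recovery target.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of a previous-view based selection.
// It is NOT the actual Shoal RecoveryTargetSelector code.
final class PreviousViewSelection {

    // Returns the member that follows the failed member in the previous
    // view (wrapping around), or empty if no target can be derived.
    static Optional<String> selectRecoveryTarget(List<String> previousView, String failed) {
        int idx = previousView.indexOf(failed);
        if (idx < 0 || previousView.size() < 2) {
            // The failed member is unknown to this view, or there is no
            // other member left to act as the recovery target.
            return Optional.empty();
        }
        return Optional.of(previousView.get((idx + 1) % previousView.size()));
    }

    public static void main(String[] args) {
        // case 1, seen from B: the master change already replaced the
        // previous view (A, B) with (B) before the failure event arrived.
        System.out.println(selectRecoveryTarget(List.of("B"), "A"));      // Optional.empty

        // case 2, seen from A: the previous view is still (A, B).
        System.out.println(selectRecoveryTarget(List.of("A", "B"), "B")); // Optional[A]
    }
}
```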

    So I think the cause of this issue lies in the selection algorithm for the recovery target.

    One way to resolve it would be to use a different simple algorithm,
    e.g. always selecting the first core member in the live cache.
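
    A minimal sketch of that alternative, again as a standalone model rather
    than Shoal code: FirstLiveCoreSelection, MemberType and the map standing
    in for the "live cache" are all hypothetical names. It assumes the live
    cache is iterated in the same order on every member (for example, sorted
    by member ID); without that assumption, different members could appoint
    different recovery targets.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of "always select the first core member in the live
// cache". It does not depend on the previous view at all.
final class FirstLiveCoreSelection {

    enum MemberType { CORE, SPECTATOR }

    // liveCache must iterate in an order that is identical on all members
    // (an assumption of this sketch).
    static Optional<String> selectRecoveryTarget(Map<String, MemberType> liveCache, String failed) {
        for (Map.Entry<String, MemberType> entry : liveCache.entrySet()) {
            boolean isCore = entry.getValue() == MemberType.CORE;
            if (isCore && !entry.getKey().equals(failed)) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // case 1 from the thread: leader A has failed, only B is alive.
        Map<String, MemberType> liveCache = new LinkedHashMap<>();
        liveCache.put("B", MemberType.CORE);
        System.out.println(selectRecoveryTarget(liveCache, "A")); // Optional[B]
    }
}
```

    With this rule, the order in which MASTER_CHANGE_EVENT and FAILURE_EVENT
    arrive no longer matters, because only the currently live members are
    consulted.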

    Thanks.

    --
    Bongjae Chang