dev@shoal.java.net

Re: [Shoal-Dev] When group leader failed, any member couldn't receive FailureRecovery notification

From: Shreedhar Ganapathy <Shreedhar.Ganapathy_at_Sun.COM>
Date: Wed, 12 Nov 2008 10:13:20 -0800

Can you file an issue for this along with the text of your original email?

Shreedhar Ganapathy wrote:
> Looking through the later part of your email I agree with your
> assessment. I think this is indeed a case where the simple algo needs
> to be improved.
>
> Bongjae Chang wrote:
>> Hi.
>> I found another issue.
>> When the group leader failed, no member received the FailureRecovery
>> notification.
>> Of course, the members had added FailureRecoveryActionFactoryImpl and
>> their callbacks to GMS.
>> But if the failed member was not the group leader, the other member
>> received the FailureRecovery notification successfully.
>> Here are two logs.
>> --------------------
>> case 1) When the failed member is the group leader
>> 2008. 11. 12 9:43:28 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
>> 2008. 11. 12 9:43:28 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> MASTER_CHANGE_EVENT
>> 2008. 11. 12 9:43:28 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
>> 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
>> 2008. 11. 12 9:43:28 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> MASTER_CHANGE_EVENT
>> 2008. 11. 12 9:43:28 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
>> 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
>> 2008. 11. 12 9:43:28 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> ADD_EVENT
>> 2008. 11. 12 9:43:53 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
>> 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
>> 2008. 11. 12 9:43:53 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> *IN_DOUBT_EVENT*
>> 2008. 11. 12 9:43:53 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
>> INFO: gms.failureSuspectedEventReceived
>> 2008. 11. 12 9:43:53 PM
>> com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
>> INFO: Sending FailureSuspectedSignals to registered Actions.
>> Member:b6663a51-9b79-43e2-92dd-41899c907383...
>> 2008. 11. 12 9:43:57 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
>> 2008. 11. 12 9:43:57 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> *MASTER_CHANGE_EVENT*
>> 2008. 11. 12 9:43:57 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
>> 2008. 11. 12 9:43:57 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> *FAILURE_EVENT*
>> 2008. 11. 12 9:43:57 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
>> INFO: The following member has failed:
>> b6663a51-9b79-43e2-92dd-41899c907383
>> case 2) When the failed member is not the group leader
>> 2008. 11. 12 9:40:03 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
>> 2008. 11. 12 9:40:03 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> MASTER_CHANGE_EVENT
>> 2008. 11. 12 9:40:14 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
>> 2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103
>> 2008. 11. 12 9:40:14 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> ADD_EVENT
>> 2008. 11. 12 9:40:43 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
>> 2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103
>> 2008. 11. 12 9:40:49 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> *IN_DOUBT_EVENT*
>> 2008. 11. 12 9:41:07 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
>> INFO: gms.failureSuspectedEventReceived
>> 2008. 11. 12 9:41:12 PM
>> com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
>> INFO: Sending FailureSuspectedSignals to registered Actions.
>> Member:b77af0d3-581c-4392-89cf-6a06d736c90f...
>> 2008. 11. 12 9:41:29 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
>> INFO: GMS View Change Received for group DemoGroup : Members in view
>> for (before change analysis) are :
>> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
>> Address:
>> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
>> 2008. 11. 12 9:41:41 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
>> INFO: Analyzing new membership snapshot received as part of event :
>> *FAILURE_EVENT*
>> 2008. 11. 12 9:41:42 PM
>> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
>> INFO: The following member has failed:
>> b77af0d3-581c-4392-89cf-6a06d736c90f
>> *2008. 11. 12 9:42:19 PM
>> com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector
>> setRecoverySelectionState
>> INFO: Appointed Recovery
>> Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed
>> member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup
>> 2008. 11. 12 9:42:19 PM
>> com.sun.enterprise.ee.cms.impl.common.Router notifyFailureRecoveryAction
>> INFO: Sending FailureRecoveryNotification to component service*
>> --------------------
>> In case 1 (abnormal case):
>> group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a
>> new master was selected) -> FAILURE_EVENT
>> In case 2 (normal case):
>> member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT
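A minimal sketch of why the two event sequences leave different "previous views" behind. It assumes, hypothetically, that every membership event (including IN_DOUBT_EVENT and MASTER_CHANGE_EVENT) records a new snapshot, and that the previous view at FAILURE_EVENT time is the snapshot recorded just before the failure snapshot. The class and method names are illustrative only, not Shoal's ViewWindow API; the snapshots follow the member lists in the two logs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a view history; not Shoal's ViewWindow API.
public class ViewHistoryDemo {

    /** Returns the snapshot recorded just before the most recent one. */
    static List<String> previousView(List<List<String>> history) {
        if (history.size() < 2) {
            throw new IllegalStateException("need at least two snapshots");
        }
        return history.get(history.size() - 2);
    }

    public static void main(String[] args) {
        // Case 1, as seen by B: A (the group leader) fails.
        List<List<String>> case1 = new ArrayList<>();
        case1.add(List.of("A", "B")); // ADD_EVENT
        case1.add(List.of("A", "B")); // IN_DOUBT_EVENT
        case1.add(List.of("B"));      // MASTER_CHANGE_EVENT: B becomes master
        case1.add(List.of("B"));      // FAILURE_EVENT for A

        // Case 2, as seen by A: B (not the leader) fails.
        List<List<String>> case2 = new ArrayList<>();
        case2.add(List.of("A", "B")); // ADD_EVENT
        case2.add(List.of("A", "B")); // IN_DOUBT_EVENT
        case2.add(List.of("A"));      // FAILURE_EVENT for B

        System.out.println("case 1 previous view: " + previousView(case1)); // [B]
        System.out.println("case 2 previous view: " + previousView(case2)); // [A, B]
    }
}
```

The extra MASTER_CHANGE_EVENT snapshot in case 1 is what pushes (A, B) out of the "previous view" slot.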
>> To receive a FailureRecovery notification, a recovery target must be
>> resolved. The selection algorithm for the recovery target uses the
>> previous view of the members.
>> Assume that "A" and "B" are members of the same group and "A" is the
>> group leader.
>> [case 1: "B"'s view history]
>> ... --> *(A, B)* --> A failed -> B became the new master with a master
>> change event -> *(B)[previous view]* -> failure event -> *(B)[current
>> view]*
>> [case 2: "A"'s view history]
>> ... --> *(A, B)[previous view]* --> B failed -> failure event ->
>> *(A)[current view]*
>> In other words,
>> case 1's previous view does not contain "A" (the failed member), so the
>> default algorithm (SimpleSelectionAlgorithm) cannot find a proper
>> recovery target.
>> case 2's previous view contains "B" (the failed member), so the default
>> algorithm can select "A" as the recovery target.
>> (I assume that you already know SimpleSelectionAlgorithm.)
>> So I think the problem lies in the selection algorithm for the recovery
>> target.
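A simplified, hypothetical model of a previous-view-based selector, written only to illustrate the failure mode described in the paragraph above. It assumes the selector locates the failed member in the previous view and walks forward (wrapping around) to the next member that is still alive; the real SimpleSelectionAlgorithm may differ in detail, and the names here are not Shoal's API.

```java
import java.util.List;

// Hypothetical previous-view-based selector; not Shoal's SimpleSelectionAlgorithm.
public class PreviousViewSelector {

    /**
     * Finds the failed member in the previous view, then walks forward
     * (wrapping around) to the first member that is still alive.
     * Returns null when the failed member is absent from the previous view.
     */
    static String select(List<String> previousView, List<String> liveMembers, String failed) {
        int idx = previousView.indexOf(failed);
        if (idx < 0) {
            return null; // case 1: the failed member is not in the previous view
        }
        int n = previousView.size();
        for (int step = 1; step < n; step++) {
            String candidate = previousView.get((idx + step) % n);
            if (liveMembers.contains(candidate)) {
                return candidate;
            }
        }
        return null; // no survivor from the previous view
    }

    public static void main(String[] args) {
        // Case 1 (seen by B): previous view (B) already lacks A.
        System.out.println(select(List.of("B"), List.of("B"), "A"));      // null
        // Case 2 (seen by A): previous view (A, B) still contains B.
        System.out.println(select(List.of("A", "B"), List.of("A"), "B")); // A
    }
}
```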
>> I think that devising another simple algorithm could resolve this
>> issue, for example:
>> ex) always select the first core member in the live cache.
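A sketch of the proposed alternative under stated assumptions: a hypothetical ordered member list stands in for the GMS live cache, and a local MemberType enum mirrors the values seen in the logs. The selector skips the failed member and any non-core member and returns the first remaining CORE member. Every survivor picks the same target only if all survivors see the live cache in the same order, which is assumed here rather than guaranteed.

```java
import java.util.List;

// Hypothetical "first live core member" selector; names are illustrative.
public class FirstLiveCoreSelector {

    // Hypothetical local enum mirroring the MemberType values in the logs.
    enum MemberType { CORE, SPECTATOR }

    // Hypothetical live-cache entry: a member id plus its type.
    static final class Member {
        final String id;
        final MemberType type;

        Member(String id, MemberType type) {
            this.id = id;
            this.type = type;
        }
    }

    /**
     * Returns the id of the first CORE member in the (assumed ordered) live
     * list that is not the failed member, or null if there is none.
     * The previous view is never consulted.
     */
    static String select(List<Member> liveCache, String failedId) {
        for (Member m : liveCache) {
            if (m.type == MemberType.CORE && !m.id.equals(failedId)) {
                return m.id;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Case 1 as seen by B after A (the old leader) failed.
        List<Member> live = List.of(
                new Member("S", MemberType.SPECTATOR),
                new Member("B", MemberType.CORE));
        System.out.println(select(live, "A")); // B
    }
}
```

Because this rule ignores the previous view, the extra MASTER_CHANGE_EVENT snapshot in case 1 no longer affects the outcome.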
>> Thanks.
>> --
>> Bongjae Chang