dev@shoal.java.net

Re: [Shoal-Dev] When group leader failed, any member couldn't receive FailureRecovery notification

From: Shreedhar Ganapathy <Shreedhar.Ganapathy_at_Sun.COM>
Date: Wed, 12 Nov 2008 09:32:14 -0800

Hi Bongjae
This was recently fixed I thought.
Are you using the latest cvs sources?

Thanks
Shreedhar

Bongjae Chang wrote:
> Hi.
> I found another issue.
> When the group leader fails, no member receives the FailureRecovery
> notification.
> Of course, the members had added FailureRecoveryActionFactoryImpl and their
> callbacks to GMS.
> But if the failed member was not the group leader, the other member received
> the FailureRecovery notification successfully.
> Here are two logs.
> --------------------
> case 1) When the failed member is the group leader.
> 2008. 11. 12 PM 9:43:28
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
> 2008. 11. 12 PM 9:43:28
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> MASTER_CHANGE_EVENT
> 2008. 11. 12 PM 9:43:28
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
> 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
> 2008. 11. 12 PM 9:43:28
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> MASTER_CHANGE_EVENT
> 2008. 11. 12 PM 9:43:28
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
> 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
> 2008. 11. 12 PM 9:43:28
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> ADD_EVENT
> 2008. 11. 12 PM 9:43:53
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
> 2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
> 2008. 11. 12 PM 9:43:53
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> *IN_DOUBT_EVENT*
> 2008. 11. 12 PM 9:43:53
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
> INFO: gms.failureSuspectedEventReceived
> 2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.common.Router
> notifyFailureSuspectedAction
> INFO: Sending FailureSuspectedSignals to registered Actions.
> Member:b6663a51-9b79-43e2-92dd-41899c907383...
> 2008. 11. 12 PM 9:43:57
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
> 2008. 11. 12 PM 9:43:57
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> *MASTER_CHANGE_EVENT*
> 2008. 11. 12 PM 9:43:57
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03
> 2008. 11. 12 PM 9:43:57
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> *FAILURE_EVENT*
> 2008. 11. 12 PM 9:43:57
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
> INFO: The following member has failed:
> b6663a51-9b79-43e2-92dd-41899c907383
> case 2) When the failed member is not the group leader.
> 2008. 11. 12 PM 9:40:03
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
> 2008. 11. 12 PM 9:40:03
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> MASTER_CHANGE_EVENT
> 2008. 11. 12 PM 9:40:14
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
> 2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103
> 2008. 11. 12 PM 9:40:14
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> ADD_EVENT
> 2008. 11. 12 PM 9:40:43
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
> 2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103
> 2008. 11. 12 PM 9:40:49
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> *IN_DOUBT_EVENT*
> 2008. 11. 12 PM 9:41:07
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
> INFO: gms.failureSuspectedEventReceived
> 2008. 11. 12 PM 9:41:12 com.sun.enterprise.ee.cms.impl.common.Router
> notifyFailureSuspectedAction
> INFO: Sending FailureSuspectedSignals to registered Actions.
> Member:b77af0d3-581c-4392-89cf-6a06d736c90f...
> 2008. 11. 12 PM 9:41:29
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> INFO: GMS View Change Received for group DemoGroup : Members in view
> for (before change analysis) are :
> 1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
> 2008. 11. 12 PM 9:41:41
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> INFO: Analyzing new membership snapshot received as part of event :
> *FAILURE_EVENT*
> 2008. 11. 12 PM 9:41:42
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
> INFO: The following member has failed:
> b77af0d3-581c-4392-89cf-6a06d736c90f
> *2008. 11. 12 PM 9:42:19
> com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector
> setRecoverySelectionState
> INFO: Appointed Recovery
> Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed
> member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup
> 2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.Router
> notifyFailureRecoveryAction
> INFO: Sending FailureRecoveryNotification to component service*
> --------------------
> In case 1 (abnormal case),
> group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a
> new master was selected) -> FAILURE_EVENT
> In case 2 (normal case),
> member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT
> To receive the FailureRecovery notification, a recovery target must be
> resolved. The selection algorithm for the recovery target uses the
> members' previous view.
> Assume that "A" and "B" are members of the same group and "A" is the group
> leader.
> [case 1: "B"'s view history]
> ... --> *(A, B)* --> A failed -> B became the new master with a master
> change event -> *(B)[previous view]* -> failure event -> *(B)[current view]*
> [case 2: "A"'s view history]
> ... --> *(A, B)[previous view]* --> B failed -> failure event ->
> *(A)[current view]*
> In other words,
> case 1's previous view doesn't have "A" (the failed member), so the default
> algorithm (SimpleSelectionAlgorithm) can't find a proper recovery target.
> Case 2's previous view has "B" (the failed member), so the default algorithm
> can select "A" as the recovery target.
> (I assume that you already know SimpleSelectionAlgorithm.)
> So I think the cause of this issue lies in the selection algorithm for the
> recovery target.
> I think that devising another simple algorithm could be one way of
> resolving this issue.
> ex) always selecting the first core member in the live cache.
> Thanks.
> --
> Bongjae Chang
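
The view-history argument in the quoted message can be illustrated with a small, self-contained sketch. This is not Shoal's actual code: the class and method names below are hypothetical, and the first method only assumes that the default selection looks up the failed member in the previous view and picks the member that follows it. The second method sketches the suggested alternative of taking the first live core member.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from Shoal.
class RecoveryTargetSketch {

    // Assumed behaviour of the default selection: find the failed member in
    // the previous view and pick the next member in order (wrapping around).
    // If the failed member is missing from that view, no target is found.
    public static Optional<String> selectFromPreviousView(List<String> previousView,
                                                          String failedMember) {
        int idx = previousView.indexOf(failedMember);
        if (idx < 0 || previousView.size() < 2) {
            return Optional.empty();
        }
        return Optional.of(previousView.get((idx + 1) % previousView.size()));
    }

    // Suggested alternative: take the first live core member that is not the
    // failed member, independent of what the previous view contained.
    public static Optional<String> selectFirstLiveMember(List<String> liveCoreMembers,
                                                         String failedMember) {
        return liveCoreMembers.stream()
                .filter(m -> !m.equals(failedMember))
                .findFirst();
    }

    public static void main(String[] args) {
        // case 1: leader "A" failed; the master change already replaced the
        // previous view with (B), so "A" cannot be found in it.
        System.out.println(selectFromPreviousView(Arrays.asList("B"), "A"));

        // case 2: non-leader "B" failed; the previous view is still (A, B).
        System.out.println(selectFromPreviousView(Arrays.asList("A", "B"), "B"));

        // case 1 again with the alternative: "B" is the only live core member.
        System.out.println(selectFirstLiveMember(Arrays.asList("B"), "A"));
    }
}
```

Under these assumptions, case 1 yields no recovery target from the previous view, while selecting from the live members still appoints "B". Every member would need to see the same ordered live list for all of them to agree on one target.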