dev@shoal.java.net

When group leader failed, any member couldn't receive FailureRecovery notification

From: Bongjae Chang <carryel_at_korea.com>
Date: Thu, 13 Nov 2008 01:56:37 +0900

Hi.
I found another issue.
When the group leader fails, no member receives the FailureRecovery notification.
The members had, of course, registered FailureRecoveryActionFactoryImpl and their callbacks with GMS.
But when the failed member is not the group leader, the other member receives the FailureRecovery notification successfully.

Here are two logs.
--------------------
Case 1) The failed member is the group leader.

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:28 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: b6663a51-9b79-43e2-92dd-41899c907383, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250331DA08A66D0554F138E75E74AA363FC9E03
2: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT
2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
2008. 11. 12 PM 9:43:53 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:b6663a51-9b79-43e2-92dd-41899c907383...
2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: dd4897f5-2383-420e-8d3e-87f77407da41, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250332E9EB1D0D35742638E5B9CF78B8253EE03

2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : FAILURE_EVENT
2008. 11. 12 PM 9:43:57 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
INFO: The following member has failed: b6663a51-9b79-43e2-92dd-41899c907383



Case 2) The failed member is not the group leader.

2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 PM 9:40:03 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 PM 9:40:14 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 PM 9:40:43 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03
2: MemberId: b77af0d3-581c-4392-89cf-6a06d736c90f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EBEBAC9321A742D0B319D3F89446E0B103

2008. 11. 12 PM 9:40:49 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : IN_DOUBT_EVENT
2008. 11. 12 PM 9:41:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addInDoubtMemberSignals
INFO: gms.failureSuspectedEventReceived
2008. 11. 12 PM 9:41:12 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureSuspectedAction
INFO: Sending FailureSuspectedSignals to registered Actions. Member:b77af0d3-581c-4392-89cf-6a06d736c90f...
2008. 11. 12 PM 9:41:29 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
INFO: GMS View Change Received for group DemoGroup : Members in view for (before change analysis) are :
1: MemberId: 96438e75-740c-4613-af8d-6b2ab8ea4727, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033376CC0C6DAB74C2BA6FAF9C6648D77BC03

2008. 11. 12 PM 9:41:41 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
INFO: Analyzing new membership snapshot received as part of event : FAILURE_EVENT
2008. 11. 12 PM 9:41:42 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow addFailureSignals
INFO: The following member has failed: b77af0d3-581c-4392-89cf-6a06d736c90f
2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.RecoveryTargetSelector setRecoverySelectionState
INFO: Appointed Recovery Server:96438e75-740c-4613-af8d-6b2ab8ea4727:for failed member:b77af0d3-581c-4392-89cf-6a06d736c90f:for group:DemoGroup
2008. 11. 12 PM 9:42:19 com.sun.enterprise.ee.cms.impl.common.Router notifyFailureRecoveryAction
INFO: Sending FailureRecoveryNotification to component service
--------------------

In case 1 (the abnormal case):
group leader failed -> IN_DOUBT_EVENT -> MASTER_CHANGE_EVENT (because a new master was selected) -> FAILURE_EVENT

In case 2 (the normal case):
member failed -> IN_DOUBT_EVENT -> FAILURE_EVENT

Before a FailureRecovery notification can be delivered, a recovery target must be resolved. The selection algorithm for the recovery target uses the previous membership view.

Assume that "A" and "B" are members of the same group and that "A" is the group leader.

[case 1: "B"'s view history]
... --> (A, B) --> A failed -> B became the new master with a master change event -> (B)[previous view] -> failure event -> (B)[current view]

[case 2: "A"'s view history]
... --> (A, B)[previous view] --> B failed -> failure event -> (A)[current view]


In other words:
In case 1, the previous view does not contain "A" (the failed member), so the default algorithm (SimpleSelectionAlgorithm) cannot find a proper recovery target.
In case 2, the previous view contains "B" (the failed member), so the default algorithm can select "A" as the recovery target.
(I assume that you are already familiar with SimpleSelectionAlgorithm.)
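
To make the difference between the two cases concrete, here is a minimal sketch of a selector that depends on the previous view. It is not the Shoal source: the class name, the method name, and the "pick the member after the failed one, wrapping around" rule are assumptions made only for illustration. The point is just that any lookup keyed on the failed member's position in the previous view has nothing to work with once the master change event has already replaced that view.

```java
import java.util.List;
import java.util.Optional;

public class PreviousViewSelectionSketch {

    // Hypothetical stand-in for a previous-view-based selector (not Shoal code):
    // pick the member that follows the failed member in the previous view,
    // wrapping around to the first member at the end of the list.
    static Optional<String> selectFromPreviousView(List<String> previousView, String failedMember) {
        int index = previousView.indexOf(failedMember);
        if (index < 0 || previousView.size() < 2) {
            // The failed member is not in the previous view, or there is no other member.
            return Optional.empty();
        }
        return Optional.of(previousView.get((index + 1) % previousView.size()));
    }

    public static void main(String[] args) {
        // Case 1: the master change event already replaced (A, B) with (B).
        System.out.println(selectFromPreviousView(List.of("B"), "A"));      // Optional.empty
        // Case 2: the previous view is still (A, B) when B's failure event arrives.
        System.out.println(selectFromPreviousView(List.of("A", "B"), "B")); // Optional[A]
    }
}
```

With no recovery target resolved in case 1, there is nothing to send a FailureRecovery notification to, which matches the first log ending at the "has failed" line.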

So I think this issue lies in the selection algorithm for the recovery target.

One way to resolve it might be to use another simple algorithm.
ex) always selecting the first core member in the live cache.
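
A rough sketch of that idea follows. The live cache is modeled here as an ordered map from member token to member type; the class, the enum, and the method name are hypothetical and are not part of the GMS API. It also assumes that every surviving member sees the live cache in the same order, since otherwise different members could appoint different recovery servers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class FirstLiveCoreMemberSketch {

    // Hypothetical member types for this sketch only.
    enum MemberType { CORE, SPECTATOR }

    // Pick the first CORE member among the members that are currently alive.
    // The previous view is not consulted, so a master change event that
    // arrives before the failure event does not affect the result.
    static Optional<String> selectFirstLiveCoreMember(Map<String, MemberType> liveCache) {
        return liveCache.entrySet().stream()
                .filter(entry -> entry.getValue() == MemberType.CORE)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        // Case 1 again: leader "A" has failed and only "B" is still alive.
        Map<String, MemberType> liveCache = new LinkedHashMap<>();
        liveCache.put("B", MemberType.CORE);
        System.out.println(selectFirstLiveCoreMember(liveCache)); // Optional[B]
    }
}
```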

Thanks.

--
Bongjae Chang