dev@shoal.java.net

notifying cluster view event locally is not thread safe

From: Bongjae Chang <carryel_at_korea.com>
Date: Wed, 12 Nov 2008 19:26:04 +0900

Hi.

I think that notifying cluster view event locally is not thread safe. I referred to a question similarly long ago. :-)

ClusterViewManager.notifyListeners() can be executed on multi-threads when many members join the same group concurrently.

Though there are no member's failures, you can see the following log.

------------------------------------
2008. 11. 12 ¿ÀÈÄ 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
Á¤º¸: Initializing Shoal for member: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f group:TestGroup
2008. 11. 12 ¿ÀÈÄ 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
Á¤º¸: Registering for group event notifications
2008. 11. 12 ¿ÀÈÄ 5:44:00 com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
Á¤º¸: Joining Group TestGroup
2008. 11. 12 ¿ÀÈÄ 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
Á¤º¸: GMS View Change Received for group TestGroup (5d3280a2-a0c5-4ae2-8d41-d59b57400b8f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03

2008. 11. 12 ¿ÀÈÄ 5:44:07 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
Á¤º¸: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT
2008. 11. 12 ¿ÀÈÄ 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
Á¤º¸: GMS View Change Received for group TestGroup (aeea918f-571b-463b-bfa6-55c536df0d11) : Members in view for (before change analysis) are :
(a)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
4: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 ¿ÀÈÄ 5:44:08 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
Á¤º¸: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 ¿ÀÈÄ 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
Á¤º¸: GMS View Change Received for group TestGroup (addb1dbe-06cf-43b8-8903-78605f29091f) : Members in view for (before change analysis) are :
(b)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303

2008. 11. 12 ¿ÀÈÄ 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
Á¤º¸: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 ¿ÀÈÄ 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
Á¤º¸: GMS View Change Received for group TestGroup (fae1414d-702a-42fd-8c7d-6ffabe8b2e69) : Members in view for (before change analysis) are :
(c)
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
3: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

2008. 11. 12 ¿ÀÈÄ 5:44:17 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
Á¤º¸: Analyzing new membership snapshot received as part of event : ADD_EVENT
2008. 11. 12 ¿ÀÈÄ 5:44:20 com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
Á¤º¸: GMS View Change Received for group TestGroup (42b22147-7683-481f-a9f4-85ba5a2b847f) : Members in view for (before change analysis) are :
1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

...
------------------------------------

This log means that five members join "TestGroup"

1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE, Address: urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203

And this log is printed in ViewWindow based on the viewQueue when new view is observed.

But above log message, you can see that (a), (b) and (c)'s order are strange.

Because there are no failures, I think that member's number should be increased gradually(or (a)'num <= (b)'s num <= (c)'s num).

The following code is ClusterViewManager's notifyListeners() method.
-----
void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        elem.clusterViewEvent(event, getLocalView());
    }
}
-----

getLocalView() is thread safe with viewLock but ClusterViewEventListener's clusterViewEvent() is not thread safe.

The following code is GroupCommunicationProviderImpl's clusterViewEvent() method which implements ClusterViewEventListener interface.
-----
public void clusterViewEvent(final ClusterViewEvent clusterViewEvent, final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPakcet(clusterViewEvent.getEvent(), clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch(InterruptedExcetion e) {
        ...
    }

}
-----

I think that local view's snapshot(getLocalView()'s return value) and viewQueue.put() should be atomic like this.
-----
void notifyListeners(final ClusterViewEvent event) {
    Log.log(...);
    for (ClusterViewEventListener elem : cvListeners) {
        synchronized( elem ) {
            elem.clusterViewEvent(event, getLocalView());
        }
    }
}

or

public synchronized void clusterViewEvent(final ClusterViewEvent clusterViewEvent, final ClusterView clusterView) {
    ...
    final EventPacket ePacket = new EventPakcet(clusterViewEvent.getEvent(), clusterViewEvent.getAdvertisement(), clusterView);
    final ArrayBlockingQueue<EventPacket> viewQueue = getGMSContext().getViewQueue();
    try {
        viewQueue.put(ePacket);
    } catch(InterruptedExcetion e) {
        ...
    }

}

(In my opinion, I think that the former is better because clusterViewEvent() can be implemented variously)
-----

In other words,
-------------------------------------------------------------------
getLocalView() --> local view's snapshot --> (hole) --> insert view queue
-------------------------------------------------------------------

As you can see above, before EventPacket is inserted into view queue, there is some hole. So we can remove the hole with synchronized block or individual lock object.
If the hole is removed, I think that ViewWindow can receive local view capture from queue correctly.

Thanks.

--
Bongjae Chang