dev@shoal.java.net

Re: [Shoal-Dev] notifying cluster view event locally is not thread safe

From: Shreedhar Ganapathy <Shreedhar.Ganapathy_at_Sun.COM>
Date: Wed, 12 Nov 2008 10:12:52 -0800

Hi Bongjae
Can you file an issue for this ?
Add these details in there. Joe can review your changes and perhaps
consider checking in. We are high resistance for the final sprint on
Sailfin. This issue is not seen there though.

Thanks
Shreedhar

Bongjae Chang wrote:
> Hi.
> I think that notifying cluster view event locally is not thread safe.
> I referred to a question similarly long ago. :-)
> ClusterViewManager.notifyListeners() can be executedon multi-threads
> whenmany members join the same group concurrently.
> Though there are no member's failures, you can see the following log.
> ------------------------------------
> 2008. 11. 12 ¿ÀÈÄ 5:44:00
> com.sun.enterprise.shoal.jointest.SimpleJoinTest initializeGMS
> Á¤º¸: Initializing Shoal for member:
> 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f group:TestGroup
> 2008. 11. 12 ¿ÀÈÄ 5:44:00
> com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
> Á¤º¸: Registering for group event notifications
> 2008. 11. 12 ¿ÀÈÄ 5:44:00
> com.sun.enterprise.shoal.jointest.SimpleJoinTest runSimpleSample
> Á¤º¸: Joining Group TestGroup
> 2008. 11. 12 ¿ÀÈÄ 5:44:07
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> Á¤º¸: GMS View Change Received for group TestGroup
> (5d3280a2-a0c5-4ae2-8d41-d59b57400b8f) : Members in view for (before
> change analysis) are :
> 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
> 2008. 11. 12 ¿ÀÈÄ 5:44:07
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> Á¤º¸: Analyzing new membership snapshot received as part of event :
> MASTER_CHANGE_EVENT
> 2008. 11. 12 ¿ÀÈÄ 5:44:08
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> Á¤º¸: GMS View Change Received for group TestGroup
> (aeea918f-571b-463b-bfa6-55c536df0d11) : Members in view for (before
> change analysis) are :
> *(a)*
> *1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
> 2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
> 3: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
> 4: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203*
> 2008. 11. 12 ¿ÀÈÄ 5:44:08
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> Á¤º¸: Analyzing new membership snapshot received as part of event :
> ADD_EVENT
> 2008. 11. 12 ¿ÀÈÄ 5:44:17
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> Á¤º¸: GMS View Change Received for group TestGroup
> (addb1dbe-06cf-43b8-8903-78605f29091f) : Members in view for (before
> change analysis) are :
> *(b)*
> *1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
> 2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303*
> 2008. 11. 12 ¿ÀÈÄ 5:44:17
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> Á¤º¸: Analyzing new membership snapshot received as part of event :
> ADD_EVENT
> 2008. 11. 12 ¿ÀÈÄ 5:44:17
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> Á¤º¸: GMS View Change Received for group TestGroup
> (fae1414d-702a-42fd-8c7d-6ffabe8b2e69) : Members in view for (before
> change analysis) are :
> *(c)*
> *1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
> 2: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
> 3: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203*
> 2008. 11. 12 ¿ÀÈÄ 5:44:17
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow newViewObserved
> Á¤º¸: Analyzing new membership snapshot received as part of event :
> ADD_EVENT
> 2008. 11. 12 ¿ÀÈÄ 5:44:20
> com.sun.enterprise.ee.cms.impl.jxta.ViewWindow getMemberTokens
> Á¤º¸: GMS View Change Received for group TestGroup
> (42b22147-7683-481f-a9f4-85ba5a2b847f) : Members in view for (before
> change analysis) are :
> 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
> 2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
> 3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
> 4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
> 5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203
> ...
> ------------------------------------
> This logmeans thatfive members join "TestGroup"
> 1: MemberId: 5d3280a2-a0c5-4ae2-8d41-d59b57400b8f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033090183254F6D47E7B235BC8D656194FA03
> 2: MemberId: 42b22147-7683-481f-a9f4-85ba5a2b847f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250334501FF701A644877A4B4C65068965F3403
> 3: MemberId: addb1dbe-06cf-43b8-8903-78605f29091f, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A787461503250336C047E2077544A5692C1EA21407A886303
> 4: MemberId: aeea918f-571b-463b-bfa6-55c536df0d11, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033DBAE9788614944F8A40ED352C8E7A03B03
> 5: MemberId: fae1414d-702a-42fd-8c7d-6ffabe8b2e69, MemberType: CORE,
> Address:
> urn:jxta:uuid-59616261646162614A78746150325033EF69FCF215DE43038FD0C3AA0535A08203
> Andthis log is printed in ViewWindow based on the viewQueue when new
> view is observed.
> But above log message, you can see that (a), (b) and (c)'s order are
> strange.
> Because thereare no failures, I think that member's number should be
> increased gradually(or(a)'num <= (b)'s num <= (c)'s num).
> The following code is ClusterViewManager's notifyListeners() method.
> -----
> void notifyListeners(final ClusterViewEvent event) {
> Log.log(...);
> for (ClusterViewEventListener elem : cvListeners) {
> elem.clusterViewEvent(event, getLocalView());
> }
> }
> -----
> getLocalView() is thread safe with viewLock but
> ClusterViewEventListener's clusterViewEvent() is not thread safe.
> The following code is GroupCommunicationProviderImpl's
> clusterViewEvent() method which implements ClusterViewEventListener
> interface.
> -----
> public void clusterViewEvent(final ClusterViewEvent clusterViewEvent,
> final ClusterView clusterView) {
> ...
> final EventPacket ePacket = new
> EventPakcet(clusterViewEvent.getEvent(),
> clusterViewEvent.getAdvertisement(), clusterView);
> final ArrayBlockingQueue<EventPacket> viewQueue =
> getGMSContext().getViewQueue();
> try {
> viewQueue.put(ePacket);
> } catch(InterruptedExcetion e) {
> ...
> }
>
> }
> -----
> I think that local view's snapshot(getLocalView()'s return value) and
> viewQueue.put() should be atomic like this.
> -----
> void notifyListeners(final ClusterViewEvent event) {
> Log.log(...);
> for (ClusterViewEventListener elem : cvListeners) {
> *synchronized( elem ) {*
> elem.clusterViewEvent(event, getLocalView());
> *}*
> }
> }
> or
> public *synchronized* void clusterViewEvent(final ClusterViewEvent
> clusterViewEvent, final ClusterView clusterView) {
> ...
> final EventPacket ePacket = new
> EventPakcet(clusterViewEvent.getEvent(),
> clusterViewEvent.getAdvertisement(), clusterView);
> final ArrayBlockingQueue<EventPacket> viewQueue =
> getGMSContext().getViewQueue();
> try {
> viewQueue.put(ePacket);
> } catch(InterruptedExcetion e) {
> ...
> }
>
> }
> (In my opinion,I think that the former is better because
> clusterViewEvent() can be implemented variously)
> -----
> In other words,
> -------------------------------------------------------------------
> getLocalView() --> local view's snapshot --> (hole) --> insert view queue
> -------------------------------------------------------------------
> As you can see above, before EventPacket is inserted into view queue,
> there is some hole. So we can remove the hole with synchronized block
> or individual lock object.
> Ifthe hole is removed, I think that ViewWindow can receive local view
> capture from queue correctly.
> Thanks.
> --
> Bongjae Chang