Cameron Rochester wrote:
> Joseph,
>
> Thanks for the reply - I appreciate your time and look forward to
> further comments.
>
> One thing to point out: it is quite possible that a peer has failed
> during startup. I am running a cluster with anywhere up to 20 peers, so
> there are plenty of opportunities for nodes to die on startup.
>
Cameron,
Verified the problem and have a fix (I will probably commit it tomorrow).
Attached is a diff of the fix (the INFO log messages will be made FINE before commit).
Filed as the following Shoal issue:
https://shoal.dev.java.net/issues/show_bug.cgi?id=95
-Joe
> Cheers
> Cameron
>
> Joseph Fialli wrote:
>> Cameron,
>>
>> Thanks for reporting this issue that you have encountered with Shoal.
>>
>> I have some very preliminary comments inlined below to help answer
>> your questions.
>> I will follow up with a more detailed analysis in another email.
>>
>> -Joe
>>
>>
>> Cameron Rochester wrote:
>>> Hi all,
>>>
>>> I have been using Shoal for some time now and have discovered an
>>> issue that crops up every now and then.
>>>
>>> Basically, when doing a DSC update to all peers I am seeing the
>>> following exception:
>>>
>>> WARNING: ClusterManager.send : sending of message
>>> net.jxta.endpoint.Message@11882231(2){270} failed. Unable to create
>>> an OutputPipe for
>>> urn:jxta:uuid-59616261646162614A787461503250335FDDDB9470DA4390A3E692268159961303
>>> route = null
>>> java.io.IOException: Unable to create a messenger to
>>> jxta://uuid-59616261646162614A787461503250335FDDDB9470DA4390A3E692268159961303/PipeService/urn:jxta:uuid-63B5938B46F147609C1C998286EA5F3B6E0638B5DF604AEEAC09A3FAE829FBE804
>>>     at net.jxta.impl.pipe.BlockingWireOutputPipe.checkMessenger(BlockingWireOutputPipe.java:238)
>>>     at net.jxta.impl.pipe.BlockingWireOutputPipe.<init>(BlockingWireOutputPipe.java:154)
>>>     at net.jxta.impl.pipe.BlockingWireOutputPipe.<init>(BlockingWireOutputPipe.java:135)
>>>     at net.jxta.impl.pipe.PipeServiceImpl.createOutputPipe(PipeServiceImpl.java:503)
>>>     at net.jxta.impl.pipe.PipeServiceImpl.createOutputPipe(PipeServiceImpl.java:435)
>>>     at net.jxta.impl.pipe.PipeServiceInterface.createOutputPipe(PipeServiceInterface.java:170)
>>>     at com.sun.enterprise.jxtamgmt.ClusterManager.send(ClusterManager.java:505)
>>>     at com.sun.enterprise.ee.cms.impl.jxta.GroupCommunicationProviderImpl.sendMessage(GroupCommunicationProviderImpl.java:254)
>>>     at com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.sendMessage(DistributedStateCacheImpl.java:500)
>>>     at com.sun.enterprise.ee.cms.impl.jxta.DistributedStateCacheImpl.addToRemoteCache(DistributedStateCacheImpl.java:234)
>>>
>>> This led me to the HealthMonitor and ClusterViewManager, where I
>>> found the following things:
>>>
>>> 1) The HealthMonitor does not seem to get a list of advertisements
>>> from the ClusterViewManager to monitor. As far as I can tell, its
>>> entries are built up via heartbeat messages.
>> That is by design. The ClusterViewManager list is built by receiving
>> a ClusterViewEvent from the master.
>> This is a subtle point when beginning to look at the Shoal code, but a
>> very important one that I believe you may be overlooking.
>> The HealthMonitor's responsibilities in the Master are quite different
>> from the HealthMonitor's responsibilities in all other, non-master
>> members of the cluster.
>>
>> Only the master's HealthMonitor makes decisions about which members
>> are ALIVE, SUSPECT or DEAD.
>> The HealthMonitors of all other, non-master members of the cluster
>> watch the MASTER to check that it is still ALIVE; if it is not, they
>> initiate selection of a new master.
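>>
>> To make that split concrete, the division of labor is roughly as
>> follows. This is only an illustrative sketch with made-up names, not
>> the actual Shoal classes:
>>
>>     // Illustrative only; not the real Shoal HealthMonitor.
>>     import java.util.Map;
>>     import java.util.concurrent.ConcurrentHashMap;
>>
>>     class HealthMonitorRoleSketch {
>>         private final boolean isMaster;
>>         private final String masterId;
>>         private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
>>
>>         HealthMonitorRoleSketch(boolean isMaster, String masterId) {
>>             this.isMaster = isMaster;
>>             this.masterId = masterId;
>>         }
>>
>>         // Called whenever a heartbeat arrives.
>>         void onHeartbeat(String senderId, long nowMillis) {
>>             if (isMaster) {
>>                 // Only the master tracks every member and later decides
>>                 // ALIVE / SUSPECT / DEAD for the whole group.
>>                 lastHeartbeat.put(senderId, nowMillis);
>>             } else if (senderId.equals(masterId)) {
>>                 // Non-masters only watch the master; if its heartbeats
>>                 // stop, they independently start master selection.
>>                 lastHeartbeat.put(masterId, nowMillis);
>>             }
>>             // Heartbeats from other non-masters do not drive decisions here.
>>         }
>>     }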
>>
>> There is always a delay between when an instance fails and when the
>> failure is detected by the Shoal master.
>> (With the default Shoal configuration, heartbeat failure detection of
>> a member takes very roughly 7-9 seconds.)
>> Failures like the one you report above can happen during that window.
>> (The WARNING message is too strong for those cases.)
>> If you start up all members of a GMS group and then kill one with
>> "kill -9" during startup, there is a good chance you will see the
>> failure above, and it would be explained. However, since that is not
>> the scenario you are reporting, more thought is needed to explore
>> this issue.
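>>
>> For intuition, the detection window is basically the heartbeat
>> interval times the missed-heartbeat threshold, plus a verification
>> step. With assumed numbers (not necessarily the shipped defaults) the
>> arithmetic works out like this:
>>
>>     // Illustrative arithmetic only; the assumed values are not
>>     // necessarily the shipped Shoal defaults.
>>     public class DetectionWindowSketch {
>>         public static void main(String[] args) {
>>             long heartbeatIntervalMs = 2000; // assumed heartbeat period
>>             int maxMissedHeartbeats = 3;     // assumed misses before SUSPECT
>>             long verifyTimeoutMs = 1000;     // assumed check before declaring DEAD
>>
>>             long best = heartbeatIntervalMs * maxMissedHeartbeats + verifyTimeoutMs;
>>             long worst = best + heartbeatIntervalMs; // up to one interval of skew
>>             System.out.println("Detection window roughly " + best + "-" + worst + " ms");
>>         }
>>     }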
>>
>>> 2) Occasionally the master node can hold onto a stale advertisement.
>>> When a new client receives the list of advertisements from the
>>> master at startup, I have seen a node in the list (in STARTING
>>> state) that didn't exist.
>> As explained above, if a member fails during startup, you could
>> observe the reported failure. However, since your description does
>> not mention such a failure, more investigation is needed.
>>> 3) Once the master has a stale advertisement it never removes it
>>> (see point 1)
>> It is by design that an instance stays in the HealthMonitor in DEAD
>> state. Only when the Master updates the cluster view with a
>> FAILURE_EVENT should an instance be removed from a non-Master
>> member's cluster view. (Of course, all non-Master members monitor the
>> master and will act independently to detect the master's failure, and
>> then all members follow the same algorithm to agree on a new Master
>> for the GMS group.)
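>>
>> Put differently, a non-master's view changes only when the master
>> says so. Roughly (again an illustrative sketch with invented names,
>> not the real ClusterViewManager):
>>
>>     // Illustrative only; not the real ClusterViewManager.
>>     import java.util.Map;
>>     import java.util.concurrent.ConcurrentHashMap;
>>
>>     class ClusterViewSketch {
>>         enum EventType { ADD_EVENT, FAILURE_EVENT, MASTER_CHANGE_EVENT }
>>
>>         // peer ID -> advertisement
>>         private final Map<String, String> view = new ConcurrentHashMap<>();
>>
>>         // Only events originated by the master mutate the view; a peer
>>         // that merely looks dead locally stays in the view until the
>>         // master confirms the failure.
>>         void onClusterViewEvent(EventType type, String peerId, String advertisement) {
>>             switch (type) {
>>                 case ADD_EVENT:
>>                     view.put(peerId, advertisement);
>>                     break;
>>                 case FAILURE_EVENT:
>>                     view.remove(peerId); // removal happens only here
>>                     break;
>>                 case MASTER_CHANGE_EVENT:
>>                     // Every member ran the same deterministic selection,
>>                     // so all agree on the new master without negotiation.
>>                     break;
>>             }
>>         }
>>     }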
>>> 4) This was then causing a problem (and long timeouts) when sending
>>> the DSC update, as the update is unicast to each advertisement,
>>> including the failed one (roughly illustrated below).
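>>>
>>> To show why a single stale advertisement stalls the whole update:
>>> the send loop is roughly the following (not the actual
>>> DistributedStateCacheImpl code, and the timeout value is just a
>>> guess):
>>>
>>>     // Illustrative only; not the actual DistributedStateCacheImpl code.
>>>     import java.io.IOException;
>>>     import java.util.List;
>>>
>>>     class DscUpdateSketch {
>>>         static final long PER_PEER_TIMEOUT_MS = 30000; // assumed pipe-creation timeout
>>>
>>>         // The update is unicast to each advertised peer in turn; a peer
>>>         // that no longer exists blocks the loop until its OutputPipe
>>>         // creation times out.
>>>         static void sendToAll(List<String> advertisedPeers, byte[] update) {
>>>             for (String peer : advertisedPeers) {
>>>                 try {
>>>                     sendUnicast(peer, update, PER_PEER_TIMEOUT_MS);
>>>                 } catch (IOException e) {
>>>                     // e.g. "Unable to create a messenger to jxta://..."
>>>                     System.err.println("send to " + peer + " failed: " + e.getMessage());
>>>                 }
>>>             }
>>>         }
>>>
>>>         static void sendUnicast(String peer, byte[] update, long timeoutMs)
>>>                 throws IOException {
>>>             // create an OutputPipe to the peer and write the message
>>>         }
>>>     }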
>>>
>>> I am not sure why the Master node has a stale reference; it only
>>> happens occasionally and is very hard to track down.
>>>
>>> To get around this I propose another fix. Basically, the
>>> HealthMonitor compares the list of PeerIDs in its cache to the list
>>> of peers known by the ClusterViewManager. If there are peers in the
>>> view that are not in the HealthMonitor cache, I simply add them to
>>> the cache so the InDoubtPeerDetector will do its thing (rough sketch
>>> below).
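>>>
>>> In rough pseudocode the change looks like this (the attached patch
>>> is the real thing; the types and names below are simplified
>>> stand-ins):
>>>
>>>     // Simplified stand-in types; see the attached patch for the real code.
>>>     import java.util.Map;
>>>     import java.util.Set;
>>>
>>>     class CacheReconciliationSketch {
>>>         static final class Entry {
>>>             final String peerId;
>>>             final String state;
>>>             final long sequenceId;
>>>             Entry(String peerId, String state, long sequenceId) {
>>>                 this.peerId = peerId;
>>>                 this.state = state;
>>>                 this.sequenceId = sequenceId;
>>>             }
>>>         }
>>>
>>>         // Add any peer known to the view but missing from the health
>>>         // cache, so the InDoubtPeerDetector starts watching it too.
>>>         static void reconcile(Set<String> peerIdsInView, Map<String, Entry> healthCache) {
>>>             for (String peerId : peerIdsInView) {
>>>                 if (!healthCache.containsKey(peerId)) {
>>>                     // sequence ID just set to 0, as in-doubt detection ignores it
>>>                     healthCache.put(peerId, new Entry(peerId, "STARTING", 0L));
>>>                 }
>>>             }
>>>         }
>>>     }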
>>>
>>> The patch is attached. Could someone please review it and let me
>>> know if it makes sense? The main thing I am unsure about is the
>>> sequence ID: it doesn't seem to be used by the in-doubt detection,
>>> so I have just set it to 0.
>> I will evaluate your proposal and get back to you. Thanks again for
>> reporting this issue and your proposed solution.
>>
>>>
>>> Thanks for looking
>>> Cameron
>>>