Re: [Shoal-Dev] About HealthMonitor's cache

From: Shreedhar Ganapathy <Shreedhar.Ganapathy_at_Sun.COM>
Date: Thu, 09 Jul 2009 00:06:03 -0700

Thanks a lot Bongjae.

Bongjae Chang wrote:
> Hi Shreedhar,
>
> I fixed them.
>
> GroupLeadershipNotificationSignalImpl.java already declared "package com.sun.enterprise.ee.cms.impl.common", so no callers were affected by the move.
>
> Thanks!
> --
> Bongjae Chang
>
>
> ----- Original Message -----
> From: "Shreedhar Ganapathy" <Shreedhar.Ganapathy_at_Sun.COM>
> To: <dev_at_shoal.dev.java.net>
> Sent: Thursday, July 09, 2009 3:17 PM
> Subject: Re: [Shoal-Dev] About HealthMonitor's cache
>
>
>
>> Hi Bongjae
>>
>> Bongjae Chang wrote:
>>
>>> Thanks Joe,
>>>
>>> While merging your commit into the SHOAL_1_1_ABSTRACTING_TRANSPORT branch today, I found that some Java classes related to group leadership need to be corrected and committed again.
>>>
>>> 1. GroupLeadershipNotificationSignalImpl.java should be moved into the com.sun.enterprise.ee.cms.impl.common package. It does not belong in the com.sun.enterprise.ee.cms.impl.client package.
>>>
>>>
>> You are right. The Signal Impl classes are in common and this should go
>> there as well. Could you do the needful while also changing any callers
>> as well?
>>
>>> 2. GroupLeadershipNotificationActionImpl.java, GroupLeadershipNotificationTest.java, GroupLeadershipNotificationSignalImpl.java and GroupLeadershipNotificationActionFactory.java should have the license header.
>>>
>>>
>> Thanks for that as well.
>>
>>> Shall I correct them if you don't mind?
>>>
>>>
>> Please go ahead.
>>
>>> --
>>> Bongjae Chang
>>>
>>> ----- Original Message -----
>>> From: "Joseph Fialli" <Joseph.Fialli_at_Sun.COM>
>>> To: <dev_at_shoal.dev.java.net>
>>> Sent: Thursday, July 09, 2009 12:36 AM
>>> Subject: Re: [Shoal-Dev] About HealthMonitor's cache
>>>
>>>
>>>
>>>
>>>> Bongjae Chang wrote:
>>>>
>>>>
>>>>> Hi Joe,
>>>>>
>>>>> I understood your words and agree with you!
>>>>>
>>>>>
>>>>>
>>>> Bongjae,
>>>>
>>>> My commit today in HealthMonitor.java addressed the issue that you
>>>> raised: the FINE processCacheUpdate log message was coming out for DEAD
>>>> entries. It will now only come out for entries that processCacheUpdate
>>>> actually operates on.
>>>>
>>>> -Joe
>>>>
>>>>
>>>>> Thanks!
>>>>> --
>>>>> Bongjae Chang
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Joseph Fialli" <Joseph.Fialli_at_Sun.COM>
>>>>> To: <dev_at_shoal.dev.java.net>
>>>>> Sent: Tuesday, July 07, 2009 1:21 AM
>>>>> Subject: Re: [Shoal-Dev] About HealthMonitor's cache
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Bongjae Chang wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>> HealthMonitor keeps a cache of the members' states.
>>>>>>> But once a member's state has been stored, the value is never removed.
>>>>>>> Assume that A was a member of the group and that A is still failed.
>>>>>>> Then the following FINE-level log message is written continuously.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Bongjae,
>>>>>>
>>>>>> I did not design the entry to stay in the cache, but I have been taking
>>>>>> advantage of it recently.
>>>>>> I will mention the two instances that I am aware of that benefit from it
>>>>>> staying in the cache.
>>>>>>
>>>>>> My first impression is that it is preferable to leave the DEAD state
>>>>>> entry in the HealthMonitor cache because of the method
>>>>>> GroupHandle.getMemberState(). getMemberState() is a pull API provided
>>>>>> to GMS clients to poll the state of a member. If there is no entry for
>>>>>> a member in the cache, then GMS would need to try to contact the
>>>>>> instance. If an application requests the state of a member that has
>>>>>> recently died, it is best to remember that state. If the instance
>>>>>> restarts, the state will get replaced in the cache.
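
The retention behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Shoal's actual HealthMonitor or GroupHandle code: the class name MemberStateCacheSketch, the State enum, and the UNKNOWN value (standing in for "no entry, so the member would have to be contacted") are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and states are assumptions, not Shoal's API.
class MemberStateCacheSketch {
    enum State { ALIVE, INDOUBT, DEAD, UNKNOWN }

    // Entries are never removed here, so a DEAD state stays available.
    private final Map<String, State> cache = new ConcurrentHashMap<>();

    // Record the latest state seen for a member.
    // A restarted member simply overwrites its earlier DEAD entry.
    void update(String member, State state) {
        cache.put(member, state);
    }

    // Pull-style lookup answered from the cache when an entry exists.
    // UNKNOWN marks the case where the member would have to be contacted.
    State getMemberState(String member) {
        return cache.getOrDefault(member, State.UNKNOWN);
    }

    public static void main(String[] args) {
        MemberStateCacheSketch c = new MemberStateCacheSketch();
        c.update("A", State.ALIVE);
        c.update("A", State.DEAD);
        System.out.println(c.getMemberState("A")); // DEAD
        System.out.println(c.getMemberState("B")); // UNKNOWN
        c.update("A", State.ALIVE);                // A restarts
        System.out.println(c.getMemberState("A")); // ALIVE
    }
}
```
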
>>>>>>
>>>>>> The method HealthMonitor.cleanAllCaches() would be the place to clear
>>>>>> the entry, but I would prefer not to. Retaining the state ensures that
>>>>>> we do not report an instance failed twice. When an instance dies,
>>>>>> calling cleanAllCaches() already clears it from all the other caches
>>>>>> that we want it removed from.
>>>>>>
>>>>>> I propose to fix the event log message and processing code in
>>>>>> processCacheUpdate to skip entries for DEAD instances (and other states
>>>>>> that do not make sense to process in that method).
>>>>>> However, even the WATCHDOG API benefits from the entry remaining in the
>>>>>> health cache, since this provides a mapping from instance name within a
>>>>>> group to the jxta entry id. The existence of a dead entry prevents the
>>>>>> WATCHDOG mechanism from reporting an instance failed twice. There is a
>>>>>> possible race condition between GMS heartbeat failure detection
>>>>>> reporting that an instance has failed and NA reporting that an instance
>>>>>> has failed. The current implementation relies on the HealthMonitor
>>>>>> cache entry as the central location to maintain the state of an instance
>>>>>> and prevent double reporting that an instance is DEAD.
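
As a rough sketch of the two ideas in this proposal (skipping DEAD entries in the periodic pass, and using the retained entry to suppress a second failure report), something like the following could work. It is an assumption-laden illustration, not the committed fix: DeadEntrySketch, reportFailure, and failureEvents are hypothetical names, and only processCacheUpdate borrows its name from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch; not the actual HealthMonitor implementation.
class DeadEntrySketch {
    enum State { ALIVE, INDOUBT, DEAD }

    private final ConcurrentHashMap<String, State> cache = new ConcurrentHashMap<>();
    // Failure notifications that were actually sent out.
    final List<String> failureEvents = new CopyOnWriteArrayList<>();

    void setState(String member, State state) {
        cache.put(member, state);
    }

    // Periodic pass: returns the members it examined.
    // DEAD entries are skipped, so nothing is logged or processed for them.
    List<String> processCacheUpdate() {
        List<String> processed = new ArrayList<>();
        for (Map.Entry<String, State> e : cache.entrySet()) {
            if (e.getValue() == State.DEAD) {
                continue;
            }
            processed.add(e.getKey());
        }
        return processed;
    }

    // Called by any detector (heartbeat or watchdog in this sketch).
    // put() atomically returns the previous value, so only the first
    // caller sees a non-DEAD previous state and emits the event.
    boolean reportFailure(String member) {
        State previous = cache.put(member, State.DEAD);
        if (previous == State.DEAD) {
            return false; // already reported by the other detector
        }
        failureEvents.add(member);
        return true;
    }

    public static void main(String[] args) {
        DeadEntrySketch s = new DeadEntrySketch();
        s.setState("A", State.ALIVE);
        s.setState("B", State.ALIVE);
        System.out.println(s.reportFailure("A"));    // true
        System.out.println(s.reportFailure("A"));    // false
        System.out.println(s.processCacheUpdate());  // [B]
    }
}
```

The design point is that the DEAD entry doubles as the "already reported" marker, which is why removing it from the cache would reopen the double-reporting race.
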
>>>>>>
>>>>>> -Joe
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> --
>>>>>>> [#|2009-07-03T21:42:45.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>>>>> [#|2009-07-03T21:42:48.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>>>>> [#|2009-07-03T21:42:51.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>>>>> [#|2009-07-03T21:42:54.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>>>>> [#|2009-07-03T21:42:57.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>>>>> [#|2009-07-03T21:43:00.945+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>>>>> --
>>>>>>> Is this expected, for example to monitor an old member's state or the members' history?
>>>>>>> Please advise.
>>>>>>> Thanks.
>>>>>>> --
>>>>>>> Bongjae Chang
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe_at_shoal.dev.java.net
>>>>>> For additional commands, e-mail: dev-help_at_shoal.dev.java.net
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>