dev@shoal.java.net

Re: [Shoal-Dev] About HealthMonitor's cache

From: Bongjae Chang <carryel_at_korea.com>
Date: Thu, 9 Jul 2009 14:40:48 +0900

Thanks Joe,

While merging your commit into the SHOAL_1_1_ABSTRACTING_TRANSPORT branch today, I found that some Java classes related to group leadership need to be corrected and committed again.

1. GroupLeadershipNotificationSignalImpl.java should be moved into the com.sun.enterprise.ee.cms.impl.common package. It does not belong in the com.sun.enterprise.ee.cms.impl.client package.

2. GroupLeadershipNotificationActionImpl.java, GroupLeadershipNotificationTest.java, GroupLeadershipNotificationSignalImpl.java and GroupLeadershipNotificationActionFactory.java are missing the license header, which should be added.

If you don't mind, shall I correct them?

--
Bongjae Chang

----- Original Message -----
From: "Joseph Fialli" <Joseph.Fialli_at_Sun.COM>
To: <dev_at_shoal.dev.java.net>
Sent: Thursday, July 09, 2009 12:36 AM
Subject: Re: [Shoal-Dev] About HealthMonitor's cache


> Bongjae Chang wrote:
>> Hi Joe,
>>
>> I understand your explanation and agree with you!
>>
> Bongjae,
>
> My commit today in HealthMonitor.java addressed the issue that you raised,
> namely that the FINE processCacheUpdate log message was coming out for
> DEAD entries. The message will now only come up for entries that
> processCacheUpdate actually operates on.
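The behaviour described above (DEAD entries stay in the cache but are no longer processed or logged) can be sketched roughly as follows. This is only an illustration under assumptions: CacheScanSketch, MemberState and entriesToProcess are hypothetical names, and the cache is modelled as a plain map from member name to state. It is not the actual HealthMonitor code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Shoal code base.
final class CacheScanSketch {
    enum MemberState { ALIVE, INDOUBT, DEAD }

    // Returns the members a periodic scan should process (and log at FINE),
    // skipping entries that are already DEAD.
    static List<String> entriesToProcess(Map<String, MemberState> cache) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, MemberState> e : cache.entrySet()) {
            if (e.getValue() == MemberState.DEAD) {
                continue; // retained in the cache, but neither processed nor logged
            }
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, MemberState> cache = new LinkedHashMap<>();
        cache.put("A", MemberState.DEAD);
        cache.put("B", MemberState.ALIVE);
        System.out.println(entriesToProcess(cache)); // prints [B]
    }
}
```

With this shape, a failed member A no longer produces a log line on every scan, while its DEAD entry is still available to other callers.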
>
> -Joe
>> Thanks!
>> --
>> Bongjae Chang
>>
>>
>> ----- Original Message -----
>> From: "Joseph Fialli" <Joseph.Fialli_at_Sun.COM>
>> To: <dev_at_shoal.dev.java.net>
>> Sent: Tuesday, July 07, 2009 1:21 AM
>> Subject: Re: [Shoal-Dev] About HealthMonitor's cache
>>
>>
>>
>>> Bongjae Chang wrote:
>>>
>>>> Hi,
>>>> HealthMonitor stores members' states in a cache.
>>>> But once a member's state has been stored, the value is never removed.
>>>> Assume that A was a member of the group and that A has failed and is still down.
>>>> Then we see the following FINE-level log message continuously.
>>>>
>>> Bongjae,
>>>
>>> I did not design the entry to stay in the cache, but I have been taking
>>> advantage of it recently.
>>> I will mention the two instances that I am aware of that benefit from it
>>> staying in the cache.
>>>
>>> My first impression is that it is preferable to leave the DEAD state
>>> entry in the HealthMonitor cache because of the method
>>> GroupHandle.getMemberState(). getMemberState() is a pull API provided
>>> to GMS clients to poll the state of a member. If there is no entry for
>>> a member in the cache, then GMS would need to try to contact the
>>> instance. If an application requests the state of a member that has
>>> recently died, it is best to remember that state. If the instance
>>> restarts, the state will get replaced in the cache.
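A minimal sketch of why a retained DEAD entry helps a pull-style state query, under stated assumptions: MemberStateCacheSketch, State and record are hypothetical names, and UNKNOWN simply stands in for "no cache entry, so the instance would have to be contacted". Only the method name getMemberState is borrowed from the text above; its real signature in GroupHandle is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: not the Shoal HealthMonitor or GroupHandle implementation.
final class MemberStateCacheSketch {
    enum State { ALIVE, DEAD, UNKNOWN }

    private final Map<String, State> cache = new ConcurrentHashMap<>();

    // Called when a health update for a member arrives; overwrites the previous
    // state, so a restarted instance replaces its old DEAD entry.
    void record(String member, State state) {
        cache.put(member, state);
    }

    // Pull-style query answered from the cache. UNKNOWN marks the case where
    // there is no entry and the instance would have to be contacted instead.
    State getMemberState(String member) {
        return cache.getOrDefault(member, State.UNKNOWN);
    }

    public static void main(String[] args) {
        MemberStateCacheSketch c = new MemberStateCacheSketch();
        c.record("A", State.ALIVE);
        c.record("A", State.DEAD);
        System.out.println(c.getMemberState("A")); // prints DEAD
        c.record("A", State.ALIVE);                // A restarts
        System.out.println(c.getMemberState("A")); // prints ALIVE
        System.out.println(c.getMemberState("C")); // prints UNKNOWN
    }
}
```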
>>>
>>> The method HealthMonitor.cleanAllCaches() would be the place to clear
>>> the entry, but I would prefer not to.
>>> Retaining the state ensures that we do not report an instance failed
>>> twice. When an instance is dead, calling cleanAllCaches() clears it
>>> from all the other caches that we want it cleaned from.
>>>
>>> I propose to fix the event log message and processing code in
>>> processCacheUpdate to skip entries for DEAD instances (and other states
>>> that do not make sense to process in that method).
>>> However, even the WATCHDOG API benefits from the entry remaining in the
>>> health cache, since this provides a mapping from the instance name within
>>> a group to the JXTA entry id. The existence of a dead entry prevents the
>>> WATCHDOG mechanism from reporting an instance failed twice.
>>> There is a possible race condition between GMS heartbeat failure
>>> detection reporting that an instance has failed and NA reporting that an
>>> instance has failed. The current implementation relies on the
>>> HealthMonitor cache entry as the central location to maintain the state
>>> of an instance and prevent double reporting that an instance is DEAD.
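One way to picture the cache entry acting as the single place that decides who reports a failure is the sketch below. It is an assumption-laden illustration, not how Shoal implements it: FailureReportSketch, markAlive and reportFailure are hypothetical names, and the atomic update relies on ConcurrentHashMap.put returning the previous value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: not taken from the Shoal code base.
final class FailureReportSketch {
    enum State { ALIVE, DEAD }

    private final Map<String, State> cache = new ConcurrentHashMap<>();

    void markAlive(String member) {
        cache.put(member, State.ALIVE);
    }

    // Both detectors (for example heartbeat detection and a watchdog) call this.
    // put() is atomic and returns the previous value, so only the caller that
    // actually moves the entry to DEAD gets true and should send the notification.
    boolean reportFailure(String member) {
        State previous = cache.put(member, State.DEAD);
        return previous != State.DEAD;
    }

    public static void main(String[] args) {
        FailureReportSketch s = new FailureReportSketch();
        s.markAlive("A");
        System.out.println(s.reportFailure("A")); // prints true  (first detector reports)
        System.out.println(s.reportFailure("A")); // prints false (second detector is suppressed)
    }
}
```

Because the DEAD entry is kept rather than removed, the second detector still finds it and stays silent. If the entry were deleted on failure, the second call would see no previous state and report again.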
>>>
>>> -Joe
>>>
>>>
>>>> --
>>>> [#|2009-07-03T21:42:45.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>> [#|2009-07-03T21:42:48.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>> [#|2009-07-03T21:42:51.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>> [#|2009-07-03T21:42:54.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>> [#|2009-07-03T21:42:57.930+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>> [#|2009-07-03T21:43:00.945+0900|FINE|Shoal|ShoalLogger|_ThreadID=30;_ThreadName=InDoubtPeerDetector Thread for Group:test;ClassName=HealthMonitor$InDoubtPeerDetector;MethodName=processCacheUpdate;|processCacheUpdate : A 's state is dead|#]
>>>> --
>>>> Is this expected, for example to monitor a former member's state or the members' history?
>>>> Please advise.
>>>> Thanks.
>>>> --
>>>> Bongjae Chang
>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe_at_shoal.dev.java.net
>>> For additional commands, e-mail: dev-help_at_shoal.dev.java.net