dev@shoal.java.net

Sometimes FailureRecoverySignal is not notified

From: Bongjae Chang <bongjae.chang_at_gmail.com>
Date: Mon, 8 Mar 2010 15:07:57 +0900

Hi,

When I tested FailureRecovery with GroupLeadershipNotification, I found sometimes the signal was not notified.

As you know, an appropriate recoverer should be selected for FailureRecovery.

But sometimes the recoverer was null when I debugged because of wrong previousView in ViewWindow.java and ViewWindowImpl.java(branch).

The ViewWindow interface has following methods and ViewWindow.java(or ViewWindowImpl.java) implements those.

public interface ViewWindow {
    List<GMSMember> getPreviousView();
    List<GMSMember> getCurrentView();
    ...
}

But ViewWindow.java(or ViewWindowImpl.java) returns un-safe lists. See the following code.

In ViewWindow.java
---
public List<GMSMember> getPreviousView() {
    List<GMSMember> result = EMPTRY_GMS_MEMBER_LIST;
    synchronized(views) {
        final int INDEX = views.size() -2;
        if(INDEX >= 0) {
            // views.get(INDEX) is a List and can be shared unexpectedly.
            result = views.get(INDEX);
        }
    }
    return result;
}
...
private void addGroupLeadershipNotificationSignal( ... ) {
    ...
    signals.add( new GroupLeadershipNotificationSignalImpl( token, getPreviousView(), getCurrentView(), ...);
}
---

In GroupLeadershipNotificationSignalImpl.java
---
public void release() throws SignalReleaseException {
    if( previousView != null )
        previousView.clear();
    if( currentView != null )
        currentView.clear();
    ...

}
---

As you see above, sometimes the view(previous or current view) could be shared and cleared unexpectedly by other components.

So I attached proposed patches which copy and return the list safely. And I removed "List.clear()" code in GroupLeadershipNotificationSignalImpl.java because I think the logic is unnecessary.

When I applied the patch, I found it worked well,

Please review the attached patches.

Thanks.

Regards,
Bongjae Chang