users@glassfish.java.net

Re: Problem with webstart client when one node in cluster is down

From: <glassfish_at_javadesktop.org>
Date: Mon, 02 Apr 2007 17:37:18 PDT

I did some more experiments with a standalone client, as you suggested, as well as with the webstart client. The results were the same in both cases:

 - When I start the cluster fresh (all nodes down, then click "start cluster" in the web admin console), the clients load balance properly.
 - If I take down the first node in the list, the clients fail with a looping ConnectionException.
 - If I take down the second node in the list, the clients are unaffected.
 - After I stop one node and start it again (it doesn't matter which node), the clients no longer load balance; they always go to the first node in the list. Once the cluster is in this state, I also see extra messages in the server log whenever a client starts up:
[#|2007-04-02T20:10:39.932-0400|INFO|sun-appserver9.1|javax.enterprise.system.stream.out|_ThreadID=31;_ThreadName=p: thread-pool-1; w: 9;|
GroupInfoServiceBase(p: thread-pool-1; w: 9): .notifyObservers->:|#]

[#|2007-04-02T20:10:39.934-0400|INFO|sun-appserver9.1|javax.enterprise.system.stream.out|_ThreadID=31;_ThreadName=p: thread-pool-1; w: 9;|
GroupInfoServiceBase(p: thread-pool-1; w: 9): .notifyObservers<-:|#]


For all my tests, the client is initially stopped, and I only run it once the desired cluster state has been reached. I have not tested what happens to a client when nodes go up and down while it is running. Because of this, the clients should always be using the static list to bootstrap.
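
For readers unfamiliar with the static list: a standalone GlassFish client is usually given its bootstrap endpoints through the com.sun.appserv.iiop.endpoints system property. The sketch below shows one way that list might be set. The class name, host names and ports are placeholders, not the configuration used in these tests.

```java
import java.util.List;

/** Hypothetical helper; host names and ports passed in are placeholders. */
final class StaticEndpointsSketch {

    // System property the GlassFish client ORB reads for its bootstrap endpoint list.
    static final String ENDPOINTS_PROPERTY = "com.sun.appserv.iiop.endpoints";

    /**
     * Joins "host:port" entries with commas and publishes them as the
     * endpoint list. This must run before the first InitialContext is created.
     */
    static String configure(List<String> hostPorts) {
        String value = String.join(",", hostPorts);
        System.setProperty(ENDPOINTS_PROPERTY, value);
        return value;
    }
}
```

Calling StaticEndpointsSketch.configure(List.of("node1.example.com:3700", "node2.example.com:3700")) should be equivalent to passing -Dcom.sun.appserv.iiop.endpoints=node1.example.com:3700,node2.example.com:3700 on the client command line.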

There seem to be two issues here:
1. The client bootstrap doesn't seem to try anything but the first item in the list. If that item fails, it retries it over and over in a tight loop and never tries any of the other items.
2. The state of the cluster seems to be broken after a node goes down and comes back up, such that the client only sees the node that it bootstraps with.
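
To make the first issue concrete, here is a minimal sketch of the behavior I would expect from the bootstrap: walk the endpoint list once and fall through to the next entry when one is down. It is not GlassFish's actual bootstrap code; the class, the method names and the plain TCP probe are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical sketch; not the app server's real bootstrap logic. */
final class BootstrapFailoverSketch {

    /**
     * Returns the first endpoint the probe accepts, visiting each entry
     * at most once, or empty if every endpoint is down.
     */
    static Optional<String> firstReachable(List<String> endpoints, Predicate<String> isUp) {
        for (String endpoint : endpoints) {
            if (isUp.test(endpoint)) {
                return Optional.of(endpoint);
            }
            // Endpoint is down: move on to the next entry instead of retrying this one.
        }
        return Optional.empty();
    }

    /** Simple probe: can a TCP connection to "host:port" be opened within the timeout? */
    static boolean tcpProbe(String endpoint, int timeoutMillis) {
        int colon = endpoint.lastIndexOf(':');
        String host = endpoint.substring(0, colon);
        int port = Integer.parseInt(endpoint.substring(colon + 1));
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

With a list such as node1:3700,node2:3700 and node1 stopped, firstReachable(endpoints, e -> tcpProbe(e, 2000)) would return node2:3700 rather than looping on node1, which is the opposite of what I observe.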
[Message sent by forum member 'sarnoth' (sarnoth)]

http://forums.java.net/jive/thread.jspa?messageID=210969