users@glassfish.java.net

After some time running, cluster becomes non-responsive

From: <glassfish_at_javadesktop.org>
Date: Wed, 02 Jan 2008 21:02:58 PST

I have a 3-machine GlassFish cluster load balanced by SJSWS 7.0. If I reboot all machines and start up the cluster, everything is fine. But after some time, the cluster becomes non-responsive, with all instances returning HTTP 403 error codes. Running "asadmin stop-cluster cluster-name" takes far longer than usual but eventually completes. The biggest problem is that after stopping the cluster, trying to restart it with "asadmin start-cluster cluster-name" fails with this error:

[root_at_glassfish1 ~]# asadmin start-cluster cluster-name
Operation 'startCluster' failed in 'clusters' Config Mbean.
Target exception message: All server instances in cluster cluster-name were not started.
Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: management/rmi-jmx-connector
Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: management/rmi-jmx-connector
Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: management/rmi-jmx-connector
CLI137 Command start-cluster failed.

Why does this happen? The server starts up and runs fine but eventually hits this problem after a variable amount of time. When it happens, the only fix I have found so far is rebooting all the machines in the cluster.

Also, rather than stopping/starting the cluster, trying to just stop/start any of the individual instances results in the same

Failed to retrieve RMIServer stub: javax.naming.NameNotFoundException: management/rmi-jmx-connector

message. Need help soon! I'll be watching this closely and can respond quickly. What other information would be helpful?

A few weeks ago, the above error was thrown when I tried to stop the cluster, but that time the instances were still serving pages just fine. The error eventually went away on its own: the next day, the same command that had failed the day before ran successfully.

What does this error mean? Why does it occur? Why does it occur intermittently? And why does it sometimes solve itself and other times not?

Thanks!

[Message sent by forum member 'rwillie6' (rwillie6)]

http://forums.java.net/jive/thread.jspa?messageID=252066