users@glassfish.java.net

GF 2.1 Cluster deployment issues

From: <glassfish_at_javadesktop.org>
Date: Sat, 16 Jan 2010 09:39:22 PST

We are trying to reproduce deployment problems that occur in our production cluster but not in our test environments.

Our prod env is 5 machines, with two instances per machine, for a total of 10 instances in the cluster.

Our test env is 2 machines with 2 instances per machine. I increased the instances to 5 per machine (I had to decrease the heap size and shut down other processes in the test env to free enough memory for this). With that setup I ran into problems with /var/tmp space.

We ensured that /var had enough disk space during our last production deployment and still had problems. No exceptions occurred, but some of the apps won't respond to requests.

Our current workaround is to stop the node agents, delete the whole instance configuration (/opt/glassfish/nodeagents/*), and restart the node agents. This has consistently given us a working application. Simply restarting the instances and/or node agents does not do the trick. Deleting .com_sun_appserv_timestamp (http://blogs.sun.com/nazrul/entry/under_the_hood_of_glassfish) didn't fix it either.

Both envs are running GF 2.1 Patch 04 on Solaris 10 T2000s.

Our .war is quite large: 141MB.
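
For anyone who wants to sanity-check temp space before deploying an archive this size, here is a rough sketch. It is only an estimate: how many copies GlassFish actually writes is unknown to us, so `copies` and `headroom` are assumed parameters, and the function names are made up for illustration. It uses os.statvfs, so it is Unix-only.

```python
import math
import os

def required_bytes(archive_bytes, copies, headroom=1.5):
    """Estimate temp space for `copies` copies of an archive.

    `copies` and `headroom` are assumptions supplied by the caller,
    not values taken from GlassFish documentation.
    """
    return math.ceil(archive_bytes * copies * headroom)

def free_bytes(path):
    """Bytes available to unprivileged users on the filesystem holding `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize

def has_room_for_copies(temp_dir, archive_bytes, copies, headroom=1.5):
    """True if temp_dir currently has at least the estimated space free."""
    return free_bytes(temp_dir) >= required_bytes(archive_bytes, copies, headroom)
```

For example, has_room_for_copies("/var/tmp", 141 * 1024 * 1024, 10) checks for one copy per instance in a 10-instance cluster, which is a guess on our part rather than documented behavior.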

Does anyone have a good description/document that details all the steps GlassFish performs in a cluster .war deployment?

Here are the steps as I understand them from a filesystem perspective:
1) The DAS creates multiple copies of the .war as .zip files in the OS temp dir (/var/tmp on our Solaris machines).
2) These files are copied out to /var/tmp/ on each managed instance machine.

How many individual copies of this .war should there be in the DAS's /var/tmp? One per instance in the cluster? I only see 6 copies in a cluster with 10 instances. One per node agent + 1?
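
To make the count repeatable on each machine, something like the sketch below could be run against the temp dir during a deployment. It assumes the copies can be recognized simply by a .zip suffix and a minimum size. The function names and the `min_bytes` filter are made up for illustration, and the script knows nothing about how GlassFish actually names these files.

```python
import os

def list_archive_copies(temp_dir, suffix=".zip", min_bytes=0):
    """Return (name, size) for regular files in temp_dir ending in `suffix`.

    `min_bytes` lets you skip small unrelated archives; matching on the
    suffix alone is an assumption about what the copies look like.
    """
    copies = []
    for name in sorted(os.listdir(temp_dir)):
        path = os.path.join(temp_dir, name)
        if name.endswith(suffix) and os.path.isfile(path):
            size = os.path.getsize(path)
            if size >= min_bytes:
                copies.append((name, size))
    return copies

def total_bytes(copies):
    """Sum the sizes from list_archive_copies()."""
    return sum(size for _, size in copies)
```

Calling list_archive_copies("/var/tmp", min_bytes=100 * 1024 * 1024) on the DAS and on each node agent machine would give comparable counts, as long as nothing else leaves large .zip files there.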

Any details would be greatly appreciated.
[Message sent by forum member 'dgulino' (drew_gulino_at_yahoo.com)]

http://forums.java.net/jive/thread.jspa?messageID=381352