users@glassfish.java.net

Re: Hang up: Too many open files

From: <glassfish_at_javadesktop.org>
Date: Thu, 13 Sep 2007 06:46:37 PDT

> Dont know how to determine associations
I just meant associated by time: a noticeable spike in accesses within a short window, maybe 10 minutes or so, possibly caused by automated systems updating themselves from your web site at a time they believed to be off hours.
>
> > Is Apache httpd on the same box or is there a
> firewall between Apache and Glassfish?
> yes there is a firewall.
>
> > Do you have access to Apache's mod_jk.log if we
> need to look there?
> i have attached some files

I was asking about this because I recently had to deal with an issue that was caused by firewall interaction with mod_jk. The error messages in mod_jk.log were similar to yours, but I don't believe the causes to be the same. I think your error messages from mod_jk are indeed caused by Glassfish no longer responding, where mine were caused by the firewall deciding that a connection was no longer being used. Not the same thing.

Here's a possible scenario that I think is worth trying to prove or disprove as being the root cause of your problem:
1. A cron job kicks off on your Glassfish box at nearly the same time every day. This cron job is very resource-intensive in some way, probably disk I/O. It might be a daily backup or a file-indexing program, something that pretty much eats all of some resource that Glassfish needs in order to respond quickly. I think a disk backup being run by your ISP is a likely candidate here.
2. Glassfish slows down and can't respond to requests quickly enough to clear them. Requests stack up and clients start timing out. Files remain open in Glassfish until Glassfish can deal with the request - and I think some file handles are not released until finalizers are called during garbage collection (based on watching the rise and fall of open file counts).
3. Your open-file limits, still too low for demanding conditions, are exceeded and everything comes to a grinding screeching halt until Glassfish can be restarted.
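
To check steps 2 and 3 of this scenario, it may help to record how close the Glassfish process gets to its open-file limit over time. Below is a minimal sketch in Python, assuming a Linux box where /proc/<pid>/fd and /proc/<pid>/limits are available (on other systems something like "lsof -p" would be needed instead). The function names and the proc_root parameter are my own, for illustration only.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count the file descriptors a process currently holds open.

    Assumes a Linux-style /proc layout: one entry per descriptor
    under <proc_root>/<pid>/fd. Reading another user's process
    usually requires root.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def parse_max_open_files(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is 'unlimited' or the
    line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max open files <soft> <hard> <units>
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def describe_fd_usage(pid, proc_root="/proc"):
    """Return a one-line summary of open descriptors versus the soft limit."""
    used = count_open_fds(pid, proc_root)
    with open(os.path.join(proc_root, str(pid), "limits")) as f:
        limit = parse_max_open_files(f.read())
    return "pid %s: %d open files, soft limit %s" % (pid, used, limit)
```

Calling describe_fd_usage() with the Glassfish process id from cron every minute or so, and appending the result to a log with a timestamp, should show whether the count climbs toward the limit in the minutes before the hang.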

Such a scenario would explain the timing, and I've seen backups drag a box down before.
I would suggest trying to find out what else besides Glassfish is running at that time, preferably by personal observation because I tend not to trust what I am told when it comes to troubleshooting - all too often it has been either wrong or misleading.
Look at the manpage for a command called "vmstat". I assume you are also familiar with "top". And there's good old "ps". These tools can help you see if a process is draining resources needed by Glassfish at the critical time.
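
If nobody can be at the console at the critical time, a snapshot of "ps" can be taken automatically and examined later. Here is a rough sketch in Python, assuming a "ps" that accepts "-e -o pid,pcpu,comm" (the Linux procps version does). The function names are hypothetical. Note that this only ranks processes by CPU; for the disk I/O I suspect, the output of "vmstat" collected over the same period is the better evidence.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -e -o pid,pcpu,comm` output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, pcpu, comm = parts
        try:
            rows.append((int(pid), float(pcpu), comm.strip()))
        except ValueError:
            continue  # ignore lines that do not look like process rows
    return rows


def top_cpu(ps_output, n=5):
    """Return the n processes with the highest CPU percentage."""
    return sorted(parse_ps(ps_output), key=lambda row: row[1], reverse=True)[:n]


def snapshot(n=5):
    """Run ps once and return the current top-n CPU consumers."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return top_cpu(result.stdout, n)
```

Running snapshot() every minute around the time of the hang and logging the result with a timestamp should reveal any backup or indexing process that shows up only at that hour.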

-Paul
[Message sent by forum member 'paulr5930' (paulr5930)]

http://forums.java.net/jive/thread.jspa?messageID=235252