In case anyone still following this thread is interested, there are some
new developments:
Sun support now has a whole team of people involved helping me figure
this out. I've managed to provide them with two full thread dumps while
it was in a locked state. The Grizzly threads were blocked waiting for
the resource manager to return a database connection from the pool.
Further examination revealed that the MS SQL Server JDBC driver
connected to the database, sent a login request, and was waiting forever
for a response. At least one of our lockups happened in the middle of
an MS SQL Server 2005 backup, which made the DB unusable for about 40
minutes.
We found that the JDBC driver was not configured with a login timeout,
and its default behavior is to wait forever. Even after the DB came
back to life, the driver didn't notice and stayed blocked. That kept
the resource manager blocked, and therefore the Grizzly thread blocked.
As more users try to access our system, eventually all five Grizzly
threads end up blocked. Users get a blank screen and wait forever for a
response until I restart GlassFish. Disabling the HTTP listener and
then re-enabling it brings it back to life because new threads are
created, but they would all block again when trying to access the DB.
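
The chain of blocking described above can be reproduced in miniature.
This is only a sketch, not GlassFish or Grizzly code: a fixed pool of
five java.util.concurrent workers stands in for the Grizzly threads,
and a latch that is never released stands in for the login reply that
never arrives. The class name, method name, and parameters are made up
for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BlockedPoolSketch {

    // Occupies every worker with a simulated login that never gets a reply,
    // then reports whether one more request is served within waitMillis.
    // loginTimeoutMillis == 0 means "wait forever", mirroring the driver default.
    static boolean nextRequestServed(int workers, long loginTimeoutMillis, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch loginReply = new CountDownLatch(1); // never counted down
        try {
            for (int i = 0; i < workers; i++) {
                pool.submit(() -> {
                    if (loginTimeoutMillis == 0) {
                        loginReply.await(); // blocks indefinitely
                    } else {
                        // gives up after the timeout and frees the worker
                        loginReply.await(loginTimeoutMillis, TimeUnit.MILLISECONDS);
                    }
                    return null;
                });
            }
            // One more user request arrives while all workers are busy.
            Future<String> next = pool.submit(() -> "served");
            try {
                next.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the request is still queued: the "blank screen"
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            pool.shutdownNow(); // interrupt the stuck workers so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(nextRequestServed(5, 0, 300));   // no login timeout
        System.out.println(nextRequestServed(5, 50, 2000)); // 50 ms login timeout
    }
}
```

Run as written, this should print false and then true: with no timeout
the sixth request stays queued behind the five stuck workers, while
with a timeout the workers give up and the request gets through.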
The MS SQL Server 2005 JDBC driver manual says I should add this
property to my datasource (then restart the app server):

  Property:    loginTimeout
  Type:        int [>=0..65535]
  Default:     0
  Description: The number of seconds the driver should wait before
               timing out a failed connection. A zero value indicates
               no time-out value. A non-zero value is the number of
               seconds the driver should wait before timing out a
               failed connection.
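
For anyone connecting outside a GlassFish pool, here is a minimal
sketch of the same setting in plain JDBC. It assumes the driver's usual
jdbc:sqlserver://host:port;property=value URL form. The host, database
name, and the 30-second value are placeholders, and the class and
helper method are hypothetical. DriverManager.setLoginTimeout is the
standard java.sql call, but it only affects connections obtained
through DriverManager, so it is not a substitute for setting the
property on the datasource.

```java
import java.sql.DriverManager;

class LoginTimeoutSketch {

    // Hypothetical helper: builds a SQL Server JDBC URL that carries the
    // loginTimeout connection property (in seconds; 0 means wait forever).
    static String buildUrl(String host, int port, String database, int loginTimeoutSeconds) {
        if (loginTimeoutSeconds < 0 || loginTimeoutSeconds > 65535) {
            throw new IllegalArgumentException("loginTimeout must be in 0..65535");
        }
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";loginTimeout=" + loginTimeoutSeconds;
    }

    public static void main(String[] args) {
        // JDBC-wide default for connections opened via DriverManager.
        DriverManager.setLoginTimeout(30);

        // Placeholder host and database; pass the result to
        // DriverManager.getConnection(url, user, password) in real code.
        String url = buildUrl("dbhost", 1433, "appdb", 30);
        System.out.println(url);
    }
}
```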
I'm going to do that tonight. I'll let everyone know if this solves our
lockup problems for good.
I know Kevin MacDonald experiences the blank screen symptom and does not
use a database, so in his case something else must be causing the
threads to block. If you can capture a thread dump while the server is
in a locked-up state, that will help Sun determine the cause.
Thanks,
Ryan