Re: how do I avoid a connection pool filling up?

From: Steve Essery <Steve.Essery_at_Sun.COM>
Date: Tue, 13 Jan 2009 11:00:22 +0000

Dick Davies wrote:
> I got this error on my cluster over the weekend:
>
> [#|2009-01-10T14:33:06.521+0000|WARNING|sun-appserver9.1|javax.enterprise.resource.resourceadapter|_ThreadID=154;
> _ThreadName=httpSSLWorkerThread-39181-21;oracle-prod-pool;In-use
> connections equal max-pool-size and expired max-
> wait-time. Cannot allocate more
> connections.;_RequestID=5e1a41cd-1a74-4a5e-8d3b-2582599ac0b6;|RAR5117
> : Failed to
> obtain/create connection from connection pool [ oracle-prod-pool ].
> Reason : In-use connections equal max-pool-s
> ize and expired max-wait-time. Cannot allocate more connections.|#]
>
> which I read as 'your connection pool is full, and I've been waiting
> too long, so here's an Exception'.
>
Yes, that's the meaning. The question is why have you hit the maximum
number of connections, and you head in the right direction below... You
could also look at turning on JDBC monitoring and using the asadmin
monitor command to capture statistics for the pool and see what the
connection usage is like over time, as well as seeing how many
connection requests are being forced to wait as the pool is fully in use.
>
> Both appservers started giving 500s and the site was down for an hour or two.
> It looks like there was some kind of network blip that stalled a few
> connections, when I restarted the glassfish instances
> everything was back up and running.
>
>
>
> So I'm wondering how I tell Glassfish to step in and kill of idle
> connections after a certain amount of time.
>
> in the admin console I see a couple of options, I'm wonder
>
> The first is Resources> JDBC> Connection Pools> oracle-prod-pool -> advanced
>
> set a leak timeout and then tick 'leak reclaim'.
>
You might want to start off just turning on the leak detection and
setting a timeout and seeing if there are connections leaking. If they
are you'll see a call stack in the server.log indicating the state of
the thread at the point the application code fetched a connection from
the pool. Working down the stack you'll see where in the application
the connection was obtained. The question then would be why wasn't that
connection returned to the pool before the leak timeout expired? Is
there a Connection.close() missing for example. Unfortunately I've
quite often see issues in application code where its assume there will
be no problems performing JDBC operations and because of exceptions JDBC
resources such as Connections, Statements and ResultSets aren't closed
correctly - even in a finally block.

If there is a leak turning on reclaim simply destroys the connection and
replaces it in the pool with a fresh connection. If the application is
caching a connection for some reason and tries to use it once its been
reclaimed in this way it will obviously get an exception.
> The second is Resources> JDBC> Connection Pools> oracle-prod-pool -> general
>
> enable 'connection validation' and then 'on any failure' tick 'close
> all connections'
>
Another possibility is that validation isn't enabled, or isn't correctly
configured for the database in question. Where this matters typically
is in the situation where the database is shutdown while the application
server has a pool of connections open to it. If the pool isn't
validating, or is using the wrong mechanism, then those connections,
which are no longer connected (and hence useless) will sit in the pool
until they finally exceed the idle timeout. Unfortunately the pool will
still pass these connections to application code (it has no way of
knowing they are unusable without proper validation) which resets the
idle time on them.... on a busy pool this could mean they persist for a
significant period of time. I'd expect more signs of this in the
server.log - the hope being that application code that uses such a
connection will report the SQLException they should get from attempting
any JDBC operation on it.

With Oracle the only workable validation mechanism is table-based.
Since DUAL is a table that's been present in Oracle as long as I can
recall (I go back to v6) that's the table I'd suggest. Table based
validation works by parsing and executing a select statement on the
table, but it does not fetch back any data from the table. The parse
makes the Oracle JDBC driver contact the backend database, which
essentially validates that the connection is still good. This is
effectively the same mechanism as used in the Oracle propriatary
OracleConnection.pingDatabase() method.

The problems with the other two methods are as follows. The auto-commit
mechanism essentially looks at the current setting of the auto commit
flag in the driver and toggles to the opposite and then back to what it
was originally. Oracle's JDBC driver, and probably others, cache this
status and avoid a network hop to the server until they absolutely have
to so they reduce the amount of network traffic they generate.

The issue is similar with the DB MetaData method. Here a DBMetaData
object is created, however the Oracle JDBC driver again defers any
network trips until a call is made to a method that has to contact the
database to return the required data. With the enterprise version of
the Sun Glassfish Application Server the JDBC driver for the HADB
database does make a connection to its database so DBMetaData is used
for validation with pools to HADB.

Finally a firewall between the Application Server machine and the
database, which is configured to drop idle connections, can also cause
complications. Most firewalls I've encountered do not drop connections
cleanly, i.e. they don't send a RST package to the client to know that
the connection has been closed. This means when such a terminated
connection is used in the Application Server, any attempt by the
connection to talk to the backend database will hang, normal TCP/IP
reliable delivery kicks in and multiple attempts will be made to send
TCP/IP packets to the database - only when all those attempts expire
(typically around 8 minutes) will the problem result in a SQLException
being returned. This essentially could cause a bottleneck for a period
of time where there could be a number of dead/hung connections
effectively reducing the number of available connections in the pool
that can be used. If the pool is only just big enough to support the
load placed on it, this could be enough to lead to requests waiting, and
possibily exceeding the wait timeout. Making sure the idle timeout in
the pool is less than the firewall timeout should be enough nowadays to
avoid this situation.
> Which do you folks think would be most likely to avoid this recurring
> without impacting on the app (Roller 4 using openjpa)?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe_at_glassfish.dev.java.net
> For additional commands, e-mail: users-help_at_glassfish.dev.java.net
>
>