users@grizzly.java.net

CLOSE_WAIT connections (was: Comet context doesn't expire)

From: Jussi Kuosa <jussi.kuosa_at_f-secure.com>
Date: Thu, 13 Aug 2009 06:24:16 -0700 (PDT)

Hello again,
we have dug up more information about our problems...

> > > comet selector spin problem...
...
> > OK that one is now fixed with grizzly-1.0.30-SNAPSHOT:
...
> We patched our Linux and Windows servers with 1.0.30. Now our Windows
> cluster and Linux single-node system test environment have started to
> accumulate TCP connections in CLOSE_WAIT state that are not cleared even
> though the client processes went away ages ago.
...
> They seem to come in batches and irregularly...

We have found at least one cause for this behavior. Because we
misunderstood how the expiration interval is reset on server-side push (see
below), our client has a timeout that kills the CONNECT HTTP connection
during the expiration wait period. The client sends a FIN and gets an ACK
from the server, so the client-to-server direction is closed. For some
reason, the server does not notice that the client has started closing the
connection. The end result is that the client's side of the connection is
left in FIN_WAIT_2 state and the server's side stays in CLOSE_WAIT.

My understanding is that onTerminate() should be called when the client goes
away during the sleep period. Am I correct, or does the client have to reset
the TCP connection with an RST?
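At the TCP level a FIN alone is observable by the server: once the peer has half-closed its side, a read on the socket returns -1 (end-of-stream), so no RST should be needed for the server to find out. Whether Grizzly 1.0.30 keeps read interest on suspended Comet connections and maps that EOF to onTerminate() is exactly the open question, so the following is only a plain-NIO sketch of the mechanism over loopback, not Grizzly code. TerminationListener is a hypothetical stand-in for a CometHandler callback.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class HalfCloseDetection {

    // Hypothetical stand-in for a Comet handler's onTerminate(); not a Grizzly type.
    interface TerminationListener {
        void onTerminate(SocketChannel channel);
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            SocketChannel client = SocketChannel.open(server.getLocalAddress());
            SocketChannel parked = server.accept(); // the "suspended" CONNECT request
            parked.configureBlocking(false);

            TerminationListener listener =
                    ch -> System.out.println("onTerminate: peer sent FIN");
            // Keep OP_READ interest on the parked connection while it waits.
            parked.register(selector, SelectionKey.OP_READ, listener);

            // Simulated client-side timeout: orderly half-close (FIN only, no RST).
            client.shutdownOutput();

            ByteBuffer buf = ByteBuffer.allocate(512);
            boolean terminated = false;
            while (!terminated) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    SocketChannel ch = (SocketChannel) key.channel();
                    buf.clear();
                    if (ch.read(buf) == -1) { // EOF: the peer's FIN has been read
                        ((TerminationListener) key.attachment()).onTerminate(ch);
                        key.cancel();
                        // Our close() sends the server's FIN and takes the socket
                        // out of CLOSE_WAIT; skipping it leaves CLOSE_WAIT behind.
                        ch.close();
                        terminated = true;
                    }
                }
            }
            client.close();
        }
    }
}
```

If the server never reads the parked connection, or reads -1 but never calls close(), the kernel keeps the socket in CLOSE_WAIT indefinitely, which would match the netstat output that follows.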

After this sequence, the server has:
# netstat -a | grep 3289
tcp6 419 0 server:8282 client:3289 CLOSE_WAIT

Notice that there is still data left in the Recv-Q (419 bytes) that the
server application never read. Why?
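The leftover Recv-Q bytes are consistent with the FIN simply being queued behind data the application never read: a TCP receiver reports end-of-stream only after it has drained the bytes that arrived before the FIN, so a connection that is not being read at all will show neither. The sketch below uses plain java.net over loopback (not Grizzly); the 419-byte payload only mirrors the netstat figure above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class RecvQueueBeforeEof {
    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket server = listener.accept()) {

            // Client sends a request body, then half-closes: FIN is queued behind the data.
            OutputStream out = client.getOutputStream();
            out.write(new byte[419]);
            out.flush();
            client.shutdownOutput();

            InputStream in = server.getInputStream();
            byte[] buf = new byte[128];
            int total = 0;
            int n;
            // read() hands back the buffered bytes first; -1 only once they are drained.
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            System.out.println("drained " + total + " bytes, then EOF");
        }
    }
}
```

Running it should print `drained 419 bytes, then EOF`.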

The server configuration is:
 * single-node GlassFish (GF) 2.1-60e
 * JDK 1.6.0_16 (32-bit)
 * 32-bit Debian 4
 * VMware VM with a single-core ~2.3 GHz Xeon and 2 GB of memory

The client side has:
>netstat -a
  TCP server:8282 client:4005 FIN_WAIT_2
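To track CLOSE_WAIT sockets piling up on the Linux node without eyeballing netstat, the same data can be read from /proc/net/tcp and /proc/net/tcp6: the fourth column is the TCP state in hex (08 is CLOSE_WAIT) and the fifth is tx_queue:rx_queue in hex, where rx_queue is what netstat shows as Recv-Q. A minimal sketch follows; the sample lines are made up (port 0x205A = 8282, rx_queue 0x1A3 = 419) and are not taken from our capture.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class CloseWaitScan {
    // Linux TCP state code for CLOSE_WAIT (see include/net/tcp_states.h).
    static final int TCP_CLOSE_WAIT = 0x08;

    /** Returns {closeWaitCount, totalUnreadRecvQBytes} for /proc/net/tcp{,6}-style lines. */
    static long[] scan(List<String> lines) {
        long count = 0;
        long recvQ = 0;
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Skip the header row ("sl local_address ...") and anything malformed.
            if (f.length < 5 || !f[0].endsWith(":")) {
                continue;
            }
            if (Integer.parseInt(f[3], 16) == TCP_CLOSE_WAIT) {
                count++;
                // f[4] is "tx_queue:rx_queue"; rx_queue = bytes received but not yet read.
                String rxHex = f[4].substring(f[4].indexOf(':') + 1);
                recvQ += Long.parseLong(rxHex, 16);
            }
        }
        return new long[] {count, recvQ};
    }

    public static void main(String[] args) throws IOException {
        // Pass /proc/net/tcp6 (or /proc/net/tcp) as an argument; otherwise use a made-up sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode",
                    "   0: 0100007F:205A 0100007F:0CD9 08 00000000:000001A3 00:00000000 00000000  1000        0 12345",
                    "   1: 00000000:205A 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12346");
        long[] r = scan(lines);
        System.out.println("CLOSE_WAIT sockets: " + r[0] + ", unread Recv-Q bytes: " + r[1]);
    }
}
```

With the built-in sample it reports one CLOSE_WAIT socket holding 419 unread bytes; the LISTEN entry (state 0A) is ignored.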

I will gladly provide the network capture privately, if needed.

> We were unaware of (2) and presumed that the client expiration delays
> would not be extended on every push. In addition, we do not send the push
> data to every connected client within a channel. We have therefore
> identified a pattern where pushing data to a few active clients causes them
> to reconnect and receive additional push data within the expiration delay.
> This causes the other connected clients to have their expiration delays
> reset constantly, and therefore onInterrupt doesn't get called. Eventually
> these clients hit a client-side timeout. The situation clears once the few
> active clients stop receiving push data on every CONNECT.
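The effect described in the quoted paragraph can be reproduced with a toy model in simulated time. It assumes, from our reading of point (2) (not reproduced here), a single expiration timer per context that every push resets, including pushes that go only to other clients. All three durations are hypothetical, and this is a model of the behaviour, not Grizzly code.

```java
public class ExpirationResetModel {
    // All values hypothetical, in milliseconds of simulated time.
    static final long EXPIRATION_DELAY = 30_000; // server-side expiration delay
    static final long CLIENT_TIMEOUT = 60_000;   // client gives up and closes (FIN)
    static final long PUSH_INTERVAL = 20_000;    // pushes aimed at a few active clients

    /** Follows one idle client; reports which fires first: onInterrupt or the client timeout. */
    static String simulate(long pushInterval) {
        long lastReset = 0; // shared timer, reset by every push (assumed behaviour)
        for (long t = 1_000; ; t += 1_000) {
            if (pushInterval > 0 && t % pushInterval == 0) {
                lastReset = t; // a push to *other* clients still resets this client's expiry
            }
            if (t - lastReset >= EXPIRATION_DELAY) {
                return "onInterrupt@" + t;
            }
            if (t >= CLIENT_TIMEOUT) {
                return "client-timeout@" + t;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("no pushes:       " + simulate(0));
        System.out.println("push every 20 s: " + simulate(PUSH_INTERVAL));
    }
}
```

Without pushes the idle client is interrupted at 30 s; with a push every 20 s the shared timer never reaches 30 s, so the client-side timeout at 60 s fires first and the half-closed connection described above is the result.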

Best regards,

    Jussi Kuosa
-- 
View this message in context: http://www.nabble.com/Comet-context-doesn%27t-expire-tp24072882p24954380.html
Sent from the Grizzly - Users mailing list archive at Nabble.com.