users@grizzly.java.net

Re: IdleTimeoutFilter.queue leaking

From: Marc Arens <marc.arens_at_open-xchange.com>
Date: Thu, 16 Jan 2014 23:06:08 +0100 (CET)

Hello Alexey,

> On 13 January 2014 at 22:28 Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
> wrote:
>
>
> Hi Marc,
>
> On 13.01.14 08:37, Marc Arens wrote:
> > Hello,
> >
> > we're currently seeing a problem for a single customer with
> > org.glassfish.grizzly.utils.IdleTimeoutFilter.queue leaking/not getting
> > cleaned
> > properly so the server runs out of memory within ~48 hours. As far as I
> > understand Connections aren't removed from the queue until
> > org.glassfish.grizzly.utils.DelayedExecutor.DelayedRunnable instructs
> > org.glassfish.grizzly.utils.IdleTimeoutFilter.DefaultWorker.doWork(Connection)
> > to close the connection via closeSilently() which results in
> > org.glassfish.grizzly.utils.IdleTimeoutFilter.handleClose(FilterChainContext)
> > being called.
> >
> > Are/were there known bugs I missed, or is there an easy explanation for the
> > behaviour we're seeing? We don't see this on other installations, and I could
> > reproduce a similar behaviour by making a Servlet's service method hang
> > endlessly. While the connections from the Apache in front of Grizzly were
> > closed after an Apache timeout of 100 seconds, the queue didn't shrink at
> > all.
> IdleTimeoutFilter keeps track of all FilterChainContexts associated with
> a Connection and only closes the Connection when no active
> FilterChainContext is associated with it.
> In your case, with an endless Servlet.service() method, you keep the
> FilterChainContext active, so IdleTimeoutFilter doesn't treat the
> Connection as idle.
>
> I'm not aware of any bugs related to IdleTimeoutFilter, but if you have
> a testcase - we'll definitely take a look and fix the problem.

No testcase, but we finally managed to reproduce and watch it via remote
debugging on a special Debian Squeeze derived distribution some customers are
using. I assumed it might have been caused by an old OpenJDK 6 runtime, but it
could also be seen with Sun 6 and even Oracle 7, though only on this system.

We had configured an idleTimeout of FOREVER during Grizzly startup:

final KeepAlive keepAlive = networkListener.getKeepAlive();
keepAlive.setIdleTimeoutInSeconds(-1);
keepAlive.setMaxRequestsCount(-1);
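As a minimal sketch of the convention involved (method name ours, not Grizzly's API): a negative idle timeout is normalized to an effectively infinite delay, i.e. FOREVER = Long.MAX_VALUE milliseconds.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: how a negative "idle timeout in seconds" setting
// is normalized to an effectively infinite timeout (FOREVER).
public class TimeoutConvention {
    static final long FOREVER = Long.MAX_VALUE;

    static long toTimeoutMillis(int idleTimeoutInSeconds) {
        // A negative value means "never time out".
        return idleTimeoutInSeconds < 0
                ? FOREVER
                : TimeUnit.SECONDS.toMillis(idleTimeoutInSeconds);
    }

    public static void main(String[] args) {
        System.out.println(toTimeoutMillis(-1));  // 9223372036854775807
        System.out.println(toTimeoutMillis(100)); // 100000
    }
}
```

This is why the AtomicLong values 0 and 9223372036854775807 show up below: 0 means "due for removal now", FOREVER means "never expire".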

Now, watching a request get handled showed the following:

During handleClose, the IdleTimeoutFilter sets timeoutMillis to 0, as observed
in the debugger:

Daemon Thread [OXWorker-0000036] (Suspended (breakpoint at line 325 in
IdleTimeoutFilter$Resolver))
        IdleTimeoutFilter$Resolver.removeTimeout(Connection) line: 325
        IdleTimeoutFilter$Resolver.removeTimeout(Object) line: 321
        DelayedExecutor$DelayQueue<E>.remove(E) line: 196
        IdleTimeoutFilter.handleClose(FilterChainContext) line: 18

public boolean removeTimeout(final Connection connection) {
    IDLE_ATTR.get(connection).timeoutMillis.set(0);
    return true;
}

resulting in

(org.glassfish.grizzly.nio.transport.TCPNIOConnection)
TCPNIOConnection{localSocketAddress=/127.0.0.1:8009,
peerSocketAddress=/127.0.0.1:57573}
IDLE_ATTR.get(connection)
         (org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord)
org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord@6ce13334
IDLE_ATTR.get(connection).timeoutMillis
         (java.util.concurrent.atomic.AtomicLong) 0

This should normally cause the next run of DelayedExecutor.DelayedRunnable to
remove the element from the DelayQueue, since currentTimeMillis - 0 >= 0:

final Object element = it.next();
final Long timeoutMillis = resolver.getTimeoutMillis(element);
...
  } else if (currentTimeMillis - timeoutMillis >= 0) {
        it.remove();
...
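Stated as a self-contained sketch (simplified, with our own names; the real DelayedExecutor consults a Resolver and runs in a worker thread), the sweep that should reclaim the entry looks roughly like this:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of the DelayedExecutor sweep: an entry whose
// timeoutMillis has been set to 0 is immediately due and gets removed.
public class SweepSketch {
    static final long FOREVER = Long.MAX_VALUE;

    static class IdleRecord {
        final AtomicLong timeoutMillis = new AtomicLong(FOREVER);
    }

    static void sweep(List<IdleRecord> queue, long currentTimeMillis) {
        for (Iterator<IdleRecord> it = queue.iterator(); it.hasNext(); ) {
            long timeout = it.next().timeoutMillis.get();
            // Subtraction instead of direct comparison, as in the original
            // code, so the check is robust against overflow.
            if (currentTimeMillis - timeout >= 0) {
                it.remove(); // entry is due: drop it from the queue
            }
        }
    }

    public static void main(String[] args) {
        List<IdleRecord> queue = new ArrayList<>();
        IdleRecord closed = new IdleRecord();
        closed.timeoutMillis.set(0); // handleClose marked it for removal
        queue.add(closed);
        queue.add(new IdleRecord()); // live entry, still FOREVER

        sweep(queue, System.currentTimeMillis());
        System.out.println(queue.size()); // 1: only the live entry remains
    }
}
```

An entry stuck at FOREVER satisfies currentTimeMillis - FOREVER < 0 on every sweep, so it is never removed, which matches the leak we observed next.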

But when examining this part in the debugger again, we see that timeoutMillis
was reset from 0 back to FOREVER (9223372036854775807):

(org.glassfish.grizzly.nio.transport.TCPNIOConnection)
TCPNIOConnection{localSocketAddress=/127.0.0.1:8009,
peerSocketAddress=/127.0.0.1:57573}
IDLE_ATTR.get(connection)
         (org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord)
org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord@6ce13334
IDLE_ATTR.get(connection).timeoutMillis
         (java.util.concurrent.atomic.AtomicLong) 9223372036854775807

This seems to be a concurrency issue, as it doesn't always happen when I add
an additional breakpoint in
IdleTimeoutFilter.ContextCompletionListener.onComplete().

The same code/config just works on a default Debian. Any thoughts on this one?
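The suspected lost update can be replayed deterministically: if onComplete's set(FOREVER) lands after handleClose's set(0), the removal marker is overwritten and the sweep never reclaims the entry. The sketch below replays that interleaving and also shows a hypothetical guarded re-arm (our suggestion, not Grizzly's actual code) that refuses to resurrect a record already marked with 0:

```java
import java.util.concurrent.atomic.AtomicLong;

// Deterministic replay of the suspected interleaving: handleClose marks the
// record for removal (set(0)), then ContextCompletionListener.onComplete
// re-arms it (set(FOREVER)), so the sweep never removes it -> leak.
public class LostUpdateSketch {
    static final long FOREVER = Long.MAX_VALUE;

    // Hypothetical guarded re-arm: a CAS loop that never overwrites the
    // removal marker 0, so a closed connection cannot be resurrected.
    static void guardedRearm(AtomicLong timeoutMillis) {
        long cur;
        do {
            cur = timeoutMillis.get();
            if (cur == 0) {
                return; // already marked for removal; leave it alone
            }
        } while (!timeoutMillis.compareAndSet(cur, FOREVER));
    }

    public static void main(String[] args) {
        AtomicLong timeoutMillis = new AtomicLong(FOREVER);

        // The problematic ordering with plain set():
        timeoutMillis.set(0);       // handleClose: mark for removal
        timeoutMillis.set(FOREVER); // onComplete runs afterwards: overwrites
        System.out.println(timeoutMillis.get()); // 9223372036854775807: leaked

        // With the guarded re-arm, the removal marker survives:
        timeoutMillis.set(0);       // handleClose again
        guardedRearm(timeoutMillis);
        System.out.println(timeoutMillis.get()); // 0: sweep can remove it
    }
}
```

This would also explain why an extra breakpoint in onComplete makes the problem intermittent: the breakpoint perturbs the timing between the two writes.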

>
> >
> > I found the org.glassfish.grizzly.utils.ActivityCheckFilter which sound
> > useful
> > for this kind of situation but how does it work in combination with
> > suspended
> > requests e.g. Comet?
> ActivityCheckFilter is more lax: it doesn't track live
> FilterChainContexts, but monitors Connection activity (read, write,
> other events), and if no activity was registered for a given period of
> time, it closes the Connection.
>
> Both Filters could work with Comet; both are meant to detect idle
> Connections, but they use different approaches. So it's up to you to
> decide which one is better for a specific usecase.
>
> Thanks.
>
> WBR,
> Alexey.

Best regards,
Marc