Hello Alexey,
> On 17 January 2014 at 08:17 Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
> wrote:
>
>
> Hi Marc,
>
> makes sense, I committed the fix on the 2.2.x branch and released a new
> 2.2.22-SNAPSHOT, can you please give it a try?
>
I picked up the fix and ran a two-hour test on the affected system. The heap dump
taken after the test didn't show any leaked DelayedExecutors, whereas before your
fix it would have shown a huge leak within just 10 minutes of testing. Great work,
thanks for the very fast fix and have a nice weekend!
> Thanks.
>
> WBR,
> Alexey.
>
> On 16.01.14 14:06, Marc Arens wrote:
> > Hello Alexey,
> >
> >> On 13 January 2014 at 22:28 Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
> >> wrote:
> >>
> >>
> >> Hi Marc,
> >>
> >> On 13.01.14 08:37, Marc Arens wrote:
> >>> Hello,
> >>>
> >>> we're currently seeing a problem for a single customer:
> >>> org.glassfish.grizzly.utils.IdleTimeoutFilter.queue is leaking / not
> >>> getting cleaned up properly, so the server runs out of memory within
> >>> ~48 hours. As far as I understand, Connections aren't removed from the
> >>> queue until org.glassfish.grizzly.utils.DelayedExecutor.DelayedRunnable
> >>> instructs
> >>> org.glassfish.grizzly.utils.IdleTimeoutFilter.DefaultWorker.doWork(Connection)
> >>> to close the connection via closeSilently(), which results in
> >>> org.glassfish.grizzly.utils.IdleTimeoutFilter.handleClose(FilterChainContext)
> >>> being called.
> >>>
> >>> Are/were there known bugs I missed, or is there an easy explanation for
> >>> the behaviour we're seeing? We don't see this on other installations,
> >>> and I could reproduce a similar behaviour by making a Servlet's
> >>> service() method hang endlessly. While the connections from the Apache
> >>> in front of Grizzly were closed after an Apache timeout of 100 seconds,
> >>> the queue didn't shrink at all.
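> >>>
> >>> The reproducer was essentially nothing more than a servlet whose
> >>> service() never returns; roughly like this (simplified from memory, the
> >>> class name is made up and this is not our actual code):
> >>>
> >>> import java.io.IOException;
> >>>
> >>> import javax.servlet.ServletException;
> >>> import javax.servlet.http.HttpServlet;
> >>> import javax.servlet.http.HttpServletRequest;
> >>> import javax.servlet.http.HttpServletResponse;
> >>>
> >>> public class HangingServlet extends HttpServlet {
> >>>
> >>>     @Override
> >>>     protected void service(HttpServletRequest req, HttpServletResponse resp)
> >>>             throws ServletException, IOException {
> >>>         // block "forever" so the FilterChainContext stays active and the
> >>>         // connection never looks idle to IdleTimeoutFilter
> >>>         try {
> >>>             Thread.sleep(Long.MAX_VALUE);
> >>>         } catch (InterruptedException e) {
> >>>             Thread.currentThread().interrupt();
> >>>         }
> >>>     }
> >>> }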
> >> IdleTimeoutFilter keeps track of all FilterChainContexts associated with
> >> a Connection and only allows the Connection to be closed once no active
> >> FilterChainContext is associated with it anymore.
> >> In your case, with an endless Servlet.service() method, you keep a
> >> FilterChainContext active, so IdleTimeoutFilter doesn't treat the
> >> Connection as idle.
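> >>
> >> Conceptually the tracking works roughly like this (a deliberately
> >> simplified sketch of the idea only, not the actual IdleTimeoutFilter
> >> code; field and method names are made up):
> >>
> >> import java.util.concurrent.atomic.AtomicInteger;
> >> import java.util.concurrent.atomic.AtomicLong;
> >>
> >> // one such record is kept per Connection
> >> final class IdleRecordSketch {
> >>     static final long FOREVER = Long.MAX_VALUE;
> >>
> >>     final AtomicInteger activeContexts = new AtomicInteger();
> >>     final AtomicLong timeoutMillis = new AtomicLong(FOREVER);
> >>
> >>     // called when a FilterChainContext starts being processed
> >>     void contextStarted() {
> >>         activeContexts.incrementAndGet();
> >>         timeoutMillis.set(FOREVER); // never idle while a context is active
> >>     }
> >>
> >>     // called when a FilterChainContext completes
> >>     void contextCompleted(long idleTimeoutMillis) {
> >>         if (activeContexts.decrementAndGet() == 0) {
> >>             // only now does the idle countdown start
> >>             timeoutMillis.set(System.currentTimeMillis() + idleTimeoutMillis);
> >>         }
> >>     }
> >> }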
> >>
> >> I'm not aware of any bugs related to IdleTimeoutFilter, but if you have
> >> a testcase - we'll definitely take a look and fix the problem.
> > No testcase, but we finally managed to reproduce/watch it via remote
> > debugging on a special Debian Squeeze-derived distribution some customers
> > are using. I assumed it might have been caused by an old openjdk-6
> > runtime, but it could also be seen with sun-6 and even oracle-7, though
> > only on this system.
> >
> > We had configured an idleTimeout of FOREVER during Grizzly startup:
> >
> > final KeepAlive keepAlive = networkListener.getKeepAlive();
> > keepAlive.setIdleTimeoutInSeconds(-1);
> > keepAlive.setMaxRequestsCount(-1);
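> >
> > For context, that snippet sits in our startup code roughly like this
> > (simplified; the listener name is made up, anything between Apache and
> > Grizzly is left out, only the port matches the debugger output below):
> >
> > import java.io.IOException;
> >
> > import org.glassfish.grizzly.http.KeepAlive;
> > import org.glassfish.grizzly.http.server.HttpServer;
> > import org.glassfish.grizzly.http.server.NetworkListener;
> >
> > public class ServerBootstrapSketch {
> >     public static void main(String[] args) throws IOException {
> >         final HttpServer httpServer = new HttpServer();
> >         final NetworkListener networkListener =
> >                 new NetworkListener("backend-listener", "127.0.0.1", 8009);
> >
> >         // keep-alive idle timeout and request limit disabled (FOREVER)
> >         final KeepAlive keepAlive = networkListener.getKeepAlive();
> >         keepAlive.setIdleTimeoutInSeconds(-1);
> >         keepAlive.setMaxRequestsCount(-1);
> >
> >         httpServer.addListener(networkListener);
> >         httpServer.start();
> >     }
> > }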
> >
> > Now, watching a request being handled showed the following:
> >
> > during handleClose the IdleTimeoutFilter sets timeoutMillis to 0, as
> > observed in the debugger:
> >
> > Daemon Thread [OXWorker-0000036] (Suspended (breakpoint at line 325 in
> > IdleTimeoutFilter$Resolver))
> > IdleTimeoutFilter$Resolver.removeTimeout(Connection) line: 325
> > IdleTimeoutFilter$Resolver.removeTimeout(Object) line: 321
> > DelayedExecutor$DelayQueue<E>.remove(E) line: 196
> > IdleTimeoutFilter.handleClose(FilterChainContext) line: 18
> >
> > public boolean removeTimeout(final Connection connection) {
> >     IDLE_ATTR.get(connection).timeoutMillis.set(0);
> >     return true;
> > }
> >
> > resulting in
> >
> > (org.glassfish.grizzly.nio.transport.TCPNIOConnection)
> > TCPNIOConnection{localSocketAddress=/127.0.0.1:8009,
> > peerSocketAddress=/127.0.0.1:57573}
> > IDLE_ATTR.get(connection)
> > (org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord)
> > org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord_at_6ce13334
> > IDLE_ATTR.get(connection).timeoutMillis
> > (java.util.concurrent.atomic.AtomicLong) 0
> >
> > This should normally cause the next run of
> > DelayedExecutor.DelayedRunnable to remove the element from the DelayQueue,
> > since currentTimeMillis - 0 >= 0:
> >
> > final Object element = it.next();
> > final Long timeoutMillis = resolver.getTimeoutMillis(element);
> > ...
> > } else if (currentTimeMillis - timeoutMillis >= 0) {
> >     it.remove();
> >     ...
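> >
> > Just to spell out the arithmetic of that check for the two values
> > involved (a tiny standalone snippet, nothing Grizzly-specific):
> >
> > public class TimeoutCheckArithmetic {
> >     public static void main(String[] args) {
> >         final long forever = Long.MAX_VALUE; // 9223372036854775807
> >         final long currentTimeMillis = System.currentTimeMillis();
> >
> >         // timeoutMillis == 0: difference is positive, entry gets removed
> >         System.out.println(currentTimeMillis - 0 >= 0);       // true
> >
> >         // timeoutMillis == FOREVER: difference is hugely negative (no
> >         // overflow), so the entry is never removed
> >         System.out.println(currentTimeMillis - forever >= 0); // false
> >     }
> > }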
> >
> > but when examining this part in the debugger again, we see that
> > timeoutMillis was reset from 0 back to FOREVER (9223372036854775807):
> >
> > (org.glassfish.grizzly.nio.transport.TCPNIOConnection)
> > TCPNIOConnection{localSocketAddress=/127.0.0.1:8009,
> > peerSocketAddress=/127.0.0.1:57573}
> > IDLE_ATTR.get(connection)
> > (org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord)
> > org.glassfish.grizzly.utils.IdleTimeoutFilter$IdleRecord_at_6ce13334
> > IDLE_ATTR.get(connection).timeoutMillis
> > (java.util.concurrent.atomic.AtomicLong) 9223372036854775807
> >
> > This seems to be a concurrency issue, as it doesn't always happen when I
> > add an additional breakpoint in
> > IdleTimeoutFilter.ContextCompletionListener.onComplete().
> >
> > The same code/config just works on a default Debian. Any thoughts on this
> > one?
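> >
> > To make clear what kind of race I have in mind: I picture the completion
> > listener re-arming the timeout after removeTimeout() already zeroed it
> > during handleClose. The following is only my own standalone illustration
> > of that idea and of a compareAndSet-style guard that would prevent it,
> > not Grizzly code, and I don't know whether that's the right way to fix it:
> >
> > import java.util.concurrent.atomic.AtomicLong;
> >
> > final class IdleRecordRaceSketch {
> >     static final long FOREVER = Long.MAX_VALUE;
> >
> >     // FOREVER while a context is being processed; 0 means "remove the
> >     // entry on the next DelayedRunnable sweep"
> >     final AtomicLong timeoutMillis = new AtomicLong(FOREVER);
> >
> >     // handleClose() path: mark the record for removal; this must stick
> >     void removeTimeout() {
> >         timeoutMillis.set(0);
> >     }
> >
> >     // completion-listener path: start the idle countdown, but only if
> >     // nobody marked the record for removal in the meantime; a plain set()
> >     // here could silently resurrect a record that was already zeroed
> >     void startIdleCountdown(long newTimeoutMillis) {
> >         timeoutMillis.compareAndSet(FOREVER, newTimeoutMillis);
> >     }
> > }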
> >
> >>> I found the org.glassfish.grizzly.utils.ActivityCheckFilter, which
> >>> sounds useful for this kind of situation, but how does it work in
> >>> combination with suspended requests, e.g. Comet?
> >> ActivityCheckFilter is more lax: it doesn't track live
> >> FilterChainContexts, but monitors Connection activity (read, write and
> >> other events), and if no activity has been registered for a given period
> >> of time, it closes the Connection.
> >>
> >> Both Filters can work with Comet; both are meant to detect idle
> >> Connections, but they use different approaches, so it's up to you to
> >> decide which one is better for a specific use case.
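> >>
> >> Wiring either filter into a chain looks roughly the same; a minimal
> >> sketch (from memory, please double-check the constructor signatures
> >> against the javadoc of your Grizzly version):
> >>
> >> import java.util.concurrent.TimeUnit;
> >>
> >> import org.glassfish.grizzly.filterchain.FilterChain;
> >> import org.glassfish.grizzly.filterchain.FilterChainBuilder;
> >> import org.glassfish.grizzly.filterchain.TransportFilter;
> >> import org.glassfish.grizzly.utils.ActivityCheckFilter;
> >> import org.glassfish.grizzly.utils.DelayedExecutor;
> >> import org.glassfish.grizzly.utils.IdleTimeoutFilter;
> >>
> >> public class IdleFilterWiringSketch {
> >>     public static FilterChain buildChain(boolean trackContexts) {
> >>         // shared executor that runs the periodic timeout checks
> >>         final DelayedExecutor idleExecutor =
> >>                 IdleTimeoutFilter.createDefaultIdleDelayedExecutor();
> >>         idleExecutor.start();
> >>
> >>         final FilterChainBuilder builder = FilterChainBuilder.stateless();
> >>         builder.add(new TransportFilter());
> >>         if (trackContexts) {
> >>             // closes the Connection only once no FilterChainContext is active
> >>             builder.add(new IdleTimeoutFilter(idleExecutor, 100, TimeUnit.SECONDS));
> >>         } else {
> >>             // closes the Connection after 100s without read/write activity
> >>             builder.add(new ActivityCheckFilter(idleExecutor, 100, TimeUnit.SECONDS));
> >>         }
> >>         // ... application filters (e.g. HttpServerFilter) would follow here
> >>         return builder.build();
> >>     }
> >> }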
> >>
> >> Thanks.
> >>
> >> WBR,
> >> Alexey.
> > Best regards,
> > Marc
>
Best regards,
Marc