users@grizzly.java.net

Re: Old problem still persists ...

From: Alan Williamson <alan_at_blog-city.com>
Date: Tue, 26 Aug 2008 15:32:20 +0100

@Alexey: Thanks for replying ... I would love to give you access to
the code, but sadly I cannot. I have tried to put together a use case,
but all I can say is that this code is running on Amazon EC2 instances
and processing around 450+ requests per second.


@Jeanfrancois: the lsof count is low ... but the socket count from
your "netstat -an" command is very high. The servers are sitting behind an
nginx load balancer, so all the requests are coming from the *same* IP address.
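
A plain "grep 80" also matches ports such as 8080 and any address that
happens to contain "80", so the raw count can overstate things. As a rough
sketch only (the function name and the sample rows below are made up for
illustration, and it assumes Linux-style "netstat -an" output), the sockets
on local port 80 could be tallied per TCP state like this:

```python
from collections import Counter

def count_by_state(netstat_output, port=80):
    """Tally TCP sockets whose local address ends in `port`, per TCP state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local = fields[3]
        # The port follows the last ':' (Linux) or '.' (BSD-style output).
        sep = max(local.rfind(":"), local.rfind("."))
        if local[sep + 1:] == str(port):
            counts[fields[5]] += 1
    return counts

# Invented sample rows, not taken from the machines discussed in this thread.
SAMPLE = """\
tcp        0      0 0.0.0.0:80        0.0.0.0:*         LISTEN
tcp        0      0 10.0.0.5:80       10.0.0.9:41234    ESTABLISHED
tcp        0      0 10.0.0.5:80       10.0.0.9:41235    CLOSE_WAIT
tcp        0      0 10.0.0.5:80       10.0.0.9:41236    CLOSE_WAIT
tcp        0      0 10.0.0.5:8080     10.0.0.9:41237    ESTABLISHED
tcp        0      0 10.0.0.5:22       10.0.0.7:51080    ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in sorted(count_by_state(SAMPLE).items()):
        print(state, n)
```

On a live box the input would come from the saved output of "netstat -an".
A large and growing CLOSE_WAIT count would suggest that the remote side has
closed its end while the local process has not yet closed its own socket.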


It churns through happily, and every so often it just goes into this
state and refuses to come out; a JVM restart is required. So anything
I can do to help you guys debug this, I will; I have a spare machine in
the web farm that I can 'dabble' with and not have too many angry
customers shout! :)

Jeanfrancois Arcand wrote:
> Salut Alan,
>
> looking at the thread dump:
>
>
> Which is really caused by a file descriptor leak. Can you grab a
>
> % netstat -an | grep 80 | wc -l
>
> count when this happens. On our side, I suspect the exception is
> swallowed when the OP_ACCEPT (accepting the connection) fails because the
> OS runs out of file descriptors. Let me send you a patch that will add
> some debugging information (so we can get more information about who is
> leaking file descriptors).
>
> We did a lot of tests using Grizzly in GlassFish v3 and so far I haven't
> seen any file descriptor leaks, but you never know.
>
> Give me 1 hour and I will send you a patch.