dev@grizzly.java.net

Re: TCPSelectorHandler$1 leaking

From: Scott Oaks <Scott.Oaks_at_Sun.COM>
Date: Fri, 26 Jun 2009 17:30:00 -0400

On 06/26/09 15:29, Jeanfrancois Arcand wrote:
> Salut,
>
> Scott Oaks wrote:
>> On 06/26/09 10:26, Jeanfrancois Arcand wrote:
>>> Salut,
>>>
>>> Oleksiy Stashok wrote:
>>>> Hi,
>>>>
>>>>>>>> Hi Scott,
>>>>>>>>> Sure thing, but I'm slightly confused by the wording -- do I
>>>>>>>>> need to get the new source and build, or is the latest snapshot
>>>>>>>>> download already built from the new source?
>>>>>>>> To be sure, I'd prefer that you try from sources, because snapshots
>>>>>>>> can be produced with delays, and Hudson is not so stable.
>>>>>>>
>>>>>>> Alexey, I saw your commit... that will for sure fix the issue, but
>>>>>>> we also need to find out why we leak so badly when the pending I/O is
>>>>>>> executed by the thread pool. I suspect this could be related to
>>>>>>> our thread count (and all the issues we are observing right
>>>>>>> now with Executors ;-))
>>>>>> I'm not even sure there is a leak, because during a stress test
>>>>>> we may load the thread pool so hard that it executes pending tasks
>>>>>> more slowly than we add them, so the number of tasks continuously grows.
>>>>>
>>>>> Well, Scott measured 8 million TCPSelectorHandler$1
>>>>> instances... I seriously think this is a major leak. It is not
>>>>> normal to see all those instances, IMO.
>>>>>
>>>>> We need to rethink the thread pool... we are seeing too many
>>>>> regressions right now. Will start a new thread.
>>>> We could use the kernel thread pool to execute pending tasks.
>>>
>>> That would make sense, as it uses a CachedThreadPool (still quite
>>> scary). But this is dangerous IMO. We need to find out why 8 million
>>> instances of that class were there in the first place. Working with Scott....
>>
>> I took the sources from earlier this morning and built the 1.9.17
>> SNAPSHOT, and with that, I don't see the problem with 8 million
>> TCPSelectorHandler$1 instances getting created -- there aren't really major GC
>> issues at all any more. Still a regression from V2, but that's another
>> story...
>>
>> My understanding is that Alexey backed out his proposed fix yesterday,
>> so presumably something else has fixed this? Or maybe there's something
>> I've yet to discover.
>
> What Alexey did was turn off the use of a dedicated thread to close I/O
> operations... so we are back to what we always used in v2/v3. I'm about to
> commit a fix that will allow configuring the behavior using a System
> property. I will leave the current mechanism turned off, but it would be
> nice (once you have a chance) to test using:
>
> -Dcom.sun.grizzly.finishIOUsingCurrentThread=false
>
> which will turn on the mechanism that created 8 million Runnables.
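For anyone reproducing this, here is a minimal sketch of how a boolean System property switch like this is commonly read in Java. Only the property name comes from this thread. The class `FinishIoSwitch`, its methods, and the default of `true` (dedicated-thread mechanism off unless the flag is set to `false`, inferred from the description above) are hypothetical and are not Grizzly's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not taken from the Grizzly sources.
class FinishIoSwitch {
    // Property name as given in the thread.
    static final String PROP = "com.sun.grizzly.finishIOUsingCurrentThread";

    // Assumed default: true when unset, i.e. finish I/O on the calling thread.
    static boolean finishIOUsingCurrentThread() {
        return Boolean.parseBoolean(System.getProperty(PROP, "true"));
    }

    // Hypothetical dispatch: run the pending I/O task inline, or hand it to a pool.
    static void finishIO(Runnable pendingIO, ExecutorService pool) {
        if (finishIOUsingCurrentThread()) {
            pendingIO.run();          // nothing is queued
        } else {
            pool.execute(pendingIO);  // one queued Runnable per pending I/O operation
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        finishIO(() -> System.out.println("inline=" + finishIOUsingCurrentThread()), pool);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

Run it with `java -Dcom.sun.grizzly.finishIOUsingCurrentThread=false ...` to take the pool path. In that mode every pending operation allocates and queues a task object.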

I have tested with that flag set to true and false, and I am still
unable to reproduce the error. I can only conclude at this point that
some other change between 1.9.15a and now has fixed the issue, and that
the pending tasks aren't being blocked anymore, so they don't build up.
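The build-up described in this thread (tasks being added faster than the pool runs them) can be sketched with a plain `ThreadPoolExecutor`. This is an assumption-laden illustration, not Grizzly code. A one-thread pool with an unbounded `LinkedBlockingQueue` has its only worker blocked, and every further "pending I/O" task simply piles up in the queue. The class name `PendingTaskBuildUp` is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of an unbounded task backlog.
class PendingTaskBuildUp {
    // Submits `tasks` Runnables to a one-thread pool whose worker is stuck on a
    // latch, and returns how many are waiting in the queue at that moment.
    static int queuedWhileBlocked(int tasks) throws InterruptedException {
        CountDownLatch gate = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        // The first task becomes the worker's initial task and blocks it.
        pool.execute(() -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Every further task is queued, because the only worker is busy.
        for (int i = 1; i < tasks; i++) {
            pool.execute(() -> { });
        }
        int backlog = pool.getQueue().size();
        gate.countDown();  // unblock the worker so the backlog can drain
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return backlog;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("queued tasks: " + queuedWhileBlocked(100_000));
    }
}
```

With the worker blocked, exactly `tasks - 1` Runnables sit in the queue. Once nothing blocks the worker, the same submissions drain and no backlog accumulates.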

I could spend some time tracking down just what changed... but what are
the plans to integrate something new into GlassFish? If GlassFish will
move to 1.9.17 or later soon, maybe it's not worth it.

-Scott