users@glassfish.java.net

Re: Glassfish Slowing Down

From: Ryan de Laplante <ryan_at_ijws.com>
Date: Wed, 18 Jun 2008 13:30:36 -0400

We've been living with the NP Pool problem for a long time. Here is what
I have noticed about it:

- Most of the time we are not affected by it. In Task Manager, the NP Pool
count for java.exe is much higher than for other programs, but it stays
around 400K - 500K and everything works fine.

- Every 2-3 months, Postgres would become unusably slow for TCP/IP
operations such as connecting, logging in and browsing tables, even from
pgAdmin III. GlassFish would also become nearly unusable. Restarting
Postgres and GlassFish did not solve the problem; we had to reboot.

- We experienced weird problems with the MS SQL Server 2005 JDBC driver
every few days. It seemed to handle network problems poorly and would
cause GlassFish to hang. While it was in this state, you could watch the
NP Pool count rise by 1K every 15 seconds; I've seen it get close to 3000K
in Task Manager. No other programs on the computer were affected, and
restarting GlassFish solved the problem until the next time. The rapid NP
Pool growth seemed to be directly related to the JDBC driver, and
replacing it with the jTDS driver solved that problem. I don't attribute
these lockups to the NP Pool leak.


So, in my experience, the real effects of the NP Pool bug only surface
every 2-3 months, and you can recognize them because the only fix is to
reboot Windows.


Ryan


Eric Chamberlain wrote:
> Jeanfrancois,
>
> Hmm. Your comments about Windows 2003 prompt some questions:
>
> 1. If Windows 2003 is leaking nbpool, wouldn't that cause the server to crash? We have had
> no crashes of the server, only Glassfish slowing down. The server has been solid even
> with other active IIS services running on it.
> 2. If the problem was in the OS, then why does restarting Glassfish clear the problem? Is
> the nbpool problem a *per-process* issue?
> 3. If I configure Glassfish to use blocking rather than non-blocking sockets, what would
> that do to my throughput?
>
>
> Eric Chamberlain
> VentriPoint, Inc. | www.ventripoint.com | Software Engineer
> Helping heart care through innovative diagnostic solutions
>
>
>
>>> -----Original Message-----
>>> From: Jeanfrancois.Arcand_at_Sun.COM
>>> [mailto:Jeanfrancois.Arcand_at_Sun.COM]
>>> Sent: Wednesday, June 18, 2008 7:14 AM
>>> To: users_at_glassfish.dev.java.net
>>> Subject: Re: Glassfish Slowing Down
>>>
>>> Salut,
>>>
>>> I almost forgot to reply....
>>>
>>> Eric Chamberlain wrote:
>>>
>>>> My comments are in-line below.
>>>>
>>>> Eric Chamberlain
>>>> VentriPoint, Inc. | www.ventripoint.com | Software Engineer
>>>> Helping heart care through innovative diagnostic solutions
>>>>
>>>>
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jeanfrancois.Arcand_at_Sun.COM
>>>>>> [mailto:Jeanfrancois.Arcand_at_Sun.COM]
>>>>>> Sent: Wednesday, June 11, 2008 5:46 PM
>>>>>> To: users_at_glassfish.dev.java.net
>>>>>> Subject: Re: Glassfish Slowing Down
>>>>>>
>>>>>> Salut,
>>>>>>
>>>>>> Eric Chamberlain wrote:
>>>>>>
>>>>>>>> Greetings all.
>>>>>>>>
>>>>>>>> I am seeing a problem in which my Glassfish-hosted service slows
>>>>>>>> down over time. Over a few weeks, it slows down to 10x its
>>>>>>>> original response time. The Glassfish framework is used only to
>>>>>>>> support this one service (which, conveniently, has but one
>>>>>>>> external API). When I stop the Glassfish instance and re-start
>>>>>>>> it, the response time immediately goes back to its expected
>>>>>>>> (short) interval.
>>>>>>
>>>>>> Can you give more information about your service? Mainly, are you
>>>>>> using only JSP and Servlet (no DB, no remote calls, no external
>>>>>> components)?
>>>>>>
>>>> There is no JSP or servlet involved here. All we have is a web
>>>> service.
>>>
>>> OK, there might be a problem there, but I doubt it.
>>>
>>>
>>>
>>>>>> Most of the time, this problem is caused by an external
>>>>>> component that locks our threads.
>>>>>>
>>>>>>
>>>> I don't know of anything external that would do that. The individual
>>>> calls are short and there is no database involved.
>>>>
>>>>
>>>>>>>> We have conducted stress testing on the service code and there is
>>>>>>>> no detectable slowdown, even when we simulate many weeks of use
>>>>>>>> in a short time. We also could detect no heap bloat in our stress
>>>>>>>> testing. BTW, the service does *not* access any databases.
>>>>>>>>
>>>>>>>> The next suspect is the Glassfish framework itself. How can I
>>>>>>>> find out more information on when and if Glassfish is slowing
>>>>>>>> down in handling requests? Is there some way that I can set up an
>>>>>>>> automatic monitor which will help me track down changes in
>>>>>>>> response times over a multi-week time frame and correlate them
>>>>>>>> with memory usage?
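As a minimal sketch of the kind of automatic monitor asked about above, the following Java class (all names hypothetical) times one HTTP request to the service every five minutes and prints it next to the server's used heap as a CSV row. It assumes the server JVM is reachable over JMX; the service URL, JMX service URL and admin credentials are placeholders passed on the command line, not values taken from this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical long-running monitor: every few minutes it times one request
// to the service and records the server's used heap next to it (CSV to stdout).
final class ResponseTimeMonitor {

    // Used heap of the JVM behind the connection (local or remote via JMX).
    static long usedHeapBytes(MBeanServerConnection conn) {
        try {
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return memory.getHeapMemoryUsage().getUsed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wall-clock time of one GET in milliseconds, or -1 if the request fails.
    static long timeRequest(String endpoint) {
        long start = System.nanoTime();
        try {
            HttpURLConnection http = (HttpURLConnection) new URL(endpoint).openConnection();
            http.setConnectTimeout(10_000);
            http.setReadTimeout(60_000);
            http.getResponseCode(); // blocks until the status line arrives
            http.disconnect();
        } catch (IOException e) {
            return -1;
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // One CSV row: epoch millis, response time (ms), used heap (bytes).
    static String formatSample(long epochMillis, long responseMillis, long heapBytes) {
        return epochMillis + "," + responseMillis + "," + heapBytes;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical arguments: service URL, server JMX URL, admin user, password.
        String endpoint = args[0];
        Map<String, Object> env = new HashMap<>();
        env.put(JMXConnector.CREDENTIALS, new String[] {args[2], args[3]});
        JMXConnector jmx = JMXConnectorFactory.connect(new JMXServiceURL(args[1]), env);
        MBeanServerConnection server = jmx.getMBeanServerConnection();

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println(formatSample(System.currentTimeMillis(),
                        timeRequest(endpoint), usedHeapBytes(server)));
            } catch (RuntimeException e) {
                // Keep sampling: an uncaught exception would cancel the schedule.
                System.err.println("sample failed: " + e);
            }
        }, 0, 5, TimeUnit.MINUTES);
    }
}
```

Left running for a few weeks, the two columns can be plotted together to see whether response time grows along with heap usage or independently of it.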
>>>>>>
>>>>>> Are you able to reproduce the problem easily? Do you think you
>>>>>> could get a thread dump when it starts slowing down?
>>>>>>
>>>>>>
>>>> Reproducing the problem takes a while, but within two or three weeks
>>>> the problem is perceptible. It is not possible to reproduce the
>>>> problem within a day. I probably could see the change over a week if
>>>> I closely tracked the response times.
>>>
>>> When it happens, are you able to run jstack <PID> > dump.txt and send
>>> it here?
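The jstack command captures a dump from outside the process. If attaching jstack to the server is awkward, a similar dump can be assembled through the standard java.lang.management API, sketched below (class name hypothetical). The output format only approximates jstack's.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Builds a jstack-like text dump from a ThreadMXBean: thread name, state,
// the lock it is waiting on (if any), and the full stack of every live thread.
final class ThreadDumper {

    static String dump(ThreadMXBean threads) {
        StringBuilder out = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Dumps this JVM; pass a remote proxy to dump another process instead.
        System.out.print(dump(ManagementFactory.getThreadMXBean()));
    }
}
```

To dump the GlassFish process rather than the JVM running this class, one option is a proxy obtained with ManagementFactory.newPlatformMXBeanProxy(connection, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class) over a JMX connection to the server. Two or three dumps taken a minute apart make it easier to spot threads stuck in the same place.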
>>>
>>>
>>>
>>>
>>>>>>>> Another hypothesis is that we're losing information from
>>>>>>>> disconnections and reconnections that happen over time (and thus
>>>>>>>> are not seen in a stress test because of the compressed time
>>>>>>>> frame). Have others found any problems in this area? I could find
>>>>>>>> nothing myself.
>>>>>>>>
>>>>>> Grizzly (the HTTP front end of GlassFish) will close
>>>>>> connections under two circumstances:
>>>>>>
>>>>>> (1) A connection is idle for more than 30 seconds.
>>>>>> (2) More than 250 requests have been made on a persistent
>>>>>> connection.
>>>>>>
>>>>>> Based on the above information, I suspect (1) might be happening.
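The two close rules can be written down as a small predicate. This is a hypothetical model built only from the numbers quoted above (30 seconds idle, 250 requests), not Grizzly's source; it shows why a client that calls the service infrequently reconnects on almost every call, which is expected behaviour rather than a fault.

```java
// Hypothetical model of the two rules above (not Grizzly's actual code):
// a persistent connection is closed once it has been idle for more than
// 30 seconds or has served more than 250 requests.
final class KeepAlivePolicy {
    static final long IDLE_TIMEOUT_MILLIS = 30_000;
    static final int MAX_REQUESTS_PER_CONNECTION = 250;

    static boolean shouldClose(long idleMillis, int requestsServed) {
        return idleMillis > IDLE_TIMEOUT_MILLIS
                || requestsServed > MAX_REQUESTS_PER_CONNECTION;
    }

    public static void main(String[] args) {
        // A client calling once a minute always finds its previous connection
        // closed, so every call opens a fresh TCP connection.
        System.out.println(shouldClose(60_000, 1));  // true
        System.out.println(shouldClose(5_000, 100)); // false
    }
}
```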
>>>>>> But slowdowns are usually observed because:
>>>>>>
>>>>>> (1) All the worker threads take time to execute. During that
>>>>>> time, all incoming requests are queued. As soon as a thread is
>>>>>> free, it dequeues one request and executes it. By default, the
>>>>>> queue can accept 4096 connections. After reaching that limit,
>>>>>> Grizzly just starts refusing requests.
>>>>>>
>>>>>> (2) Your application/framework is doing something wrong by
>>>>>> caching/storing data (wild guess).
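The queueing behaviour in (1) can be reproduced with the JDK's ThreadPoolExecutor as an analogy. The sketch below is not Grizzly's implementation; it only assumes what is described above: a fixed number of workers, a bounded queue of 4096 waiting requests, and refusal once the queue is full. Every task blocks until released, which models all workers being busy.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Analogy using a plain ThreadPoolExecutor (not Grizzly's implementation):
// a fixed set of workers, a bounded queue for waiting requests, and outright
// refusal once that queue is full.
final class WorkerQueueModel {

    // Submits `requests` tasks that all block until released (every worker is
    // busy) and returns how many were refused.
    static int countRejected(int workers, int queueCapacity, int requests) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // throw when the queue is full
        int rejected = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulates a slow request holding a worker
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        release.countDown(); // let all busy and queued tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 5 workers (the default thread-count mentioned in this thread) and a
        // 4096-slot queue: of 4200 simultaneous requests, 5 run, 4096 wait, 99 are refused.
        System.out.println(countRejected(5, 4096, 4200)); // prints 99
    }
}
```

While requests sit in the queue their latency grows, so a pool that is too small for the workload shows up first as slower responses and only later as refused connections.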
>>>>>>
>>>> I cannot reproduce the problem by stressing the app over the short
>>>> run. This leads me to think it is not in the app.
>>>>
>>>>
>>>>>> So, let's first do the usual configuration checks. Can you add the
>>>>>> following property to domain.xml:
>>>>>>
>>>>>> <jvm-options>-Dcom.sun.enterprise.server.ss.ASQuickStartup=false</jvm-options>
>>>>
>>>> I do not know what effect I should expect from this change. Please
>>>> explain.
>>>
>>> It just disables the following mechanism:
>>>
>>> http://weblogs.java.net/blog/binod/archive/2005/09/lazy_initializa.html
>>>
>>> It shouldn't make a difference, but I want to be sure.
>>>
>>>
>>>
>>>>>> Restart GlassFish and try to reproduce the problem. Can you
>>>>>> also send your domain.xml? Are you changing the
>>>>>> http-listener's acceptor-threads value by any chance? If
>>>>>> yes, set it to 1 (the default) and see if the problem
>>>>>> still happens.
>>>>
>>>> I have not changed the domain.xml at all. The domain.xml lists
>>>> acceptor-threads = 1.
>>>
>>> OK, then the default configuration runs with only 5 threads:
>>>
>>> <request-processing ...thread-count="5".../>
>>>
>>> You might want to increase that number to see if it helps.
>>>
>>>
>>>
>>>
>>>>>> Also please let us know which JDK and OS version you are using.
>>>>>>
>>>>>>
>>>> We're using JDK 1.6; the OS is Windows Server 2003.
>>>
>>> Haaaaa.... Windows 2003 leaks nbpool, and eventually the TCP stack
>>> will go down completely. We have reported that problem to Microsoft
>>> and, as far as I can tell, no patch has been provided so far. So
>>> that's possibly the problem you are facing. Note that this is *not* a
>>> GlassFish/Java issue, but a Win32/2003 issue. The workaround is to
>>> avoid non-blocking sockets and use blocking sockets instead. Add the
>>> following property in domain.xml:
>>>
>>> -Dcom.sun.enterprise.web.connector.useCoyoteConnector=true
>>>
>>> to see if that helps.
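After adding the option, it is worth confirming that the server JVM really picked it up. A hedged sketch: RuntimeMXBean.getInputArguments() returns the arguments a JVM was started with, so, assuming GlassFish passes the JVM options from domain.xml on the server's command line, the option should appear there. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;
import javax.management.MBeanServerConnection;

// Checks whether a JVM was actually started with a given -D option, by reading
// the arguments it reports through RuntimeMXBean (locally or over JMX).
final class JvmOptionCheck {

    static final String COYOTE_OPTION =
            "-Dcom.sun.enterprise.web.connector.useCoyoteConnector=true";

    // True if the exact option string appears among the JVM's input arguments.
    static boolean hasOption(List<String> jvmArguments, String option) {
        return jvmArguments.contains(option);
    }

    // Input arguments of the JVM behind an MBean connection (e.g. a remote server).
    static List<String> jvmArguments(MBeanServerConnection conn) {
        try {
            RuntimeMXBean runtime = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);
            return runtime.getInputArguments();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Checks this JVM; connect over JMX to check the server instead.
        List<String> arguments = jvmArguments(ManagementFactory.getPlatformMBeanServer());
        System.out.println(COYOTE_OPTION + " present: " + hasOption(arguments, COYOTE_OPTION));
    }
}
```

The same check works for -Dcom.sun.enterprise.server.ss.ASQuickStartup=false; to inspect the server, pass its MBeanServerConnection instead of the local platform MBean server.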
>>>
>>> A+
>>>
>>> -- Jeanfrancois
>>>
>>>
>>>
>>>
>>>>>> Thanks
>>>>>>
>>>>>> -- Jeanfrancois
>>>>>>
>>>>>>
>>>> Thank you for the quick response.
>>>>
>>>> == Eric ==
>>>>
>>>>
>>>>
>>>>
>>> ---------------------------------------------------------------------
>>>
>>>> To unsubscribe, e-mail: users-unsubscribe_at_glassfish.dev.java.net
>>>> For additional commands, e-mail: users-help_at_glassfish.dev.java.net
>>>>
>