users@glassfish.java.net

Re: Glassfish Slowing Down

From: Jeanfrancois Arcand <Jeanfrancois.Arcand_at_Sun.COM>
Date: Mon, 23 Jun 2008 13:28:23 -0400

Salut,

Eric Chamberlain wrote:
> We did some data mining on the testing logs for response times and found a very
> interesting result: the GF slowdown is *not* gradual after all. Response times < 1 day
> beforehand are in the normal range before they spike to 10x normal and require a GF
> restart.

Can you run jstack <PID> when that happens? So far we have seen some
issues with the database pool. A slowdown usually happens because Grizzly
starts queuing requests while waiting for its WorkerThreads to complete their
work. If all WorkerThreads are blocked/synchronized on database
operations, then all incoming requests are queued, and this is usually when
you see a slowdown.
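jstack is the simplest way to capture the dump requested here. As an in-process alternative (an assumption, not something proposed in this thread), the JDK's ThreadMXBean returns much of the same per-thread information, and the sketch below just counts threads by state so that a pile-up of BLOCKED or WAITING worker threads stands out. It only sees the JVM it runs in, so it would have to run inside the GlassFish JVM (or be adapted to a remote JMX connection), and the "WorkerThread" name filter is a guess to check against a real dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSummary {

    /** Counts live threads by state, keeping only names that contain nameFragment (null = all). */
    public static Map<Thread.State, Integer> summarize(String nameFragment) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) returns one ThreadInfo per
        // live thread, with name, state and stack trace, much like a jstack dump.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (nameFragment == null || info.getThreadName().contains(nameFragment)) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "WorkerThread" is a guessed name fragment; check real names in a jstack dump first.
        String fragment = args.length > 0 ? args[0] : "WorkerThread";
        // Many BLOCKED or WAITING entries here would match the queuing scenario above.
        System.out.println(summarize(fragment));
    }
}
```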

Thanks

-- Jeanfrancois



>
> I am now wondering if there is some sort of coincident event that is causing GF to hit
> this speed bump. Any ideas of what kind of event I should look for that would slow GF
> down almost instantaneously?
>
> Eric Chamberlain
> VentriPoint, Inc. | www.ventripoint.com | Software Engineer
> Helping heart care through innovative diagnostic solutions
>
>
>>> -----Original Message-----
>>> From: Ryan de Laplante [mailto:ryan_at_ijws.com]
>>> Sent: Wednesday, June 18, 2008 10:31 AM
>>> To: users_at_glassfish.dev.java.net
>>> Subject: Re: Glassfish Slowing Down
>>>
>>> We've been living with the NP Pool problem for a long time. Here is what
>>> I noticed about it:
>>>
>>> - Most of the time we are not affected by it. We can see that the NP Pool
>>> count is much higher in Task Manager for java.exe than for other programs,
>>> but it stays around 400K - 500K and works fine.
>>>
>>> - Every 2-3 months Postgres will become unusably slow when doing TCP/IP
>>> operations like connecting, logging in, browsing tables, etc., even with
>>> pgAdmin III. GlassFish would also become nearly unusable. Restarting
>>> Postgres and GlassFish did not solve the problem. We had to reboot.
>>>
>>> - We experienced weird problems with the MS SQL Server 2005 JDBC driver
>>> every few days. It seemed to have poor error handling of network
>>> problems and would cause GlassFish to hang. While it was in this state
>>> you could watch the NP Pool count rise 1K every 15 seconds. I've seen it
>>> up close to 3000K in Task Manager. No other programs on the computer
>>> were affected, and restarting GlassFish solved the problem until next
>>> time. NP Pool growing rapidly seemed to be directly related to the
>>> JDBC driver. Replacing it with the jTDS driver solved that problem. I
>>> don't attribute these lockups to the NP Pool leak.
>>>
>>>
>>> So in my experience, the real effects of the NP Pool bug only surface
>>> every 2-3 months, and you can recognize it when the only way to fix it
>>> is to reboot Windows.
>>>
>>>
>>> Ryan
>>>
>>>
>>> Eric Chamberlain wrote:
>>>> Jeanfrancois,
>>>>
>>>> Hmm. Your comments about Windows 2003 prompt some questions:
>>>>
>>>> 1. If Windows 2003 is leaking nbpool, wouldn't that cause the server to
>>>> crash? We have had no crashes of the server, only Glassfish slowing
>>>> down. The server has been solid even with other active IIS services
>>>> running on it.
>>>> 2. If the problem was in the OS, then why does restarting Glassfish
>>>> clear the problem? Is the nbpool problem a *per-process* issue?
>>>> 3. If I configure Glassfish to use blocking rather than non-blocking
>>>> sockets, what would that do to my throughput?
>>>>
>>>>
>>>> Eric Chamberlain
>>>>
>>>>
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jeanfrancois.Arcand_at_Sun.COM
>>>>>> [mailto:Jeanfrancois.Arcand_at_Sun.COM]
>>>>>> Sent: Wednesday, June 18, 2008 7:14 AM
>>>>>> To: users_at_glassfish.dev.java.net
>>>>>> Subject: Re: Glassfish Slowing Down
>>>>>>
>>>>>> Salut,
>>>>>>
>>>>>> almost forgot to reply....
>>>>>>
>>>>>> Eric Chamberlain wrote:
>>>>>>
>>>>>>> My comments are in-line below.
>>>>>>>
>>>>>>> Eric Chamberlain
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Jeanfrancois.Arcand_at_Sun.COM
>>>>>>>>> [mailto:Jeanfrancois.Arcand_at_Sun.COM]
>>>>>>>>> Sent: Wednesday, June 11, 2008 5:46 PM
>>>>>>>>> To: users_at_glassfish.dev.java.net
>>>>>>>>> Subject: Re: Glassfish Slowing Down
>>>>>>>>>
>>>>>>>>> Salut,
>>>>>>>>>
>>>>>>>>> Eric Chamberlain wrote:
>>>>>>>>>
>>>>>>>>>>> Greetings all.
>>>>>>>>>>>
>>>>>>>>>>> I am seeing a problem in which my Glassfish-hosted service slows
>>>>>>>>>>> down over time. Over a few weeks, it slows down to 10x its original
>>>>>>>>>>> response time. The Glassfish framework is used only to support this
>>>>>>>>>>> one service (which, conveniently, has but one external API). When I
>>>>>>>>>>> stop the Glassfish instance and re-start it, the response time
>>>>>>>>>>> immediately goes back to its expected (short) interval.
>>>>>>>>>
>>>>>>>>> Can you give more information about your service? Mainly,
>>>>>>>>> are you using only JSP and Servlet (no db, no remote call,
>>>>>>>>> no external component)?
>>>>>>>>>
>>>>>>> There is no JSP or servlet involved here. All we have is a web service.
>>>>>>
>>>>>> OK, there might be a problem there, but I doubt it.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>> Most of the time this problem is caused by an external
>>>>>>>>> component that locks our threads.
>>>>>>>>>
>>>>>>>>>
>>>>>>> I don't know anything external that would do that. The individual
>>>>>>> calls are short and there is no database involved.
>>>>>>>
>>>>>>>
>>>>>>>>>>> We have conducted stress testing on the service code and there is
>>>>>>>>>>> no slowdown detectable even when we simulate many weeks of use in a
>>>>>>>>>>> short time. We also could detect no bloat of the heap in our stress
>>>>>>>>>>> testing. BTW, the service does *not* access any databases.
>>>>>>>>>>>
>>>>>>>>>>> The next suspect is the Glassfish framework itself. How can I find
>>>>>>>>>>> out more information on when and if Glassfish is slowing down
>>>>>>>>>>> handling requests? Is there some way that I can set up an automatic
>>>>>>>>>>> monitor which will help me track down changes in response times over
>>>>>>>>>>> a multi-week time frame and correlate them with memory usage?
>>>>>>>>>
>>>>>>>>> Are you able to reproduce the problem easily? Do you think
>>>>>>>>> you would be able to get a thread dump when it starts slowing down?
>>>>>>>>>
>>>>>>>>>
>>>>>>> Reproducing the problem takes a while, but within two or three weeks
>>>>>>> the problem is perceptible. It is not possible to reproduce the problem
>>>>>>> within a day. I probably could see the change over a week if I closely
>>>>>>> tracked the response times.
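Since the slowdown only shows up over weeks, a lightweight periodic probe is one way to approach the monitoring question above. The sketch below is an assumption-laden example, not a GlassFish feature: it times one HTTP GET against a hypothetical service URL every five minutes and prints a CSV line with a timestamp, the elapsed milliseconds and the used heap reported by a MemoryMXBean. By default that is the probe's own heap; to track GlassFish's heap you would pass a proxy obtained with ManagementFactory.newPlatformMXBeanProxy over a JMX connection to the server.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.net.HttpURLConnection;
import java.net.URI;
import java.time.Instant;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ResponseTimeProbe {

    /** Runs one call and returns "timestamp,elapsedMillis,heapUsedBytes". */
    public static String sample(Callable<?> call, MemoryMXBean memory) throws Exception {
        long start = System.nanoTime();
        call.call();
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long heapUsed = memory.getHeapMemoryUsage().getUsed();
        return Instant.now() + "," + elapsedMillis + "," + heapUsed;
    }

    /** Issues one GET and returns the HTTP status code. */
    static int httpGet(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(60_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Hypothetical endpoint: pass the real web service URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/MyService?wsdl";
        // Local heap by default; swap in a remote MemoryMXBean proxy to watch GlassFish.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                System.out.println(sample(() -> httpGet(url), memory));
            } catch (Exception e) {
                // Catching here keeps the schedule alive after a failed probe.
                System.out.println(Instant.now() + ",error," + e);
            }
        }, 0, 5, TimeUnit.MINUTES);
    }
}
```

Plotting the second column against the third over a few weeks would show whether the spikes line up with heap growth.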
>>>>>>>
>>>>>> When it happens, are you able to run jstack <PID> > dump.txt and send
>>>>>> it here?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>>>> Another hypothesis is that we're losing information from
>>>>>>>>>>> disconnections and re-connections that happen over time (and thus
>>>>>>>>>>> are not seen in a stress test because of the compressed time frame).
>>>>>>>>>>> Have there been any problems in this area found by others? I could
>>>>>>>>>>> find nothing myself.
>>>>>>>>>>>
>>>>>>>>> Grizzly (the HTTP front end of GlassFish) will close
>>>>>>>>> connections under two circumstances:
>>>>>>>>>
>>>>>>>>> (1) A connection is idle for more than 30 seconds.
>>>>>>>>> (2) More than 250 requests have been made on a persistent connection.
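The two close conditions can be written down as a tiny predicate. This is a hypothetical helper that mirrors the wording above (strictly more than 30 seconds idle, strictly more than 250 requests); it is not Grizzly's implementation, and the numbers are simply the defaults quoted in this thread. It mainly helps when reasoning about the disconnect hypothesis: a client that holds a connection open across quiet periods should expect the server to close it and must be prepared to reconnect.

```java
public class KeepAlivePolicy {
    private final long idleTimeoutMillis;
    private final int maxRequestsPerConnection;

    public KeepAlivePolicy(long idleTimeoutMillis, int maxRequestsPerConnection) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.maxRequestsPerConnection = maxRequestsPerConnection;
    }

    /** True if a persistent connection should be closed under either rule. */
    public boolean shouldClose(long idleMillis, int requestsServed) {
        boolean idleTooLong = idleMillis > idleTimeoutMillis;                  // rule (1)
        boolean tooManyRequests = requestsServed > maxRequestsPerConnection;   // rule (2)
        return idleTooLong || tooManyRequests;
    }

    public static void main(String[] args) {
        KeepAlivePolicy policy = new KeepAlivePolicy(30_000, 250);
        System.out.println(policy.shouldClose(5_000, 10));   // prints false
        System.out.println(policy.shouldClose(45_000, 10));  // prints true (idle rule)
        System.out.println(policy.shouldClose(1_000, 251));  // prints true (request rule)
    }
}
```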
>>>>>>>>>
>>>>>>>>> Based on the above information, I suspect (1) might be happening.
>>>>>>>>> But slowdowns are usually observed because:
>>>>>>>>>
>>>>>>>>> (1) All the worker threads take time to execute. During
>>>>>>>>> that time, all incoming requests are queued. As soon as one
>>>>>>>>> thread is free, it dequeues one request and executes it. By
>>>>>>>>> default, the queue can accept 4096 connections. After
>>>>>>>>> reaching that limit, Grizzly just starts refusing requests.
>>>>>>>>>
>>>>>>>>> (2) Your application/framework is doing something wrong by
>>>>>>>>> caching/storing data (wild guess).
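Point (1) is the classic behaviour of a fixed worker pool in front of a bounded queue. The sketch below reproduces it with java.util.concurrent.ThreadPoolExecutor rather than Grizzly's own classes (an analogy, not GlassFish code; it queues tasks where Grizzly queues connections), using 5 workers and a 4096-slot queue as in this thread. Every task blocks on a latch, standing in for a worker stuck on a lock or a slow call, so the five workers never drain the queue and the overflow is rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SaturatedPoolDemo {

    /** Fixed pool of `workers` threads in front of a queue that holds `queueCapacity` tasks. */
    public static ThreadPoolExecutor newPool(int workers, int queueCapacity) {
        // The default AbortPolicy throws RejectedExecutionException once every worker
        // is busy and the queue is full, which plays the role of "refusing requests".
        return new ThreadPoolExecutor(workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    /** Submits `requests` tasks that all wait on `gate`; returns how many were rejected. */
    public static int submitBlockedRequests(ThreadPoolExecutor pool, int requests,
                                            CountDownLatch gate) {
        int refused = 0;
        for (int i = 0; i < requests; i++) {
            try {
                pool.execute(() -> {
                    try {
                        gate.await(); // stands in for a worker stuck on a lock or a slow call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                refused++;
            }
        }
        return refused;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(5, 4096); // figures quoted in this thread
        CountDownLatch gate = new CountDownLatch(1);
        int refused = submitBlockedRequests(pool, 5000, gate);
        // prints "queued=4096 refused=899": 5 tasks run, 4096 wait, the rest are rejected
        System.out.println("queued=" + pool.getQueue().size() + " refused=" + refused);
        gate.countDown(); // release the workers so the queue drains
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```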
>>>>>>>>>
>>>>>>> I cannot reproduce the problem by stressing the app over the short
>>>>>>> run. This leads me to think it is not in the app.
>>>>>>>
>>>>>>>
>>>>>>>>> So, first let's do the usual configuration stuff. Can
>>>>>>>>> you add the following property in domain.xml:
>>>>>>>>>
>>>>>>>>> <jvm-options>-Dcom.sun.enterprise.server.ss.ASQuickStartup=false</jvm-options>
>>>>>>>
>>>>>>> I do not know what effect I should expect from this change. Please
>>>>>>> explain.
>>>>>>
>>>>>> It just disables the following mechanism:
>>>>>>
>>>>>> http://weblogs.java.net/blog/binod/archive/2005/09/lazy_initializa.html
>>>>>>
>>>>>> It shouldn't make a difference, but just to make sure.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>> Restart GlassFish and try to reproduce the problem. Can you
>>>>>>>>> also send your domain.xml? Are you changing the
>>>>>>>>> http-listener's acceptor-threads value by any chance? If
>>>>>>>>> yes, set it to 1 (the default) and see if the problem
>>>>>>>>> still happens.
>>>>>>
>>>>>>> I have not changed the domain.xml at all. The domain.xml lists
>>>>>>> acceptor-threads = 1.
>>>>>>
>>>>>> OK, then the default config runs with only 5 threads:
>>>>>>
>>>>>> <request-processing ...thread-count="5".../>
>>>>>>
>>>>>> You might want to increase that number to see if it helps.
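A back-of-the-envelope way to see why thread-count matters when requests block: if each request holds a worker for S milliseconds, N workers can complete at most N * 1000 / S requests per second, and a backlog of B queued requests takes roughly ceil(B / N) * S milliseconds to drain. The sketch below only evaluates those two formulas; the 200 ms service time is a made-up figure, and the bound ignores CPU contention, so extra threads help only while workers are waiting rather than computing.

```java
import java.util.Locale;

public class WorkerCapacity {

    /** Upper bound on completed requests per second when each one holds a worker for serviceMillis. */
    public static double maxThroughputPerSecond(int workerThreads, double serviceMillis) {
        return workerThreads * 1000.0 / serviceMillis;
    }

    /** Approximate time for `backlog` queued requests to finish, processed in waves of workerThreads. */
    public static double drainMillis(int backlog, int workerThreads, double serviceMillis) {
        return Math.ceil((double) backlog / workerThreads) * serviceMillis;
    }

    public static void main(String[] args) {
        double serviceMillis = 200; // made-up per-request time while a worker is blocked
        for (int threads : new int[] {5, 20}) { // 5 = the default thread-count mentioned above
            System.out.printf(Locale.ROOT, "threads=%d max=%.0f req/s, 100 queued drain in %.0f ms%n",
                    threads, maxThroughputPerSecond(threads, serviceMillis),
                    drainMillis(100, threads, serviceMillis));
        }
        // threads=5 max=25 req/s, 100 queued drain in 4000 ms
        // threads=20 max=100 req/s, 100 queued drain in 1000 ms
    }
}
```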
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>> Also please let us know which JDK and OS version you are using.
>>>>>>>>>
>>>>>>> We're using JDK 1.6. The OS is Windows Server 2003.
>>>>>>>
>>>>>> Haaaaa....Windows 2003 leaks non-paged pool memory (nbpool) and
>>>>>> eventually the TCP stack will go down completely. We have reported that
>>>>>> problem to Microsoft and, as far as I can tell, no patch has been
>>>>>> provided so far. So that's possibly the problem you are facing. Note
>>>>>> that this is *not* a GlassFish/Java issue, but a Win32/2003 issue. The
>>>>>> workaround is to avoid non-blocking sockets and use blocking sockets
>>>>>> instead. Add the following property in domain.xml:
>>>>>>
>>>>>> -Dcom.sun.enterprise.web.connector.useCoyoteConnector=true
>>>>>>
>>>>>> to see if that helps.
>>>>>>
>>>>>> A+
>>>>>>
>>>>>> -- Jeanfrancois
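For readers weighing the blocking versus non-blocking question raised in this thread, the difference between the two socket styles can be seen in plain java.nio. This is a generic sketch, not GlassFish's connector code, and it measures nothing about throughput: in the blocking style accept() and read() park the calling thread, so a server typically dedicates a thread to each connection it is serving, whereas a non-blocking channel registered with a Selector lets one thread poll many channels for readiness. Binding to an ephemeral port (0) on 127.0.0.1 is an arbitrary choice for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SocketModes {

    /** Opens a server channel on 127.0.0.1 in the requested mode (port 0 = any free port). */
    public static ServerSocketChannel openServer(int port, boolean blocking) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        channel.bind(new InetSocketAddress("127.0.0.1", port));
        // Blocking is the default; non-blocking mode is required before register().
        channel.configureBlocking(blocking);
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel blocking = openServer(0, true);
             ServerSocketChannel nonBlocking = openServer(0, false);
             Selector selector = Selector.open()) {
            // Blocking style: blocking.accept() would park this thread until a client connects.
            System.out.println("blocking mode: " + blocking.isBlocking());

            // Non-blocking style: one selector can watch many channels at once.
            nonBlocking.register(selector, SelectionKey.OP_ACCEPT);
            // selectNow() never waits; with no pending clients it reports 0 ready channels.
            System.out.println("ready channels: " + selector.selectNow());
            // accept() on a non-blocking channel returns null instead of waiting.
            System.out.println("accepted: " + nonBlocking.accept());
        }
    }
}
```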
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> -- Jeanfrancois
>>>>>>>>>
>>>>>>>>>
>>>>>>> Thank you for the quick response.
>>>>>>>
>>>>>>> == Eric ==
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>> ---------------------------------------------------------------------
>>>>>>
>>>>>>> To unsubscribe, e-mail: users-unsubscribe_at_glassfish.dev.java.net
>>>>>>> For additional commands, e-mail:
>>> users-help_at_glassfish.dev.java.net
>>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>
>