RE: Glassfish Slowing Down

From: Eric Chamberlain <echamberlain_at_ventripoint.com>
Date: Mon, 23 Jun 2008 10:09:43 -0700

We did some data mining on the testing logs for response times and found a very
interesting result: the GF slowdown is *not* gradual after all. Response times less
than a day before the slowdown are still in the normal range; they then spike to 10x
normal and require a GF restart.

I am now wondering if there is some sort of coincident event that is causing GF to hit
this speed bump. Any ideas of what kind of event I should look for that would slow GF
down almost instantaneously?
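
For anyone who wants to run the same kind of check on their own logs, here is a
minimal sketch in Python. It assumes the log has already been parsed into
(timestamp, response time in ms) pairs; the function name, the window of five
preceding samples, and the sample data are illustrative assumptions, and only
the 10x factor comes from the observation above.

```python
from datetime import datetime, timedelta
from statistics import median


def find_spike(samples, factor=10.0, baseline_count=5):
    """Return the timestamp of the first sample whose response time is at
    least `factor` times the median of the preceding `baseline_count`
    samples, or None if there is no such sample."""
    for i in range(baseline_count, len(samples)):
        baseline = median(ms for _, ms in samples[i - baseline_count:i])
        ts, ms = samples[i]
        if baseline > 0 and ms >= factor * baseline:
            return ts
    return None


# Hypothetical data: hourly samples near 100 ms for two days, then a jump.
start = datetime(2008, 6, 20, 0, 0)
samples = [(start + timedelta(hours=h), 100.0 + h % 3) for h in range(48)]
samples += [(start + timedelta(hours=48 + h), 1050.0) for h in range(6)]

spike_at = find_spike(samples)
print("first spike at:", spike_at)
```

A truly gradual slowdown would return None here, because each sample stays
close to the median of the few samples before it. The timestamp that comes
back is the one to compare against event logs, scheduled jobs and the like.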

Eric Chamberlain
VentriPoint, Inc. | www.ventripoint.com | Software Engineer
Helping heart care through innovative diagnostic solutions

>> -----Original Message-----
>> From: Ryan de Laplante [mailto:ryan_at_ijws.com]
>> Sent: Wednesday, June 18, 2008 10:31 AM
>> To: users_at_glassfish.dev.java.net
>> Subject: Re: Glassfish Slowing Down
>>
>> We've been living with the NP Pool problem for a long time. Here is what
>> I noticed about it:
>>
>> - Most of the time we are not affected by it. We can see that the NP Pool
>> count is much higher in Task Manager for java.exe than for other programs,
>> but it stays around 400K - 500K and works fine.
>>
>> - Every 2-3 months Postgres will become unusably slow when doing TCP/IP
>> operations like connecting, logging in, browsing tables etc., even with
>> pgAdmin III. GlassFish would also become nearly unusable. Restarting
>> Postgres and GlassFish did not solve the problem. We had to reboot.
>>
>> - We experienced weird problems with the MS SQL Server 2005 JDBC driver
>> every few days. It seemed to have poor error handling of network
>> problems and would cause GlassFish to hang. While it was in this state
>> you could watch the NP Pool count rise 1K every 15 seconds. I've seen it
>> up close to 3000K in Task Manager. No other programs on the computer
>> were affected, and restarting GlassFish solved the problem until the next
>> time. The NP Pool growing rapidly seemed to be directly related to the
>> JDBC driver. Replacing it with the jTDS driver solved that problem. I
>> don't attribute these lockups to the NP Pool leak.
>>
>>
>> So in my experience, the real effects of the NP Pool bug only surface every
>> 2-3 months, and you can recognize it when the only way to fix it is to
>> reboot Windows.
>>
>>
>> Ryan
>>
>>
>> Eric Chamberlain wrote:
>> > Jeanfrancois,
>> >
>> > Hmm. Your comments about Windows 2003 prompt some questions:
>> >
>> > 1. If Windows 2003 is leaking nbpool, wouldn't that cause the server to
>> >    crash? We have had no crashes of the server, only Glassfish slowing
>> >    down. The server has been solid even with other active IIS services
>> >    running on it.
>> > 2. If the problem is in the OS, then why does restarting Glassfish clear
>> >    the problem? Is the nbpool problem a *per-process* issue?
>> > 3. If I configure to use blocking rather than non-blocking sockets, what
>> >    would that do to my throughput?
>> >
>> >
>> > Eric Chamberlain
>> > VentriPoint, Inc. | www.ventripoint.com | Software Engineer
>> > Helping heart care through innovative diagnostic solutions
>> >
>> >
>> >
>> >>> -----Original Message-----
>> >>> From: Jeanfrancois.Arcand_at_Sun.COM
>> >>> [mailto:Jeanfrancois.Arcand_at_Sun.COM]
>> >>> Sent: Wednesday, June 18, 2008 7:14 AM
>> >>> To: users_at_glassfish.dev.java.net
>> >>> Subject: Re: Glassfish Slowing Down
>> >>>
>> >>> Salut,
>> >>>
>> >>> I almost forgot to reply....
>> >>>
>> >>> Eric Chamberlain wrote:
>> >>>
>> >>>> My comments are in-line below.
>> >>>>
>> >>>> Eric Chamberlain
>> >>>> VentriPoint, Inc. | www.ventripoint.com | Software Engineer
>> >>>> Helping heart care through innovative diagnostic solutions
>> >>>>
>> >>>>
>> >>>>
>> >>>>>> -----Original Message-----
>> >>>>>> From: Jeanfrancois.Arcand_at_Sun.COM
>> >>>>>> [mailto:Jeanfrancois.Arcand_at_Sun.COM]
>> >>>>>> Sent: Wednesday, June 11, 2008 5:46 PM
>> >>>>>> To: users_at_glassfish.dev.java.net
>> >>>>>> Subject: Re: Glassfish Slowing Down
>> >>>>>>
>> >>>>>> Salut,
>> >>>>>>
>> >>>>>> Eric Chamberlain wrote:
>> >>>>>>
>> >>>>>>>> Greetings all.
>> >>>>>>>>
>> >>>>>>>> I am seeing a problem in which my Glassfish-hosted service slows
>> >>>>>>>> down over time. Over a few weeks, it slows down to 10x its original
>> >>>>>>>> response time. The Glassfish framework is used only to support this
>> >>>>>>>> one service (which, conveniently, has but one external API). When I
>> >>>>>>>> stop the Glassfish instance and re-start it, the response time
>> >>>>>>>> immediately goes back to its expected (short) interval.
>> >>>>>>>>
>> >>>>>> Can you give more information about your service? Mainly,
>> >>>>>> are you using only JSP and Servlet (no db, no remote call,
>> >>>>>> no external component)?
>> >>>>>>
>> >>>> There is no JSP or servlet involved here. All we have is a web service.
>> >>>>
>> >>> OK, there might be a problem there, but I doubt it.
>> >>>
>> >>>
>> >>>
>> >>>>>> Most of the time this problem is caused by an external
>> >>>>>> component that locks our threads.
>> >>>>>>
>> >>>>>>
>> >>>> I don't know of anything external that would do that. The individual
>> >>>> calls are short and there is no database involved.
>> >>>>
>> >>>>
>> >>>>>>>> We have conducted stress testing on the service code and there is
>> >>>>>>>> no slowdown detectable even when we simulate many weeks of use in a
>> >>>>>>>> short time. We also could detect no bloat of the heap in our stress
>> >>>>>>>> testing. BTW, the service does *not* access any databases.
>> >>>>>>>>
>> >>>>>>>> The next suspect is the Glassfish framework itself. How can I find
>> >>>>>>>> out more information on when and if Glassfish is slowing down
>> >>>>>>>> handling requests? Is there some way that I can post an automatic
>> >>>>>>>> monitor which will help me track down changes in response times
>> >>>>>>>> over a multi-week time frame and correlate them with memory usage?
>> >>>>>>>>
>> >>>>>> Are you able to reproduce the problem easily? Do you think
>> >>>>>> you would be able to get a thread dump when it starts slowing down?
>> >>>>>>
>> >>>>>>
>> >>>> Reproducing the problem takes a while, but within two or three weeks
>> >>>> the problem is perceptible. It is not possible to reproduce the problem
>> >>>> within a day. I probably could see the change over a week if I tracked
>> >>>> the response times closely.
>> >>>>
>> >>> When it happens, are you able to do a jstack <PID> > dump.txt and send
>> >>> it here?
>> >>>
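
Since the spike turns out to be sudden, the jstack request above could be
automated so the dump is taken the moment a slow response is seen. The
following is only a sketch in Python under stated assumptions: the function
names, the file naming and the 10x threshold are made up for illustration, and
it assumes the JDK's jstack is on the PATH of the account running the script.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def dump_threads(pid, out_dir=".", run=subprocess.run):
    """Run `jstack <pid>` and save its output to a timestamped text file.

    `run` is injectable only so the function can be exercised without a JVM.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out_path = Path(out_dir) / f"jstack-{pid}-{stamp}.txt"
    # Same effect as: jstack <PID> > dump.txt
    result = run(["jstack", str(pid)], capture_output=True, text=True, check=True)
    out_path.write_text(result.stdout)
    return out_path


def maybe_dump(response_ms, normal_ms, pid, factor=10.0, **kwargs):
    """Take a thread dump only when a response is `factor` times slower than
    the normal response time; return the dump path, or None otherwise."""
    if response_ms >= factor * normal_ms:
        return dump_threads(pid, **kwargs)
    return None


# Example call (hypothetical PID and timings):
#   maybe_dump(response_ms=1200.0, normal_ms=100.0, pid=4242)
```

Taking two or three dumps a few seconds apart tends to be more useful than a
single one, since a thread that is stuck shows the same stack in each.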
>> >>>
>> >>>
>> >>>
>> >>>>>>>> Another hypothesis is that we're losing information from
>> >>>>>>>> disconnections and re-connections that happen over time (and thus
>> >>>>>>>> are not seen in a stress test because of the compressed time
>> >>>>>>>> frame). Have there been any problems in this area found by others?
>> >>>>>>>> I could find nothing myself.
>> >>>>>>>>
>> >>>>>> Grizzly (the http front end of GlassFish) will close
>> >>>>>> connections under two circumstances:
>> >>>>>>
>> >>>>>> (1) A connection is idle for more than 30 seconds
>> >>>>>> (2) More than 250 requests have been made on a persistent connection.
>> >>>>>>
>> >>>>>> Based on the above information, I suspect (1) might happen.
>> >>>>>> But usually slowdowns are observed because
>> >>>>>>
>> >>>>>> (1) All the worker threads take time to execute. During
>> >>>>>> that time, all incoming requests are queued. As soon as one
>> >>>>>> thread is free, it dequeues one request and executes it. By
>> >>>>>> default, the queue can accept 4096 connections. After
>> >>>>>> reaching that limit, Grizzly just starts refusing requests.
>> >>>>>>
>> >>>>>> (2) Your application/framework is doing something wrong by
>> >>>>>> caching/storing data (wild guess).
>> >>>>>>
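
The queueing behaviour described in (1) can be pictured with a toy model. This
is not Grizzly code, just a sketch in Python of a bounded queue: the 4096
limit is the default quoted above, while the function name and the figure of
5000 incoming requests are assumptions for illustration.

```python
import queue


def offer(pending, request):
    """Try to enqueue a request; return False (refuse it) if the queue is full."""
    try:
        pending.put_nowait(request)
        return True
    except queue.Full:
        return False


# All worker threads are assumed busy, so nothing is dequeued while
# 5000 requests arrive against a queue that holds 4096.
pending = queue.Queue(maxsize=4096)
accepted = sum(offer(pending, n) for n in range(5000))
refused = 5000 - accepted
print(accepted, "queued,", refused, "refused")

# When one worker becomes free it dequeues one request, which makes
# room for exactly one more incoming request.
pending.get_nowait()
accepted_after_free = offer(pending, "late request")
```

In this model a client sees only rising latency while the queue fills, and
outright refusals start only once the limit is reached.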
>> >>>> I cannot reproduce the problem by stressing the app over the short run.
>> >>>> This leads me to think it is not in the app.
>> >>>>
>> >>>>
>> >>>>>> So, first let's do the usual configuration stuff. First, can
>> >>>>>> you add, in domain.xml, the following property:
>> >>>>>>
>> >>>>>> <jvm-options>-Dcom.sun.enterprise.server.ss.ASQuickStartup=false</jvm-options>
>> >>>>>>
>> >>>> I do not know what effect I should expect from this change. Please
>> >>>> explain.
>> >>>
>> >>> It is just disabling the following mechanism:
>> >>>
>> >>> http://weblogs.java.net/blog/binod/archive/2005/09/lazy_initializa.html
>> >>>
>> >>> It shouldn't make a difference, but just to make sure.
>> >>>
>> >>>
>> >>>
>> >>>>>> Restart GlassFish and try to reproduce the problem. Can you
>> >>>>>> also send your domain.xml? Are you changing the
>> >>>>>> http-listener's acceptor-threads value by any chance? If
>> >>>>>> yes, set it to 1 (the default) and see if the problem still happens.
>> >>>>>>
>> >>>> I have not changed the domain.xml at all. The domain.xml lists
>> >>>> acceptor-threads = 1.
>> >>>
>> >>> OK, then the default config runs with only 5 threads:
>> >>>
>> >>> <request-processing ...thread-count="5".../>
>> >>>
>> >>> You might want to increase that number to see if it helps.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>>>> Also please let us know which JDK and OS version you are using.
>> >>>>>>
>> >>>>>>
>> >>>> We're using JDK 1.6. The OS is Windows Server 2003.
>> >>>>
>> >>> Haaaaa....Windows 2003 leaks nbpool and eventually the TCP stack will go
>> >>> down completely. We have reported that problem to Microsoft and, as far
>> >>> as I can tell, no patch has been provided so far. So that's possibly the
>> >>> problem you are facing. Note that this is *not* a GlassFish/Java issue,
>> >>> but a win32/2003 issue. The workaround is to avoid using non-blocking
>> >>> sockets and use blocking ones instead. Add the following property
>> >>> in domain.xml:
>> >>>
>> >>> -Dcom.sun.enterprise.web.connector.useCoyoteConnector=true
>> >>>
>> >>> to see if that helps.
>> >>>
>> >>> A+
>> >>>
>> >>> -- Jeanfrancois
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>>>> Thanks
>> >>>>>>
>> >>>>>> -- Jeanfrancois
>> >>>>>>
>> >>>>>>
>> >>>> Thank you for the quick response.
>> >>>>
>> >>>> == Eric ==
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: users-unsubscribe_at_glassfish.dev.java.net
>> >>>> For additional commands, e-mail: users-help_at_glassfish.dev.java.net
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>