The image that you attached doesn't look like a memory leak at all:
each collection brings you from ~700MB down to ~250MB, and you're
collecting less than once an hour. A memory leak would show up as the
heap level left after each garbage collection (the floor of those
drops) climbing over time until it reached the total heap size; at
that point your servers would spend all of their time collecting
garbage (collections every few seconds with no noticeable drop in heap
size) and performance would die.
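If you want to verify that over a longer stretch, one option is to turn
on GC logging by adding jvm-options to the java-config element in
domain.xml for the affected instances (just a sketch; the log path is an
example, and it assumes a Sun HotSpot JVM):

  <jvm-options>-verbose:gc</jvm-options>
  <jvm-options>-XX:+PrintGCDetails</jvm-options>
  <jvm-options>-XX:+PrintGCTimeStamps</jvm-options>
  <jvm-options>-Xloggc:/tmp/gc.log</jvm-options>

If the post-collection heap level in that log stays flat over a few
days, you can rule out a leak with more confidence.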
Your comment about the heavily used application slowing down while the
others remain responsive would seem to point to an issue specific to
that application rather than a general cluster problem. Increasing the
acceptor threads is probably a good thing; you might also look into the
number of worker threads you are using. It should be in domain.xml as
something like:
  <request-processing header-buffer-length-in-bytes="8192" initial-thread-count="10"
      request-timeout-in-seconds="30" thread-count="130" thread-increment="10"/>
In general, thread-count should be around the peak number of concurrent
requests that you expect to be serving. If your app is running out of
worker threads (because it's unable to complete requests as fast as they
are coming in, for example), you could very well see a performance issue
like that as requests pile up in the queue. More tips on making Grizzly
perform better can be found on Jean-Francois Arcand's blog at
http://weblogs.java.net/blog/jfarcand/archive/2007/03/configuring_gri_2.html
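If you'd rather not hand-edit domain.xml, asadmin get/set should let you
inspect and change the same setting. A rough sketch (cluster1-config is a
placeholder for whichever config your heavily used cluster references;
verify the exact dotted name with get first):

  asadmin get "cluster1-config.http-service.request-processing.*"
  asadmin set cluster1-config.http-service.request-processing.thread-count=130

The instances will likely need a restart for the new thread-count to take
effect.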
glassfish_at_javadesktop.org wrote:
> Thanks for the quick reply. The situation is a bit complicated but if you have time to review it I would appreciate it.
>
> Environment--
> Sun Java System Application Server 9.1_02
> 2 physical machines
> 4 Applications deployed to 4 different clusters. One is heavily used, one is moderately used, and the other two are minimally used.
> 2 Sun Web Server 6 for load balancing. (Soon to be removed from environment.)
> https passthrough enabled.
> F5 big-ip in Front of Web Servers, for load balancing to the Web Servers. (Will load balance the applications directly soon.)
>
> The heavily used application usually slows down in the mornings, M-F, until it becomes unresponsive; usually there is just a blank white page and the browser keeps working on the requests. (On the weekends the application usage is cut in half and we do not see the problem.) The other applications don't appear to be affected. If I go to the application directly (through a proxy that fronts our DMZ) on the http listener then there is no problem with the application. Unfortunately our proxy didn't allow https on the non-standard ports the applications were using. I have since changed this so I can check whether the https listener is having problems when users say the application is slow. The logs don't show anything that would point to a root cause from what I can see. When I shut down the cluster instances there are errors, but the instances do shut down. I don't know if that is normal as I have not watched the logs while shutting them down in the past.
>
> Initially I thought it was a problem with the web server. I was able to increase the performance significantly by changing these settings on the web server--
> upped the Acceptor Threads from 1 to 4.
> increased the Max Queue length to 8182 (in retrospect I don't think that mattered, as the queue peak never got very high.)
> increased the RqThrottle to 512.
>
> But improving the performance on the web server didn't fix the problem. There were some application code changes which didn't help either. The last change I made was changing the http "engine" from grizzly to coyote. This was done last Friday. There have not been any confirmed problems since then. (This morning there was one site that said the application was slow again but I wasn't able to duplicate the problem. When asked for more details they said the performance issues went away.)
>
> So hopefully changing the http listener or engine (what is the proper term?) to coyote fixed the problem. But since there have been no errors in the logs that would say where the problem is, I am still looking into other possible explanations as to why this is/was happening.
>
> That is why I am thinking it is a memory leak in the app server. The other possibility I am thinking of is a JDBC connection leak. I am looking into enabling monitoring for it, since that doesn't appear to be enabled by default.
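(On the JDBC monitoring point: one way to turn it on, sketched here with
placeholder target names that you would replace with your own instance or
cluster config, is to raise the monitoring level for connection pools and
then read the statistics back with asadmin:

  asadmin set server.monitoring-service.module-monitoring-levels.jdbc-connection-pool=HIGH
  asadmin get --monitor=true "server.resources.*"

That should show connections in use versus the pool maximum, which is
usually enough to tell whether connections are leaking.)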
>
> Anyway, if you have read until here then thank you for your time.
> [Message sent by forum member 'jfaldmo' (jfaldmo)]
>
> http://forums.java.net/jive/thread.jspa?messageID=324113