users@glassfish.java.net

Re: Server stops responding due to Glassfish

From: Ryan de Laplante <ryan_at_ijws.com>
Date: Tue, 29 Apr 2008 22:53:20 -0400

When clients (browsers) attempted to access a page, they would sit at
"Waiting for response from web server..." forever, and people would
just close the browser. That is probably why we see those errors.
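
For reference, the innermost exception in the quoted trace is raised from SocketChannelImpl.ensureWriteOpen, which is what java.nio does when a write is attempted on a channel that is already closed. The sketch below reproduces only that last step on a loopback connection. The class name ClosedChannelDemo and the payload are made up for illustration, and it does not show how Grizzly itself ends up with a closed channel.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; not part of GlassFish or Grizzly.
public class ClosedChannelDemo {

    /** Returns true if writing to an already-closed channel raised ClosedChannelException. */
    public static boolean writeAfterClose() {
        try {
            ServerSocketChannel server = ServerSocketChannel.open();
            try {
                // Bind to any free loopback port and connect a client to it.
                server.socket().bind(new InetSocketAddress("127.0.0.1", 0));
                int port = server.socket().getLocalPort();
                SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port));
                SocketChannel accepted = server.accept();

                client.close();   // the "browser" gives up and goes away
                accepted.close(); // the server side of the connection is closed as well

                try {
                    // A late response is written to the closed channel.
                    accepted.write(ByteBuffer.wrap("late response".getBytes("US-ASCII")));
                    return false;
                } catch (ClosedChannelException expected) {
                    return true;
                }
            } finally {
                server.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("loopback setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ClosedChannelException on late write: " + writeAfterClose());
    }
}
```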


Do you have a tool we can install on our test server to put the kind of
load you described on the system? I do not like the idea of doing that
on our production system. When you said it breaks, does the app server
not return any data when trying to access pages? Does restarting the
app server solve the problem, or do you have to reboot Windows? For us,
just restarting the service solves the problem. We only reboot once per
month when installing Windows updates.
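
A rough sketch of what such a load could look like, for a test server only: a number of simulated users, each issuing one GET per period (300 users at one request per second is the figure quoted below). This is not the tool Jeanfrancois used; the class name SimpleLoad, the timeouts, and the target URL are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical load generator; names and timeouts are illustrative only.
public class SimpleLoad {
    public final AtomicLong completed = new AtomicLong(); // a response was fully read
    public final AtomicLong failed = new AtomicLong();    // refused, timed out, or broken

    /** Issues one GET, drains the body, and counts the outcome. */
    public void hit(String target) {
        try {
            HttpURLConnection c = (HttpURLConnection) URI.create(target).toURL().openConnection();
            c.setConnectTimeout(5000);
            c.setReadTimeout(30000); // a hung server shows up as read timeouts
            InputStream in = c.getResponseCode() < 400 ? c.getInputStream() : c.getErrorStream();
            if (in != null) {
                byte[] buf = new byte[4096];
                while (in.read(buf) != -1) { /* discard the body */ }
                in.close();
            }
            completed.incrementAndGet();
        } catch (IOException e) {
            failed.incrementAndGet();
        }
    }

    /** Starts `users` periodic tasks, each calling hit() every `periodMs` milliseconds. */
    public ScheduledExecutorService start(final String target, int users, long periodMs) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(users);
        for (int i = 0; i < users; i++) {
            pool.scheduleAtFixedRate(new Runnable() {
                public void run() { hit(target); }
            }, 0, periodMs, TimeUnit.MILLISECONDS);
        }
        return pool; // call shutdownNow() on it to stop the load
    }
}
```

Usage would be along the lines of new SimpleLoad().start("http://testserver:8080/app/", 300, 1000), where the URL is a placeholder. A rising failed count while completed stops moving would be the symptom to watch for.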

I wasn't able to run jstack against the PID even once, so I doubt I can
do it every two hours for you. Also, it runs as a Windows service so I
can't see the console to interact with it.
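
One possible workaround when jstack cannot attach and there is no console is to have the JVM dump its own threads on a timer using Thread.getAllStackTraces(), which is standard since Java 5. The sketch below is only that idea. The class name InProcessThreadDump is made up, and it assumes something inside the server (for example a context listener in a deployed web app) calls schedule() once at startup. It may not help if the JVM is too starved of resources to run the timer.

```java
import java.io.PrintStream;
import java.util.Date;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not a GlassFish API.
public class InProcessThreadDump {

    /** Builds a jstack-like text dump of every live thread in this JVM. */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" daemon=").append(t.isDaemon())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Prints a dump to `out` once per period, starting after the first period. */
    public static ScheduledExecutorService schedule(final PrintStream out, long period, TimeUnit unit) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "thread-dump-timer");
                t.setDaemon(true); // never keep the JVM alive just for dumps
                return t;
            }
        });
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                out.println("=== thread dump " + new Date() + " ===");
                out.println(dump());
                out.flush();
            }
        }, period, period, unit);
        return timer;
    }
}
```

For the two-hour interval requested below, the call would be InProcessThreadDump.schedule(System.err, 2, TimeUnit.HOURS). Where System.err ends up depends on how the server's logging is configured.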

I like your suggestion of disabling Grizzly from another email:

-Dcom.sun.enterprise.web.useCoyoteConnector=true

I'm going to ask for permission to try this setting in production. I
did a full transaction on my development computer using this setting. I
don't know how to confirm that Coyote was running instead of Grizzly,
but I did see that JVM parameter in the startup log messages.
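
A possible way to check this from inside the running JVM, sketched under assumptions: read the system property, then count live threads that currently have a frame in the com.sun.enterprise.web.connector.grizzly package (the package name is taken from the stack trace quoted below). The class name ConnectorCheck is made up, and a count of zero is only a hint that Grizzly is not active, not proof.

```java
// Hypothetical diagnostic; run it inside the server JVM, e.g. from a test page.
public class ConnectorCheck {
    static final String PROP = "com.sun.enterprise.web.useCoyoteConnector";
    static final String GRIZZLY_PKG = "com.sun.enterprise.web.connector.grizzly";

    /** True only if the JVM was started with -Dcom.sun.enterprise.web.useCoyoteConnector=true. */
    public static boolean coyoteFlagSet() {
        return Boolean.getBoolean(PROP);
    }

    /** Counts live threads that have at least one stack frame in the given package prefix. */
    public static int threadsWithFramesIn(String pkgPrefix) {
        int count = 0;
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            for (StackTraceElement frame : stack) {
                if (frame.getClassName().startsWith(pkgPrefix)) {
                    count++;
                    break; // count each thread once
                }
            }
        }
        return count;
    }

    public static String report() {
        return PROP + "=" + coyoteFlagSet()
            + ", threads with Grizzly frames: " + threadsWithFramesIn(GRIZZLY_PKG);
    }
}
```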

Were there any changes in UR1 or UR2 that you think would affect this?
We're using the FCS + a patch you gave me in November that would
eventually be released as part of UR1.


Thanks,
Ryan


Jeanfrancois Arcand wrote:
> Hi Ryan,
>
> thanks for the info...so far, the exceptions are expected. They just
> mean the client closed the connection before the server had a chance
> to write a response:
>
>> Caused by: ClientAbortException: java.nio.channels.ClosedChannelException
>>     at org.apache.coyote.tomcat5.OutputBuffer.realWriteBytes(OutputBuffer.java:409)
>>     at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:417)
>>     at org.apache.coyote.tomcat5.OutputBuffer.doFlush(OutputBuffer.java:357)
>>     at org.apache.coyote.tomcat5.OutputBuffer.flush(OutputBuffer.java:335)
>>     at org.apache.coyote.tomcat5.CoyoteResponse.flushBuffer(CoyoteResponse.java:638)
>>     at org.apache.coyote.tomcat5.CoyoteResponseFacade.flushBuffer(CoyoteResponseFacade.java:291)
>>     at com.sun.faces.application.ViewHandlerImpl.renderView(ViewHandlerImpl.java:203)
>>     at com.sun.rave.web.ui.appbase.faces.ViewHandlerImpl.renderView(ViewHandlerImpl.java:320)
>>     at com.sun.faces.lifecycle.RenderResponsePhase.execute(RenderResponsePhase.java:106)
>>     ... 34 more
>> Caused by: java.nio.channels.ClosedChannelException
>>     at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>     at com.sun.enterprise.web.connector.grizzly.OutputWriter.flushChannel(OutputWriter.java:94)
>>     at com.sun.enterprise.web.connector.grizzly.OutputWriter.flushChannel(OutputWriter.java:67)
>>     at com.sun.enterprise.web.connector.grizzly.SocketChannelOutputBuffer.flushChannel(SocketChannelOutputBuffer.java:167)
>>     at com.sun.enterprise.web.connector.grizzly.SocketChannelOutputBuffer.flushBuffer(SocketChannelOutputBuffer.java:202)
>>     at com.sun.enterprise.web.connector.grizzly.SocketChannelOutputBuffer.flush(SocketChannelOutputBuffer.java:178)
>>     at com.sun.enterprise.web.connector.grizzly.SocketChannelOutputBuffer.realWriteBytes(SocketChannelOutputBuffer.java:145)
>>     at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:851)
>>     at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:149)
>>     at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:626)
>>     at org.apache.coyote.Response.doWrite(Response.java:599)
>>     at org.apache.coyote.tomcat5.OutputBuffer.realWriteBytes(OutputBuffer.java:404)
>>     ... 42 more
>> |#]
>
> This exception isn't the cause of the hangs. Can you try something?
> Can you get a jstack every 2 hours to see how it goes? Also, the NP Pool
> problem will happen faster if more HTTP requests are made. I'm able
> to reproduce it quite fast with a load of 300 users doing requests
> every second...it takes less than 2 hours to break win32 :-)
>
> Thanks
>
> -- Jeanfrancois
>
>
> Ryan de Laplante wrote:
>> It went down again today! 5.5 hours since it went down last. This is
>> a new record. It also has a comparatively low NP Pool count of 383K
>> (I've seen it up to 2200K before), and is using only 304,504K
>> memory. I forgot to try tweaking a setting in HTTP listener to see
>> if it comes back to life or not. I did try to do a stack dump:
>>
>> > jstack 5180
>> 5180: Not enough storage is available to process this command
>>
>> Then I tried using this tool to get a stack dump:
>>
>> http://www.adaptj.com/main/download
>>
>> 5180 java.exe session:0 threads:131 parent:5744
>> The current version does not support processes running in a different
>> session.
>> Try any of the following options:
>> 1) Run the StackTrace service in the same session with the target
>> process.
>> 2) Start the terminal client with "mstsc.exe /console"
>> 3) Use VNC from http://www.realvnc.com/ as a remote client.
>>
>> Attached are some Grizzly and NIO channels related exceptions from
>> server.log
>>
>> We've had to write a program that checks the server every 10 minutes
>> and emails us when it goes down. We're also now going to restart
>> GlassFish three times a week. Based on the discussions on this
>> mailing list today about Linux users having these same problems, we
>> are no longer convinced that it can be blamed on the Windows 2003 NP
>> Pool leak. Yes there is a leak, but I think GlassFish has a serious
>> problem too. We did not have this problem with JBoss on the same
>> server and OS a year ago.
>>
>> Hopefully Sun will put more resources into this issue immediately.
>> It is the only issue we've had to use our support contract for, and
>> we seem to be getting nowhere with it after 6 months. My employer is
>> not satisfied and I'm wondering if he will renew the contract, or
>> switch app server vendors. This is a production server and it goes
>> down all the time.
>>
>>
>> Ryan
>>
>>
>> Ryan de Laplante wrote:
>>> glassfish_at_javadesktop.org wrote:
>>>>> HTTP requests consistently stop reaching the web application
>>>>>
>>>>
>>>> I see the same on my server (Linux), but not consistently and very
>>>> rarely.
>>>> In those cases none of my web applications are reachable, nor is the
>>>> admin GUI. Nothing to see in the log files.
>>>>
>>>> I think this must be an "abnormal" issue.
>>>> I'm not familiar with that stuff, just guessing: could it be a problem
>>>> with broken connections, meaning if the client/user aborts
>>>> [Message sent by forum member 'hammoud' (hammoud)]
>>>>
>>>> http://forums.java.net/jive/thread.jspa?messageID=272085
>>>>
>>> This is concerning. Up until now I thought this problem was
>>> specific to the Windows 2003 NP Pool leak. That might explain why I
>>> experience two similar but different issues:
>>>
>>> 1) Every week or two the web container would stop serving requests.
>>> Sometimes it would say "Maximum connections reached: 4096" even when
>>> there were only a couple of hundred transactions a day. Other times
>>> it would show nothing in the browser or not respond at all. My
>>> other http listener for web services also stops working. Usually
>>> the web admin console is the only http listener that is
>>> working. Restarting the SJSAS 9.1 Windows service solves the
>>> problem.
>>>
>>> 2) Every few months I find that restarting SJSAS 9.1 Windows service
>>> makes no difference. PostgreSQL also dies and you can't connect to
>>> it anymore. The only solution is to reboot Windows.
>>>
>>> I think issue #2 is related to the Windows 2003 Server NP Pool leak
>>> which may have been fixed now with Microsoft patches, but I doubt it
>>> since we have to restart SJSAS 9.1 more often since installing the
>>> patches. I think issue #1 is a GlassFish problem, since you
>>> experience it on Linux and so does another poster in this forum.
>>>
>>>
>>> Ryan
>>
>>
>> ------------------------------------------------------------------------
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe_at_glassfish.dev.java.net
>> For additional commands, e-mail: users-help_at_glassfish.dev.java.net
>