Hi Dennis,
> (HP-UX does suffer from the Linux selector spin problem, but it was not
> directly related to the CPU spikes, so we are not pursuing the workaround
> yet because the channel close was causing long delays, stopping all
> activity. Whether or not this is a symptom of the other problem, I don't
> know.)
I see.
> Our contacts at HP looked at some system info while the spike was
> occurring
> and said that one of the Grizzly worker threads was going "out of
> control"
> on socket reads, reading a massive amount of data, and subsequently
> chewing
> up CPU and thrashing in GC. I've looked at the code and if I
> understand it
> correctly Grizzly reads all of the data on the socket, stashes that
> in the
> input buffer (byte buffer), then starts to parse the input.
Right, but the amount of data shouldn't be that "massive"... normally up
to 8K.
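Just to illustrate the usual pattern (a simplified sketch in plain NIO,
not the actual Grizzly code; the class and method names are only
placeholders): the worker reads whatever is available into a bounded
buffer and only then hands it to the parser.

    // Simplified sketch of a bounded non-blocking read; the real Grizzly
    // ReadFilter differs, this only shows the general idea.
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    class BoundedReadSketch {
        // Reads whatever is currently available on the (non-blocking)
        // channel into the buffer and returns the number of bytes read.
        static int readOnce(SocketChannel channel, ByteBuffer input) throws IOException {
            int total = 0;
            while (input.hasRemaining()) {
                int n = channel.read(input);
                if (n <= 0) {
                    break;      // 0 = nothing more right now, -1 = peer closed
                }
                total += n;
            }
            return total;       // the parser then works on the buffer contents
        }
    }

With an input buffer of ByteBuffer.allocate(8 * 1024), the amount read per
selector event should stay bounded, which is why the "massive" reads HP
observed are surprising.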
> Apparently the spike happened because throughout the day Grizzly was
> getting
> hammered with lots of data that was not HTTP. Almost like a denial of
> service attack, but we don't know how re-routing the network traffic
> "fixed"
> this. I see that really all we get when the exception is logged is
> the stack
> trace.
Interesting. The exception you saw in the log is the one you sent in one
of the previous emails, right?
Is it possible for you to reproduce the issue? Could you try some patches
I can provide?
> What I was wondering is how difficult it would be to capture more
> information, such as the IP address of the sender, the amount of data
> that was read, and perhaps a hex dump of the first 100 bytes of data in
> the log?
Not exactly, but we can set the logger level for "grizzly" to FINE/FINER/
FINEST to get more details, and set -Dcom.sun.grizzly.enableSnoop=true.
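For example (a sketch that assumes the default java.util.logging setup;
the class and method names are only placeholders, and if you configure
logging via a logging.properties file you would set the level there
instead):

    // Call this from your server startup code, before Grizzly is initialized.
    import java.util.logging.ConsoleHandler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    class GrizzlyLoggingSetup {
        static void enableFinestLogging() {
            // Raise the "grizzly" logger to FINEST (or FINE / FINER).
            Logger grizzly = Logger.getLogger("grizzly");
            grizzly.setLevel(Level.FINEST);

            // The handler must also allow FINEST, or the records are dropped.
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.FINEST);
            grizzly.addHandler(handler);
        }
    }

and start the JVM with the snoop property, e.g.:

    java -Dcom.sun.grizzly.enableSnoop=true -cp <your classpath> YourMainClass

(YourMainClass and the classpath are only placeholders for however you
launch your server.)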
> Again, thanks for all of your support. Grizzly rocks!
Thank you!
WBR,
Alexey.
>
>
> ddooley wrote:
>>
>> Hi Alexey,
>>
>> We tried to implement the selector spin workaround by adding "hp-ux" to
>> the isLinux property. The results
>> were indeed different...but unfortunately not better. Below is a
>> snippet
>> of the thread dump that shows a deadlock condition when attempting to
>> create the new selector. As you will see, Thread 5 is deadlocked on
>> Worker Thread 4 (object id 6e0e83d8).
>>
>> This test was run with the 1.9.18e code base, as our management is not
>> comfortable with running the 19 snapshot code on a live client's
>> system.
>>
>> We are very confused by this... none of the tests in our QA lab on
>> similar machines showed the problem (although, to be honest, the
>> client's machine is somewhat underpowered, with only two CPUs and 14
>> gigs of memory).
>>
>> Again, just to refresh memories, without the above code change the
>> client does appear to suffer from the selector spin problem (CPU spikes
>> and GC thrashing). Usually once it occurs, it happens almost like
>> clockwork, every hour. If we run with the selector spin workaround set,
>> then instead of the spikes we get the deadlock condition at exactly the
>> point where the CPU spikes would occur.
>>
>> I'm inclined to believe that Grizzly is not the problem here, at
>> least not
>> directly. I wish I understood NIO better. My thought is that
>> there is a
>> horrible bug in HP's VM (or OS) that only manifests itself on a
>> system of
>> this configuration. It's as if the system is not closing the socket
>> in a
>> timely manner. From what the client tells us, it can last up to ten
>> minutes before finally clearing up (when not using the selector spin
>> workaround).
>>
>> At this point, we would like to try the
>> executePendingIOUsingSelectorThread property, but management is not
>> willing to put the 19 snapshot code on a live client's system. How
>> difficult would it be to get the changed sources between 1.9.18e and
>> the 19 snapshot so we could just implement your original suggestion of
>> "executePendingIOUsingSelectorThread"?
>>
>> Thanks again for all your help!
>>
>> --Dennis
>>
>>
>> Full thread dump [Wed Dec 23 10:28:33 EST 2009] (Java HotSpot(TM) Server
>> VM 1.5.0.18 jinteg:11.04.09-01:30 PA2.0 (aCC_AP) mixed mode):
>>
>> "Grizzly-30000-WorkerThread(4)" daemon prio=10 tid=0099c8e0 nid=22
>> lwp_id=3164606 runnable [53cee000..53cee778]
>>     at sun.nio.ch.FileDispatcher.preClose0(Native Method)
>>     at sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:41)
>>     at sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:708)
>>     - locked <6e0e83d8> (a java.lang.Object)
>>     at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:201)
>>     at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
>>     - locked <6e0e83b0> (a java.lang.Object)
>>     at sun.nio.ch.SocketAdaptor.close(SocketAdaptor.java:352)
>>     at com.sun.grizzly.TCPSelectorHandler.closeChannel(TCPSelectorHandler.java:1356)
>>     at com.sun.grizzly.BaseSelectionKeyHandler.doAfterKeyCancel(BaseSelectionKeyHandler.java:229)
>>     at com.sun.grizzly.BaseSelectionKeyHandler.cancel(BaseSelectionKeyHandler.java:216)
>>     at com.sun.grizzly.http.SelectorThreadKeyHandler.cancel(SelectorThreadKeyHandler.java:80)
>>     at com.sun.grizzly.filter.ReadFilter.postExecute(ReadFilter.java:287)
>>     at com.sun.grizzly.DefaultProtocolChain.postExecuteProtocolFilter(DefaultProtocolChain.java:164)
>>     at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:103)
>>     at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:88)
>>     at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:76)
>>     at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:53)
>>     at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:57)
>>     at com.sun.grizzly.ContextTask.run(ContextTask.java:69)
>>     at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:330)
>>     at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:309)
>>     at java.lang.Thread.run(Thread.java:595)
>>
>> <other worker threads follow in wait states...>
>>
>> "Thread-5" daemon prio=10 tid=01660018 nid=16 lwp_id=3164591
>> waiting for
>> monitor entry [53ff4000..53ff4678]
>> at sun.nio.ch.SocketChannelImpl.kill(SocketChannelImpl.java:
>> 741)
>> - waiting to lock <6e0e83d8> (a java.lang.Object)
>> at
>> sun
>> .nio
>> .ch
>> .AbstractPollSelectorImpl.implClose(AbstractPollSelectorImpl.java:71)
>> at
>> sun.nio.ch.SelectorImpl.implCloseSelector(SelectorImpl.java:96)
>> - locked <6ea10360> (a sun.nio.ch.Util$1)
>> - locked <6ea10350> (a java.util.Collections$UnmodifiableSet)
>> - locked <6ea10158> (a sun.nio.ch.PollSelectorImpl)
>> at
>> java.nio.channels.spi.AbstractSelector.close(AbstractSelector.java:
>> 91)
>> at
>> com
>> .sun
>> .grizzly
>> .SelectorHandlerRunner
>> .switchToNewSelector(SelectorHandlerRunner.java:519)
>> at
>> com
>> .sun
>> .grizzly
>> .TCPSelectorHandler.workaroundSelectorSpin(TCPSelectorHandler.java:
>> 1520)
>> - locked <58bee570> (a java.lang.Object)
>> at
>> com
>> .sun
>> .grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:
>> 202)
>> at
>> com
>> .sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:
>> 130)
>> at
>> java.util.concurrent.ThreadPoolExecutor
>> $Worker.runTask(ThreadPoolExecutor.java:651)
>> at
>> java.util.concurrent.ThreadPoolExecutor
>> $Worker.run(ThreadPoolExecutor.java:676)
>> at java.lang.Thread.run(Thread.java:595)
>>
>>
>> Oleksiy Stashok wrote:
>>>
>>> Hi Dennis,
>>>
>>> yes, it looks like it could be the spin issue.
>>> But to be sure, it should be checked on the original configuration,
>>> where you observed the issue.
>>> If that does not help, please try my original suggestion with the
>>> "executePendingIOUsingSelectorThread" property.
>>>
>>> Thanks.
>>>
>>> WBR,
>>> Alexey.
>>>
>>>
>>> On Dec 18, 2009, at 17:22, ddooley wrote:
>>>
>>>>
>>>> Hi Alexey,
>>>>
>>>> After treating HP-UX like Linux in the test for isLinux in
>>>> Controller.java
>>>> (in the 1.9.19 snapshot code base), our QA engineer ran a test that
>>>> made a
>>>> total of 85 requests from 10 different threads (monitored by the
>>>> profiler).
>>>>
>>>> This resulted in 12146 calls to getSpinRate(), 69 calls to
>>>> resetSpinCounter(), and 20 calls to workaroundSelectorSpin() over
>>>> the course
>>>> of 55 seconds.
>>>>
>>>> Based on the result, in your opinion, would this classify as the
>>>> Linux
>>>> selector spin issue? Or would this be considered normal?
>>>> Unfortunately for
>>>> us, we don't have another system to run the test on. Our
>>>> application only
>>>> runs on the HP-UX system.
>>>>
>>>> If there is any more detail you would like, please let me know.
>>>> This test
>>>> was run on a system that does not exhibit the CPU spikes that
>>>> prompted the
>>>> original posting. We are setting up another test on a system that
>>>> is more
>>>> like the client experiencing the CPU spikes to see if the results
>>>> are any
>>>> different.
>>>>
>>>> Thanks for your help. I really appreciate it.
>>>>
>>>> --Dennis
>>>>
>>>>
>>>> Oleksiy Stashok wrote:
>>>>>
>>>>> Hi Dennis,
>>>>>
>>>>>> The System.getProperty("os.name") value is "HP-UX". Our contact at
>>>>>> HP states that the selector spin test that fails (see link below)
>>>>>> is a Sun bug, and will be fixed in 1.7, which makes me believe all
>>>>>> the more that the issues are related. I wonder if HP-UX and Linux
>>>>>> share the same Sun code base?
>>>>>>
>>>>>> Just for fun, I have modified the grizzly Controller class to
>>>>>> treat
>>>>>> "HP-UX"
>>>>>> the same as "linux" and we will be running some tests. I'll
>>>>>> let you
>>>>>> know
>>>>>> the results.
>>>>> Great, that was going to be my next suggestion :)))
>>>>> Please let me know the result.
>>>>>
>>>>> WBR,
>>>>> Alexey.
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Grizzly-Web-Server-on-HP-UX-PA-RISC-Performance-Issue-tp26733514p26845878.html
>>>> Sent from the Grizzly - Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>
>>
>
> --
> View this message in context: http://old.nabble.com/Grizzly-Web-Server-on-HP-UX-PA-RISC-Performance-Issue-tp26733514p27067602.html
> Sent from the Grizzly - Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>