Hi Alexey,
We tried to implement the selector spin workaround by adding "hp-ux" to the
isLinux property. The results
were indeed different...but unfortunately not better. Below is a snippet of
the thread dump that shows a deadlock condition when attempting to create
the new selector. As you will see, Thread 5 is deadlocked on Worker Thread
4 (object id 6e0e83d8).
This test was run with 1.9.18e code base, as our management is not
comfortable with running the 19 snapshot code on a live client's system.
We are very confused by this...all tests in our QA lab with similar machines
(although to be honest the client's is somewhat underpowered with only two
CPUs and 14 gigs of memory).
Again, just to refresh memories, without the above code change the client
does appear to suffer from the Selector spin problem (CPU spikes and GC
thrashing). Usually once it occurs it happens almost like clockwork every
hour. If we run with the selector spin workaround set then instead of the
spikes happening we get the deadlock condition at exactly the point where
the CPU spikes would occur.
I'm inclined to believe that Grizzly is not the problem here, at least not
directly. I wish I understood NIO better. My thought is that there is a
horrible bug in HP's VM (or OS) that only manifests itself on a system of
this configuration. Its as if the system is not closing the socket in a
timely manner. From what the client tells us, it can last up to ten minutes
before finally clearing up (when not using the selector spin workaround).
At this point, we would like to try the executePendingIOUsingSelectorThread,
but management is not willing to put the 19 snapshot code on a live client's
system. How difficult would it be to get the changed sources between
1.9.18e and 19 snapshot to just implement your original suggestion of
"executePendingIOUsingSelectorThread"?
Thanks again for all your help!
--Dennis
Full thread dump [Wed Dec 23 10:28:33 EST 2009] (Java HotSpot(TM) Server VM
1.5.0.18 jinteg:11.04.09-01:30 PA2.0 (aCC_AP) mixed mode):
"Grizzly-30000-WorkerThread(4)" daemon prio=10 tid=0099c8e0 nid=22
lwp_id=3164606 runnable [53cee000..53cee778]
at sun.nio.ch.FileDispatcher.preClose0(Native Method)
at sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:41)
at
sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:708)
- locked <6e0e83d8> (a java.lang.Object)
at
java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:201)
at
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
- locked <6e0e83b0> (a java.lang.Object)
at sun.nio.ch.SocketAdaptor.close(SocketAdaptor.java:352)
at
com.sun.grizzly.TCPSelectorHandler.closeChannel(TCPSelectorHandler.java:1356)
at
com.sun.grizzly.BaseSelectionKeyHandler.doAfterKeyCancel(BaseSelectionKeyHandler.java:229)
at
com.sun.grizzly.BaseSelectionKeyHandler.cancel(BaseSelectionKeyHandler.java:216)
at
com.sun.grizzly.http.SelectorThreadKeyHandler.cancel(SelectorThreadKeyHandler.java:80)
at
com.sun.grizzly.filter.ReadFilter.postExecute(ReadFilter.java:287)
at
com.sun.grizzly.DefaultProtocolChain.postExecuteProtocolFilter(DefaultProtocolChain.java:164)
at
com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:103)
at
com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:88)
at
com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:76)
at
com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:53)
at
com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:57)
at com.sun.grizzly.ContextTask.run(ContextTask.java:69)
at
com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:330)
at
com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:309)
at java.lang.Thread.run(Thread.java:595)
<other worker threads follow in wait states...>
"Thread-5" daemon prio=10 tid=01660018 nid=16 lwp_id=3164591 waiting for
monitor entry [53ff4000..53ff4678]
at sun.nio.ch.SocketChannelImpl.kill(SocketChannelImpl.java:741)
- waiting to lock <6e0e83d8> (a java.lang.Object)
at
sun.nio.ch.AbstractPollSelectorImpl.implClose(AbstractPollSelectorImpl.java:71)
at sun.nio.ch.SelectorImpl.implCloseSelector(SelectorImpl.java:96)
- locked <6ea10360> (a sun.nio.ch.Util$1)
- locked <6ea10350> (a java.util.Collections$UnmodifiableSet)
- locked <6ea10158> (a sun.nio.ch.PollSelectorImpl)
at
java.nio.channels.spi.AbstractSelector.close(AbstractSelector.java:91)
at
com.sun.grizzly.SelectorHandlerRunner.switchToNewSelector(SelectorHandlerRunner.java:519)
at
com.sun.grizzly.TCPSelectorHandler.workaroundSelectorSpin(TCPSelectorHandler.java:1520)
- locked <58bee570> (a java.lang.Object)
at
com.sun.grizzly.SelectorHandlerRunner.doSelect(SelectorHandlerRunner.java:202)
at
com.sun.grizzly.SelectorHandlerRunner.run(SelectorHandlerRunner.java:130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
at java.lang.Thread.run(Thread.java:595)
Oleksiy Stashok wrote:
>
> Hi Dennis,
>
> yes, it looks like it could be the spin issue.
> But for sure it should be checked on original configuration, where you
> observed the issue.
> If it will not help - please try my original guess with
> "executePendingIOUsingSelectorThread" property.
>
> Thanks.
>
> WBR,
> Alexey.
>
>
> On Dec 18, 2009, at 17:22 , ddooley wrote:
>
>>
>> Hi Alexey,
>>
>> After treating HP-UX like Linux in the test for isLinux in
>> Controller.java
>> (in the 1.9.19 snapshot code base), our QA engineer ran a test that
>> made a
>> total of 85 requests from 10 different threads (monitored by the
>> profiler).
>>
>> This resulted in 12146 calls to getSpinRate(), 69 calls to
>> resetSpinCounter(), and 20 calls to workaroundSelectorSpin() over
>> the course
>> of 55 seconds.
>>
>> Based on the result, in your opinion, would this classify as the Linux
>> selector spin issue? Or would this be considered normal?
>> Unfortunately for
>> us, we don't have another system to run the test on. Our
>> application only
>> runs on the HP-UX system.
>>
>> If there is any more detail you would like, please let me know.
>> This test
>> was run on a system that does not exhibit the CPU spikes that
>> prompted the
>> original posting. We are setting up another test on a system that
>> is more
>> like the client experiencing the CPU spikes to see if the results
>> are any
>> different.
>>
>> Thanks for your help. I really appreciate it.
>>
>> --Dennis
>>
>>
>> Oleksiy Stashok wrote:
>>>
>>> Hi Dennis,
>>>
>>>> the System.getProperty("os.name") is "HP-UX". Our contact at HP
>>>> states that
>>>> the selector spin test that fails (see link below) is a sun bug, and
>>>> will be
>>>> fixed in 1.7 which makes me believe all the more the issues are
>>>> related. I
>>>> wonder if HP-UX and Linux share the same sun code base?
>>>>
>>>> Just for fun, I have modified the grizzly Controller class to treat
>>>> "HP-UX"
>>>> the same as "linux" and we will be running some tests. I'll let you
>>>> know
>>>> the results.
>>> Great, it should be my next suggestion :)))
>>> Please let me know the result.
>>>
>>> WBR,
>>> Alexey.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
>>> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Grizzly-Web-Server-on-HP-UX-PA-RISC-Performance-Issue-tp26733514p26845878.html
>> Sent from the Grizzly - Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
>> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>
>
>
--
View this message in context: http://old.nabble.com/Grizzly-Web-Server-on-HP-UX-PA-RISC-Performance-Issue-tp26733514p26904598.html
Sent from the Grizzly - Users mailing list archive at Nabble.com.