dev@grizzly.java.net

Re: closed connection under high load

From: Mkrtchyan, Tigran <tigran.mkrtchyan_at_desy.de>
Date: Mon, 10 Mar 2014 09:10:07 +0100 (CET)

----- Original Message -----
> From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> To: dev_at_grizzly.java.net
> Sent: Friday, March 7, 2014 10:50:15 PM
> Subject: Re: closed connection under high load
>
> Hi Tigran,
>
>
> On 07.03.14 07:36, Mkrtchyan, Tigran wrote:
> >
> > It looks like the IO is dominated by a single connection.
> > I monitor three probes: onWriteEvent, onReadEvent and
> > onIOEventReadyEvent. While I see ACCEPT events from all clients,
> > READ and WRITE come only from a single client.
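> >
> > For reference, the probe registration looks roughly like this (a
> > minimal sketch against Grizzly 2.3's ConnectionProbe.Adapter; the
> > printouts are just illustrative):
> >
> >     // Buffer, Connection, ConnectionProbe and IOEvent all come
> >     // from the org.glassfish.grizzly package
> >     connection.getMonitoringConfig().addProbes(new ConnectionProbe.Adapter() {
> >         @Override
> >         public void onReadEvent(Connection c, Buffer data, int size) {
> >             System.out.println("READ  " + size + " bytes from " + c.getPeerAddress());
> >         }
> >
> >         @Override
> >         public void onWriteEvent(Connection c, Buffer data, long size) {
> >             System.out.println("WRITE " + size + " bytes to " + c.getPeerAddress());
> >         }
> >
> >         @Override
> >         public void onIOEventReadyEvent(Connection c, IOEvent event) {
> >             System.out.println(event + " ready on " + c.getPeerAddress());
> >         }
> >     });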
> Looks like your app is built in a way that prevents it from using
> SameThreadStrategy. If you can share your code and prepare a test, I
> can take a look and try to find out why it doesn't work with
> SameThreadStrategy.

Hi Alexey,

The code is available on GitHub:

https://github.com/kofemann/oncrpc4j

You probably need to have a look at

https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/OncRpcSvc.java

and

https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/GrizzlyXdrTransport.java

and possibly

https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/RpcDispatcher.java

I will try to prepare a simple client to generate load without running a full NFS server.

Thanks,
   Tigran.

>
> WBR,
> Alexey.
>
> >
> > Tigran.
> >
> > ----- Original Message -----
> >> From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> >> To: dev_at_grizzly.java.net
> >> Sent: Friday, March 7, 2014 2:39:22 AM
> >> Subject: Re: closed connection under high load
> >>
> >>
> >> On 06.03.14 01:29, Mkrtchyan, Tigran wrote:
> >>> ----- Original Message -----
> >>>> From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> >>>> To: dev_at_grizzly.java.net
> >>>> Sent: Thursday, March 6, 2014 2:35:00 AM
> >>>> Subject: Re: closed connection under high load
> >>>>
> >>>> Hi Tigran,
> >>>>
> >>>>>> Hi Tigran,
> >>>>>>
> >>>>>> looks like the connection is getting closed because of IOException or
> >>>>>> -1
> >>>>>> returned during channel.read(), which is expected.
> >>>>>> So IMO it's nothing decided by Grizzly, it's either other side or
> >>>>>> local
> >>>>>> OS closed the connection.
> >>>>> Yes, indeed, the client closes the connection. But it looks like the
> >>>>> reason is that the server's reply was sent 15 seconds after the
> >>>>> request arrived, while the filter itself took at most 800 ms. Any
> >>>>> ideas how to trace where the rest of the time was spent?
> >>>> You can register a ConnectionProbe for each Connection:
> >>>>
> >>>>     connection.getMonitoringConfig().addProbes(new ConnectionProbe() {
> >>>>         .....
> >>>>     });
> >>>>
> >>>> and monitor different Connection events and the time at which each
> >>>> happened. It might be interesting to monitor onWriteEvent() and see
> >>>> whether the write delay occurs at the Grizzly/Java level or at the OS
> >>>> level. For example, if Grizzly reports that
> >>>> socketChannel.write(ByteBuffer) has accepted the ByteBuffer, but the
> >>>> peer didn't receive the data, then the delay happens in the OS layer.
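> >>>>
> >>>> For example, to timestamp writes (a minimal sketch; the log format
> >>>> is illustrative):
> >>>>
> >>>>     connection.getMonitoringConfig().addProbes(new ConnectionProbe.Adapter() {
> >>>>         @Override
> >>>>         public void onWriteEvent(Connection c, Buffer data, long size) {
> >>>>             // fired when Grizzly has written `size` bytes to the socket channel
> >>>>             System.out.println(System.currentTimeMillis()
> >>>>                     + " wrote " + size + " bytes to " + c.getPeerAddress());
> >>>>         }
> >>>>     });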
> >>> I was logging in my code at the point where I start processing the
> >>> request, and there was already quite a delay before it. Unfortunately,
> >>> my workload nodes are turned off until the end of the week while we
> >>> update the cooling infrastructure in our data center, but I will
> >>> continue the investigation next week. Is there any best practice for
> >>> the ratio between the number of selector threads and worker threads?
> >> By default we'd suggest selector-threads-count=CPU-count and
> >> worker-threads-count=CPU-count*2.
> >> I'd also recommend using SameThreadStrategy (no worker threads) where
> >> it's possible. For example, if the app logic doesn't have any blocking
> >> or waiting kind of code, there is no reason to pay for a thread
> >> context switch.
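> >>
> >> In Grizzly 2.3 terms that would be roughly (a minimal sketch using the
> >> standard TCPNIOTransportBuilder API):
> >>
> >>     // TCPNIOTransport(Builder): org.glassfish.grizzly.nio.transport
> >>     // ThreadPoolConfig: org.glassfish.grizzly.threadpool
> >>     // *IOStrategy: org.glassfish.grizzly.strategies
> >>     final int cpus = Runtime.getRuntime().availableProcessors();
> >>
> >>     final TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance()
> >>             // selector threads: one per CPU
> >>             .setSelectorRunnersCount(cpus)
> >>             // worker threads: CPU count * 2 (used by WorkerThreadIOStrategy)
> >>             .setWorkerThreadPoolConfig(ThreadPoolConfig.defaultConfig()
> >>                     .setCorePoolSize(cpus * 2)
> >>                     .setMaxPoolSize(cpus * 2))
> >>             .setIOStrategy(WorkerThreadIOStrategy.getInstance())
> >>             // or, if no filter ever blocks:
> >>             // .setIOStrategy(SameThreadIOStrategy.getInstance())
> >>             .build();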
> >>
> >>> BTW, what do you think about the proposed change:
> >>> https://github.com/GrizzlyNIO/grizzly-mirror/pull/1
> >> Looks good.
> >> Will apply it.
> >>
> >> Thank you!
> >>
> >> WBR,
> >> Alexey.
> >>
> >>
> >>> Tigran.
> >>>
> >>>> Thanks.
> >>>>
> >>>> WBR,
> >>>> Alexey.
> >>>>
> >>>>> Looking at the request log, I can see that request processing is not
> >>>>> equally distributed.
> >>>>> Some connections are processed quite actively, while others are not.
> >>>>> Is there a way I can see selector thread activity?
> >>>>>
> >>>>> Tigran.
> >>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> WBR,
> >>>>>> Alexey.
> >>>>>>
> >>>>>> On 26.02.14 14:06, Mkrtchyan, Tigran wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> we observe that Grizzly decides to close connections under high load.
> >>>>>>> Any idea how to find the reason?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Tigran.
> >>>>>>>
> >>>>>>> java.io.IOException: Connection closed
> >>>>>>>     at org.glassfish.grizzly.asyncqueue.TaskQueue.onClose(TaskQueue.java:302) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.onClose(AbstractNIOAsyncQueueWriter.java:591) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransport.closeConnection(TCPNIOTransport.java:389) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.NIOConnection$2.run(NIOConnection.java:439) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.DefaultSelectorHandler$RunnableTask.run(DefaultSelectorHandler.java:489) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.DefaultSelectorHandler.processPendingTaskQueue(DefaultSelectorHandler.java:295) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.DefaultSelectorHandler.processPendingTasks(DefaultSelectorHandler.java:286) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.DefaultSelectorHandler.preSelect(DefaultSelectorHandler.java:100) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:326) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:271) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> >>>>>>> Caused by: java.lang.Exception: Close stack trace
> >>>>>>>     at org.glassfish.grizzly.nio.NIOConnection.close0(NIOConnection.java:426) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOConnection.close0(TCPNIOConnection.java:292) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransport.read(TCPNIOTransport.java:648) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleRead(TCPNIOTransportFilter.java:75) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.filterchain.TransportFilter.handleRead(TransportFilter.java:173) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:546) [grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:113) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:55) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>>>>>>     at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:135) ~[grizzly-framework-2.3.5.jar:2.3.5]
> >>
>
>