dev@grizzly.java.net

Re: closed connection under high load

From: Mkrtchyan, Tigran <tigran.mkrtchyan_at_desy.de>
Date: Mon, 10 Mar 2014 16:49:38 +0100 (CET)

Ok, it looks like I found it.
In NIOTransport#startSelectorRunners() starts as many
SelectorRunner as many cores node has. Nevertheless
the thread pool size to run them is limited to
number of threads configured with
NIOTransportBuilder#setSelectorThreadPoolConfig.
Is the numbers do not match or setSelectorThreadPoolConfig
configures less threads as SelectorRunner was already created,
some of them will never be executed.


Two small fixes attached.

Tigran.

----- Original Message -----
> From: "Tigran Mkrtchyan" <tigran.mkrtchyan_at_desy.de>
> To: dev_at_grizzly.java.net
> Sent: Monday, March 10, 2014 2:51:57 PM
> Subject: Re: closed connection under high load
>
>
>
> Hi Alexey,
>
> probably I do not really understand how grizzly is working,
> but this is what I observe: a selector thread has only
> one connection in the selector. As a result, only events from a
> single connection get processed.
>
> Is it expected behavior?
>
> Tigran.
>
> ----- Original Message -----
> > From: "Tigran Mkrtchyan" <tigran.mkrtchyan_at_desy.de>
> > To: dev_at_grizzly.java.net
> > Sent: Monday, March 10, 2014 9:10:07 AM
> > Subject: Re: closed connection under high load
> >
> >
> >
> > ----- Original Message -----
> > > From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> > > To: dev_at_grizzly.java.net
> > > Sent: Friday, March 7, 2014 10:50:15 PM
> > > Subject: Re: closed connection under high load
> > >
> > > Hi Tigran,
> > >
> > >
> > > On 07.03.14 07:36, Mkrtchyan, Tigran wrote:
> > > >
> > > > It looks like that IO dominated by a single connection.
> > > > I monitor three probes: onWriteEvent, onReadEvent and
> > > > onIOEventReadyEvent. While I see ACCEPT from all clients,
> > > > READ and WRITE comes only from a single client.
> > > Looks like your app is made in a way, that it can't use
> > > SameThreadStrategy. If you can share your code and prepare the test - I
> > > can take a look and try to find out why it doesn't work with the
> > > SameThreadStrategy.
> >
> > Hi Alexey,
> >
> > the code is available on the github:
> >
> > https://github.com/kofemann/oncrpc4j
> >
> > You probably need to have a look at
> >
> > https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/OncRpcSvc.java
> >
> > and
> >
> > https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/GrizzlyXdrTransport.java
> >
> > possibly
> >
> > https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/RpcDispatcher.java
> >
> > I will try to prepare a simple client to produce a load without running
> > full
> > NFS server.
> >
> > Thanks,
> > Tigran.
> >
> > >
> > > WBR,
> > > Alexey.
> > >
> > > >
> > > > Tigran.
> > > >
> > > > ----- Original Message -----
> > > >> From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> > > >> To: dev_at_grizzly.java.net
> > > >> Sent: Friday, March 7, 2014 2:39:22 AM
> > > >> Subject: Re: closed connection under high load
> > > >>
> > > >>
> > > >> On 06.03.14 01:29, Mkrtchyan, Tigran wrote:
> > > >>> ----- Original Message -----
> > > >>>> From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> > > >>>> To: dev_at_grizzly.java.net
> > > >>>> Sent: Thursday, March 6, 2014 2:35:00 AM
> > > >>>> Subject: Re: closed connection under high load
> > > >>>>
> > > >>>> Hi Tigran,
> > > >>>>
> > > >>>>>> Hi Tigran,
> > > >>>>>>
> > > >>>>>> looks like the connection is getting closed because of IOException
> > > >>>>>> or
> > > >>>>>> -1
> > > >>>>>> returned during channel.read(), which is expected.
> > > >>>>>> So IMO it's nothing decided by Grizzly, it's either other side or
> > > >>>>>> local
> > > >>>>>> OS closed the connection.
> > > >>>>> Yes, in did, the client closes the connection. But it looks like
> > > >>>>> the
> > > >>>>> reason
> > > >>>>> of it
> > > >>>>> that reply from server sent 15sec after request is arrived. The
> > > >>>>> filter
> > > >>>>> itself took
> > > >>>>> max 800ms. Any ideas how to trace where rest of the time was spent?
> > > >>>> You can register ConnectionProbe for each Connection:
> > > >>>> / connection.getMonitoringConfig().addProbes(new
> > > >>>> ConnectionProbe() {
> > > >>>> .....
> > > >>>> });/
> > > >>>>
> > > >>>> and monitor different Connection events and time when it happened.
> > > >>>> It might be interesting to monitor onWriteEvent() and see if write
> > > >>>> delay
> > > >>>> occurs on Grizzly/Java or OS level. For example Grizzly reports the
> > > >>>> socketChannel.write(ByteBuffer) had been accepted the ByteBuffer,
> > > >>>> but
> > > >>>> the peer didn't get this data, which means the delay happens in OS
> > > >>>> layer.
> > > >>> I was logging in my code where I start to process the request and
> > > >>> this
> > > >>> was with quite a delay. Unfortunately, my work load nodes are turned
> > > >>> off till the end of the week as we update cooling infrastructure in
> > > >>> our
> > > >>> data center. But I will continue investigations next week. Is there
> > > >>> any
> > > >>> best practice in which relations should be the number of selector and
> > > >>> worker threads?
> > > >> By default we'd suggest selector-threads-count=CPU-count and
> > > >> worker-threads-count=CPU-count*2
> > > >> I'd also recommend to use SameThreadStrategy (no worker threads),
> > > >> where
> > > >> it's possible. For example if the app logic doesn't have any blocking,
> > > >> waiting kind of code - there is no reason to pay for a thread context
> > > >> switch.
> > > >>
> > > >>> BTW, what do you think about proposed change:
> > > >>> https://github.com/GrizzlyNIO/grizzly-mirror/pull/1
> > > >> Looks good.
> > > >> Will apply it.
> > > >>
> > > >> Thank you!
> > > >>
> > > >> WBR,
> > > >> Alexey.
> > > >>
> > > >>
> > > >>> Tigran.
> > > >>>
> > > >>>> Thanks.
> > > >>>>
> > > >>>> WBR,
> > > >>>> Alexey.
> > > >>>>
> > > >>>>> Looking at the requests log I can see that request processing is
> > > >>>>> not
> > > >>>>> equally distributed.
> > > >>>>> Some connections processed quite active, while others are not. Is
> > > >>>>> there
> > > >>>>> a
> > > >>>>> way how I can see
> > > >>>>> selector threads activity?
> > > >>>>>
> > > >>>>> Tigran.
> > > >>>>>
> > > >>>>>> Thanks.
> > > >>>>>>
> > > >>>>>> WBR,
> > > >>>>>> Alexey.
> > > >>>>>>
> > > >>>>>> On 26.02.14 14:06, Mkrtchyan, Tigran wrote:
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> we observe that grizzly decides to close connection under high
> > > >>>>>>> load.
> > > >>>>>>> Any idea how to find the reason?
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Tigran.
> > > >>>>>>>
> > > >>>>>>> java.io.IOException: Connection closed
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.asyncqueue.TaskQueue.onClose(TaskQueue.java:302)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.onClose(AbstractNIOAsyncQueueWriter.java:591)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransport.closeConnection(TCPNIOTransport.java:389)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.NIOConnection$2.run(NIOConnection.java:439)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler$RunnableTask.run(DefaultSelectorHandler.java:489)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler.processPendingTaskQueue(DefaultSelectorHandler.java:295)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler.processPendingTasks(DefaultSelectorHandler.java:286)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler.preSelect(DefaultSelectorHandler.java:100)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:326)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:271)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> > > >>>>>>> Caused by: java.lang.Exception: Close stack trace
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.NIOConnection.close0(NIOConnection.java:426)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOConnection.close0(TCPNIOConnection.java:292)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransport.read(TCPNIOTransport.java:648)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleRead(TCPNIOTransportFilter.java:75)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.filterchain.TransportFilter.handleRead(TransportFilter.java:173)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:546)
> > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:113)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:55)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>>>>>> at
> > > >>>>>>> org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:135)
> > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > >>
> > >
> > >
> >
>