It looks like the reply is stored in the async write queue.
This is what I can see from the logging of my requests:
[root_at_dcache-lab002 ~]# grep 0xda61ff90 /var/log/dcache/dcache-lab002Domain.log
11 Mar 2014 15:25:16 (dcache-lab002-A) processing RPC message: xid=0xda61ff90 remote=/131.169.161.125:847
11 Mar 2014 15:25:16 (dcache-lab002-A) dispatching message 0xda61ff90
11 Mar 2014 15:25:16 (dcache-lab002-A) ready to reply Done message 0xda61ff90
11 Mar 2014 15:25:16 (dcache-lab002-A) dispatching Done message 0xda61ff90
this is the point after FilterChainContext#write
11 Mar 2014 15:25:23 (dcache-lab002-A) Failed to send RPC message: xid=0xda61ff90 remote=/131.169.161.125:847 : Connection closed
and this is the point where CompletionHandler#failed is called
The reply sat in the queue for 7 seconds.
For now I have changed the write call to be blocking:
_context.write(_context.getAddress(), buffer, new EmptyCompletionHandler<WriteResult>() {
    @Override
    public void failed(Throwable throwable) {
        _log.error("Failed to send RPC message: xid=0x{} remote={} : {}",
                Integer.toHexString(buffer.getInt(0)), _connection.getPeerAddress(),
                throwable.getMessage());
    }
}, true);  // the trailing 'true' makes this write blocking
The error is gone. But I want to understand why the reply was queued for so long.
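To narrow down where those seven seconds go, one option is to register a
ConnectionProbe, as suggested further down this thread, and timestamp the
write events; a rough sketch, assuming the ConnectionProbe.Adapter callback
signature of Grizzly 2.3 and reusing the _log field of my transport class:

connection.getMonitoringConfig().addProbes(new ConnectionProbe.Adapter() {
    // assumption: onWriteEvent(Connection, Buffer, long) is the 2.3 callback signature
    @Override
    public void onWriteEvent(Connection connection, Buffer data, long size) {
        // record when the bytes actually leave through the channel
        _log.info("wrote {} bytes to {} at {}",
                size, connection.getPeerAddress(), System.currentTimeMillis());
    }
});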
Tigran.
----- Original Message -----
> From: "Tigran Mkrtchyan" <tigran.mkrtchyan_at_desy.de>
> To: dev_at_grizzly.java.net
> Sent: Tuesday, March 11, 2014 8:57:32 AM
> Subject: Re: closed connection under high load
>
>
>
> Unfortunately, this does not fix the delayed reply issue.
> So I keep digging....
>
> Tigran.
>
> ----- Original Message -----
> > From: "Tigran Mkrtchyan" <tigran.mkrtchyan_at_desy.de>
> > To: dev_at_grizzly.java.net
> > Sent: Monday, March 10, 2014 4:49:38 PM
> > Subject: Re: closed connection under high load
> >
> >
> > OK, it looks like I found it.
> > NIOTransport#startSelectorRunners() starts as many
> > SelectorRunners as the node has cores. Nevertheless,
> > the thread pool that runs them is limited to the
> > number of threads configured with
> > NIOTransportBuilder#setSelectorThreadPoolConfig.
> > If the numbers do not match, or setSelectorThreadPoolConfig
> > configures fewer threads than the number of SelectorRunners
> > already created, some of them will never be executed.
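> > In the meantime, keeping the two numbers in sync on the
> > application side seems to avoid the mismatch; a rough sketch,
> > assuming the Grizzly 2.3 TCPNIOTransportBuilder and
> > ThreadPoolConfig APIs (setSelectorRunnersCount on the builder
> > is an assumption on my side):
> >
> > final int selectorCount = Runtime.getRuntime().availableProcessors();
> > TCPNIOTransportBuilder builder = TCPNIOTransportBuilder.newInstance();
> > // one SelectorRunner per thread in the selector pool
> > builder.setSelectorRunnersCount(selectorCount);
> > builder.setSelectorThreadPoolConfig(ThreadPoolConfig.defaultConfig()
> >         .setCorePoolSize(selectorCount)
> >         .setMaxPoolSize(selectorCount));
> > TCPNIOTransport transport = builder.build();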
> >
> >
> > Two small fixes attached.
> >
> > Tigran.
> >
> > ----- Original Message -----
> > > From: "Tigran Mkrtchyan" <tigran.mkrtchyan_at_desy.de>
> > > To: dev_at_grizzly.java.net
> > > Sent: Monday, March 10, 2014 2:51:57 PM
> > > Subject: Re: closed connection under high load
> > >
> > >
> > >
> > > Hi Alexey,
> > >
> > > Probably I do not really understand how Grizzly works,
> > > but this is what I observe: a selector thread has only
> > > one connection in its selector. As a result, only events from a
> > > single connection get processed.
> > >
> > > Is this expected behavior?
> > >
> > > Tigran.
> > >
> > > ----- Original Message -----
> > > > From: "Tigran Mkrtchyan" <tigran.mkrtchyan_at_desy.de>
> > > > To: dev_at_grizzly.java.net
> > > > Sent: Monday, March 10, 2014 9:10:07 AM
> > > > Subject: Re: closed connection under high load
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> > > > > To: dev_at_grizzly.java.net
> > > > > Sent: Friday, March 7, 2014 10:50:15 PM
> > > > > Subject: Re: closed connection under high load
> > > > >
> > > > > Hi Tigran,
> > > > >
> > > > >
> > > > > On 07.03.14 07:36, Mkrtchyan, Tigran wrote:
> > > > > >
> > > > > > It looks like the IO is dominated by a single connection.
> > > > > > I monitor three probes: onWriteEvent, onReadEvent and
> > > > > > onIOEventReadyEvent. While I see ACCEPT from all clients,
> > > > > > READ and WRITE come only from a single client.
> > > > > Looks like your app is made in a way that it can't use
> > > > > SameThreadStrategy. If you can share your code and prepare the test,
> > > > > I can take a look and try to find out why it doesn't work with
> > > > > SameThreadStrategy.
> > > >
> > > > Hi Alexey,
> > > >
> > > > the code is available on the github:
> > > >
> > > > https://github.com/kofemann/oncrpc4j
> > > >
> > > > You probably need to have a look at
> > > >
> > > > https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/OncRpcSvc.java
> > > >
> > > > and
> > > >
> > > > https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/GrizzlyXdrTransport.java
> > > >
> > > > possibly
> > > >
> > > > https://github.com/kofemann/oncrpc4j/blob/master/oncrpc4j-core/src/main/java/org/dcache/xdr/RpcDispatcher.java
> > > >
> > > > I will try to prepare a simple client to produce load without running a
> > > > full NFS server.
> > > >
> > > > Thanks,
> > > > Tigran.
> > > >
> > > > >
> > > > > WBR,
> > > > > Alexey.
> > > > >
> > > > > >
> > > > > > Tigran.
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >> From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> > > > > >> To: dev_at_grizzly.java.net
> > > > > >> Sent: Friday, March 7, 2014 2:39:22 AM
> > > > > >> Subject: Re: closed connection under high load
> > > > > >>
> > > > > >>
> > > > > >> On 06.03.14 01:29, Mkrtchyan, Tigran wrote:
> > > > > >>> ----- Original Message -----
> > > > > >>>> From: "Oleksiy Stashok" <oleksiy.stashok_at_oracle.com>
> > > > > >>>> To: dev_at_grizzly.java.net
> > > > > >>>> Sent: Thursday, March 6, 2014 2:35:00 AM
> > > > > >>>> Subject: Re: closed connection under high load
> > > > > >>>>
> > > > > >>>> Hi Tigran,
> > > > > >>>>
> > > > > >>>>>> Hi Tigran,
> > > > > >>>>>>
> > > > > >>>>>> looks like the connection is getting closed because of an
> > > > > >>>>>> IOException or -1 returned during channel.read(), which is
> > > > > >>>>>> expected. So IMO it's nothing decided by Grizzly; either the
> > > > > >>>>>> other side or the local OS closed the connection.
> > > > > >>>>> Yes, indeed, the client closes the connection. But it looks like
> > > > > >>>>> the reason is that the reply from the server is sent 15 sec after
> > > > > >>>>> the request arrived. The filter itself took at most 800 ms. Any
> > > > > >>>>> ideas how to trace where the rest of the time was spent?
> > > > > >>>> You can register a ConnectionProbe for each Connection:
> > > > > >>>>
> > > > > >>>> connection.getMonitoringConfig().addProbes(new ConnectionProbe() {
> > > > > >>>>     .....
> > > > > >>>> });
> > > > > >>>>
> > > > > >>>> and monitor the different Connection events and the times at
> > > > > >>>> which they happen. It might be interesting to monitor
> > > > > >>>> onWriteEvent() and see whether the write delay occurs at the
> > > > > >>>> Grizzly/Java or OS level. For example, Grizzly may report that
> > > > > >>>> socketChannel.write(ByteBuffer) accepted the ByteBuffer while the
> > > > > >>>> peer didn't get the data, which means the delay happens in the
> > > > > >>>> OS layer.
> > > > > >>> I was logging in my code where I start to process the request, and
> > > > > >>> that already happens with quite a delay. Unfortunately, my workload
> > > > > >>> nodes are turned off till the end of the week as we update the
> > > > > >>> cooling infrastructure in our data center, but I will continue the
> > > > > >>> investigation next week. Is there any best practice on the relation
> > > > > >>> between the number of selector and worker threads?
> > > > > >> By default we'd suggest selector-threads-count=CPU-count and
> > > > > >> worker-threads-count=CPU-count*2.
> > > > > >> I'd also recommend using SameThreadStrategy (no worker threads)
> > > > > >> where it's possible. For example, if the app logic doesn't have any
> > > > > >> blocking or waiting kind of code, there is no reason to pay for a
> > > > > >> thread context switch.
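> > > > > >> A rough sketch of those defaults, assuming the Grizzly 2.3
> > > > > >> TCPNIOTransportBuilder/ThreadPoolConfig API (SameThreadIOStrategy
> > > > > >> is the strategy class meant by "SameThreadStrategy"):
> > > > > >>
> > > > > >> final int cpus = Runtime.getRuntime().availableProcessors();
> > > > > >> TCPNIOTransportBuilder builder = TCPNIOTransportBuilder.newInstance();
> > > > > >> // worker pool sized to CPU-count * 2
> > > > > >> builder.setWorkerThreadPoolConfig(ThreadPoolConfig.defaultConfig()
> > > > > >>         .setCorePoolSize(cpus * 2)
> > > > > >>         .setMaxPoolSize(cpus * 2));
> > > > > >> // or, when nothing in the filter chain blocks, no worker threads at all:
> > > > > >> // builder.setIOStrategy(SameThreadIOStrategy.getInstance());
> > > > > >> TCPNIOTransport transport = builder.build();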
> > > > > >>
> > > > > >>> BTW, what do you think about the proposed change:
> > > > > >>> https://github.com/GrizzlyNIO/grizzly-mirror/pull/1
> > > > > >> Looks good.
> > > > > >> Will apply it.
> > > > > >>
> > > > > >> Thank you!
> > > > > >>
> > > > > >> WBR,
> > > > > >> Alexey.
> > > > > >>
> > > > > >>
> > > > > >>> Tigran.
> > > > > >>>
> > > > > >>>> Thanks.
> > > > > >>>>
> > > > > >>>> WBR,
> > > > > >>>> Alexey.
> > > > > >>>>
> > > > > >>>>> Looking at the request log, I can see that request processing is
> > > > > >>>>> not equally distributed. Some connections are processed quite
> > > > > >>>>> actively, while others are not. Is there a way I can see the
> > > > > >>>>> selector threads' activity?
> > > > > >>>>>
> > > > > >>>>> Tigran.
> > > > > >>>>>
> > > > > >>>>>> Thanks.
> > > > > >>>>>>
> > > > > >>>>>> WBR,
> > > > > >>>>>> Alexey.
> > > > > >>>>>>
> > > > > >>>>>> On 26.02.14 14:06, Mkrtchyan, Tigran wrote:
> > > > > >>>>>>> Hi,
> > > > > >>>>>>>
> > > > > >>>>>>> we observe that Grizzly decides to close the connection under
> > > > > >>>>>>> high load. Any idea how to find the reason?
> > > > > >>>>>>>
> > > > > >>>>>>> Thanks,
> > > > > >>>>>>> Tigran.
> > > > > >>>>>>>
> > > > > >>>>>>> java.io.IOException: Connection closed
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.asyncqueue.TaskQueue.onClose(TaskQueue.java:302)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.onClose(AbstractNIOAsyncQueueWriter.java:591)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransport.closeConnection(TCPNIOTransport.java:389)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.NIOConnection$2.run(NIOConnection.java:439)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler$RunnableTask.run(DefaultSelectorHandler.java:489)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler.processPendingTaskQueue(DefaultSelectorHandler.java:295)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler.processPendingTasks(DefaultSelectorHandler.java:286)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.DefaultSelectorHandler.preSelect(DefaultSelectorHandler.java:100)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:326)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:271)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> > > > > >>>>>>> Caused by: java.lang.Exception: Close stack trace
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.NIOConnection.close0(NIOConnection.java:426)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOConnection.close0(TCPNIOConnection.java:292)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransport.read(TCPNIOTransport.java:648)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleRead(TCPNIOTransportFilter.java:75)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.filterchain.TransportFilter.handleRead(TransportFilter.java:173)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:546)
> > > > > >>>>>>> [grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:113)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:55)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>>>>>> at
> > > > > >>>>>>> org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:135)
> > > > > >>>>>>> ~[grizzly-framework-2.3.5.jar:2.3.5]
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>