dev@grizzly.java.net

Re: grizzly 2.2 performance issue

From: Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
Date: Tue, 11 Sep 2012 14:59:30 +0200

Hi Tigran,

we've just released Grizzly 2.3-beta3 w/ some perf. fixes related to the
WorkerThread strategy.
We actually changed the thread-pool queue implementation from
LinkedTransferQueue to LinkedBlockingQueue (it looks like
LinkedTransferQueue is spinning too much).
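
In plain java.util.concurrent terms the swap looks roughly like this
(just an illustration of the queue change, not the actual Grizzly
thread-pool code; the class name and pool sizes are made up):

    import java.util.concurrent.*;

    public class QueueSwap {
        public static void main(String[] args) {
            // before: LinkedTransferQueue spins while waiting for new
            // tasks, which burns CPU when workers are often idle
            ExecutorService before = new ThreadPoolExecutor(
                    24, 24, 60L, TimeUnit.SECONDS,
                    new LinkedTransferQueue<Runnable>());

            // after: LinkedBlockingQueue parks idle workers instead
            ExecutorService after = new ThreadPoolExecutor(
                    24, 24, 60L, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>());

            before.shutdown();
            after.shutdown();
        }
    }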

I will send you a patch for 2.3-beta3 separately, which has a slightly
different impl. of socket.read(...); it would be interesting to see if it
makes any difference for you.

One more thing you can try (since you use the WorkerThread strategy): use
TCPNIOTransport's optimizeForMultiplexing mode, which you can set on the
tcpNioTransport instance like:
tcpNioTransport.setOptimizeForMultiplexing(true);
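
For reference, in a full setup that call would sit roughly here (just a
sketch; imports from org.glassfish.grizzly.nio.transport and
org.glassfish.grizzly.strategies, filter chain and bind/start details
omitted):

    TCPNIOTransport tcpNioTransport =
            TCPNIOTransportBuilder.newInstance().build();
    tcpNioTransport.setIOStrategy(WorkerThreadIOStrategy.getInstance());
    // batch writes to the same connection through the async write queue
    // instead of writing each message directly from the caller's thread
    tcpNioTransport.setOptimizeForMultiplexing(true);
    tcpNioTransport.start();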

Pls. let me know if this changes the benchmark results.

Thanks.

WBR,
Alexey.

On 09/10/2012 03:24 PM, Tigran Mkrtchyan wrote:
> Hi Oleksiy,
>
> finally I got some time to test 2.2.19.
>
> Here is what I get:
>
> org.glassfish.grizzly.PendingWriteQueueLimitExceededException: Max
> queued data limit exceeded: 2999452>47440
> at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.checkQueueSize(AbstractNIOAsyncQueueWriter.java:619)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.writeQueueRecord(AbstractNIOAsyncQueueWriter.java:279)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.write(AbstractNIOAsyncQueueWriter.java:219)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleWrite(TCPNIOTransportFilter.java:127)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.TransportFilter.handleWrite(TransportFilter.java:191)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:652)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:568)
> [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
> at org.dcache.xdr.GrizzlyXdrTransport.send(GrizzlyXdrTransport.java:47)
> [xdr-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT]
>
>
> Shall I add a push-back handler to retry the write operation?
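>
> Or, if I read the AsyncQueueWriter API right, I could instead raise the
> per-connection limit, something like this (transport here is our
> TCPNIOTransport instance; the 4 MB is a made-up value, just above the
> ~3 MB we hit):
>
>     transport.getAsyncQueueIO().getWriter()
>             .setMaxPendingBytesPerConnection(4 * 1024 * 1024);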
>
> Tigran
>
> On Thu, Aug 23, 2012 at 3:37 PM, Oleksiy Stashok
> <oleksiy.stashok_at_oracle.com> wrote:
>> Hi Tigran,
>>
>>> Here is a fragment of the handler code (the complete code can be
>>> found at
>>> http://code.google.com/p/nio-jrpc/source/browse/oncrpc4j-core/src/main/java/org/dcache/xdr/RpcMessageParserTCP.java):
>>>
>>>     @Override
>>>     public NextAction handleRead(FilterChainContext ctx) throws IOException {
>>>
>>>         Buffer messageBuffer = ctx.getMessage();
>>>         if (messageBuffer == null) {
>>>             return ctx.getStopAction();
>>>         }
>>>
>>>         // not a complete RPC message yet: stop and keep the partial
>>>         // buffer as the remainder for the next read
>>>         if (!isAllFragmentsArrived(messageBuffer)) {
>>>             return ctx.getStopAction(messageBuffer);
>>>         }
>>>
>>>         ctx.setMessage(assembleXdr(messageBuffer));
>>>
>>>         // any bytes left over belong to the next message; hand them
>>>         // back to the filter chain together with the invoke action
>>>         final Buffer reminder = messageBuffer.hasRemaining()
>>>                 ? messageBuffer.split(messageBuffer.position()) : null;
>>>
>>>         return ctx.getInvokeAction(reminder);
>>>     }
>>>
>>>
>>> Up to now I was sure that if there is more data to process (reminder
>>> != null) or more incoming data available, Grizzly will process it.
>> Right, it will, but only after the current FilterChain processing is finished.
>>
>>
>>
>>> The difference between the SameThread and WorkerThread strategies is only in
>>> the way of processing.
>> The diff. is that SameThreadStrategy will do FilterChain processing in the
>> Grizzly core thread (the selector thread for NIO), and WorkerThreadStrategy will
>> run FilterChain processing in the Transport's worker thread.
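>>
>> I.e. the strategy is chosen on the transport, something like:
>>
>>     transport.setIOStrategy(SameThreadIOStrategy.getInstance());
>>     // vs.
>>     transport.setIOStrategy(WorkerThreadIOStrategy.getInstance());
>>
>> (both strategies live in org.glassfish.grizzly.strategies)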
>>
>>
>>> For example, if for some reason processing takes
>>> too long, then I can read the message and drop it without processing, as the
>>> client will retry and will ignore a late reply.
>> If that's the case, then pls. try the approach I suggested in the prev. email.
>>
>> WBR,
>> Alexey.
>>
>>
>>> Anyway, now I know that this was a wrong assumption.
>>>
>>> Tigran.
>>>
>>>
>>>> I'm not saying this will work faster, but it will really parallelize
>>>> request
>>>> processing.
>>>>
>>>> Thanks.
>>>>
>>>> WBR,
>>>> Alexey.
>>>>
>>>>> Regards,
>>>>> Tigran.
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> WBR,
>>>>>> Alexey.
>>>>>>
>>>>>>
>>>>>> On 08/22/2012 02:16 PM, Tigran Mkrtchyan wrote:
>>>>>>> Hi Alexey,
>>>>>>>
>>>>>>> On Wed, Aug 22, 2012 at 11:37 AM, Oleksiy Stashok
>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>> Hi Tigran,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/22/2012 08:03 AM, Tigran Mkrtchyan wrote:
>>>>>>>>> The result: 2.1 performs ~15% better than 2.2 and 2.3:
>>>>>>>>>
>>>>>>>>> grizzly 2.2 http://hammercloud.cern.ch/hc/app/atlas/test/20009667/
>>>>>>>>> grizzly 2.3 http://hammercloud.cern.ch/hc/app/atlas/test/20009668/
>>>>>>>>> grizzly 2.1 http://hammercloud.cern.ch/hc/app/atlas/test/20009669/
>>>>>>>>>
>>>>>>>>> (look at the mean in the rightmost graph)
>>>>>>>>>
>>>>>>>>> The only difference in the code is attached. Maybe the problem is
>>>>>>>>> there.
>>>>>>>> Thank you for the info. I don't see any problem in your code. Just FYI,
>>>>>>>> we're deprecating the PushBack mechanism in Grizzly 2.3 (it will be
>>>>>>>> removed in Grizzly 3). It will still be possible to check the async
>>>>>>>> write queue status (whether it's overloaded or not) and to register a
>>>>>>>> listener, which will be notified once you can write... But the
>>>>>>>> important difference is that the async write queue will keep accepting
>>>>>>>> data even if it's overloaded (no exception thrown). We think this
>>>>>>>> behavior is easier to implement on the Grizzly side (it will perform
>>>>>>>> better), and it actually offers the same/similar functionality as the
>>>>>>>> push-back mechanism.
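>>>>>>>>
>>>>>>>> In 2.3 terms the check would look something like this (a sketch;
>>>>>>>> the exact API may still change):
>>>>>>>>
>>>>>>>>     if (!connection.canWrite()) {
>>>>>>>>         // the queue is over its limit: get notified once it drains
>>>>>>>>         connection.notifyCanWrite(new WriteHandler() {
>>>>>>>>             public void onWritePossible() { /* resume writing */ }
>>>>>>>>             public void onError(Throwable t) { /* handle failure */ }
>>>>>>>>         });
>>>>>>>>     }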
>>>>>>> I have added a push-back handler as we started to see rejections. It's
>>>>>>> good to hear that you will push a default implementation into the async
>>>>>>> write queue.
>>>>>>>
>>>>>>>> As for the results you ran, I wanted to check:
>>>>>>>>
>>>>>>>> 1) Do all three runs use the same I/O strategy (WorkerThreadStrategy)?
>>>>>>> Yes, the only difference is the patch which was attached.
>>>>>>>
>>>>>>>> 2) Are you configuring the Grizzly worker thread pool in your code?
>>>>>>> No, we use whatever Grizzly has by default.
>>>>>>>
>>>>>>>> I'll run a simple echo test and will try to reproduce the problem; I'll
>>>>>>>> let you know.
>>>>>>> Just to remind you: the clients are 16 physical hosts doing NFS IO to a
>>>>>>> single server.
>>>>>>> Each client may (and does) send 16 requests (just a coincidence with the
>>>>>>> number of hosts). In total the server has to process 256 requests at
>>>>>>> any point and reply with 16x1MB messages.
>>>>>>> The server host has 24 cores and 32 GB of RAM.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Tigran.
>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> WBR,
>>>>>>>> Alexey.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tigran.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 17, 2012 at 4:19 PM, Tigran Mkrtchyan
>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>> Hi Alexey,
>>>>>>>>>>
>>>>>>>>>> We had SameThreadStrategy. Now I switched to WorkerThreadStrategy.
>>>>>>>>>>
>>>>>>>>>> Tigran.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 17, 2012 at 4:12 PM, Oleksiy Stashok
>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>
>>>>>>>>>>> thanks a lot for the info! Would be great if you can confirm these
>>>>>>>>>>> results
>>>>>>>>>>> next week.
>>>>>>>>>>> Just interesting, are you using SameThreadStrategy in your tests?
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> WBR,
>>>>>>>>>>> Alexey.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 08/17/2012 03:31 PM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>>
>>>>>>>>>>>> the 2.3-SNAPSHOT is comparable with 2.1.11.
>>>>>>>>>>>> The 2.2.9 is ~5% slower in my simple test. We can run
>>>>>>>>>>>> more production-level tests next week, as they take ~5-6 hours per
>>>>>>>>>>>> run
>>>>>>>>>>>> and require special setup.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 17, 2012 at 2:10 PM, Tigran Mkrtchyan
>>>>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>>>>> NP. give me an hour.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 12:46 PM, Oleksiy Stashok
>>>>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>>>> Tigran, before you test the 2.1.7 and 2.2.9 releases (if you
>>>>>>>>>>>>>> planned to), I wanted to ask: could you try Grizzly 2.3-SNAPSHOT
>>>>>>>>>>>>>> first?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 08/17/2012 11:51 AM, Oleksiy Stashok wrote:
>>>>>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thank you for the info.
>>>>>>>>>>>>>>> We'll investigate that!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We'd appreciate any help that lets us narrow down the problem;
>>>>>>>>>>>>>>> for example, if you have some time to try other releases
>>>>>>>>>>>>>>> (2.1.7 < release < 2.2.9), it would help a lot.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 08/17/2012 11:36 AM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> After a lot of time spent on debugging, we found that the
>>>>>>>>>>>>>>>> change from
>>>>>>>>>>>>>>>> grizzly-2.1.7 to grizzly-2.2.9
>>>>>>>>>>>>>>>> dropped the performance of our server by 10%.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> the profiling results can be found at:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-1.xml
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-2.xml
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> we ran the same application against the server (just a
>>>>>>>>>>>>>>>> reminder: this
>>>>>>>>>>>>>>>> is an NFSv4.1 server written in Java).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me know if you need more info.
>>>>>>>>>>>>>>>> For now we will roll back to version 2.1.7.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>>