dev@grizzly.java.net

Re: grizzly 2.2 performance issue

From: Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
Date: Tue, 11 Sep 2012 15:24:19 +0200

Hi Tigran,

On 09/11/2012 03:18 PM, Tigran Mkrtchyan wrote:
> BTW, are you going to JavaOne? I will be in Sunnyvale the same week
> (October 1-5) and can show up.
Unfortunately not this year :(
But as I understand, Prague is not far from you, so it might be
considered as an option :)


WBR,
Alexey.

>
> Tigran.
>
> On Tue, Sep 11, 2012 at 2:59 PM, Oleksiy Stashok
> <oleksiy.stashok_at_oracle.com> wrote:
>> Hi Tigran,
>>
>> we've just released Grizzly 2.3-beta3 w/ some perf. fixes related to the
>> WorkerThread strategy.
>> We actually changed the thread-pool queue implementation from
>> LinkedTransferQueue to LinkedBlockingQueue (it looks like
>> LinkedTransferQueue spins too much).
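>>
>> If you want to try the same queue swap on 2.2 yourself, a rough sketch
>> (pls. double-check the ThreadPoolConfig method names against your
>> version):
>>
>>     import java.util.concurrent.LinkedBlockingQueue;
>>
>>     import org.glassfish.grizzly.threadpool.ThreadPoolConfig;
>>
>>     final ThreadPoolConfig workerConfig = ThreadPoolConfig.defaultConfig()
>>             .setCorePoolSize(24)  // e.g. one per core on your 24-core box
>>             .setMaxPoolSize(24)
>>             .setQueue(new LinkedBlockingQueue<Runnable>());
>>     tcpNioTransport.setWorkerThreadPoolConfig(workerConfig);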
>>
>> I will send you a patch for 2.3-beta3 separately, which has a slightly
>> different impl. of socket.read(...); it would be interesting to see if it
>> makes any difference for you.
>>
>> One more thing you can try (when you use the WorkerThread strategy): the
>> TCPNIOTransport's optimizeForMultiplexing mode, which you can set on the
>> tcpNioTransport instance like:
>> tcpNioTransport.setOptimizeForMultiplexing(true);
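>>
>> In context, a minimal setup sketch (builder/setter names as I remember
>> them, pls. verify against the javadoc):
>>
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>
>>     final TCPNIOTransport tcpNioTransport =
>>             TCPNIOTransportBuilder.newInstance().build();
>>     // queue/batch outgoing writes instead of attempting a direct socket
>>     // write each time; usually helps when many connections are active
>>     tcpNioTransport.setOptimizeForMultiplexing(true);
>>     tcpNioTransport.bind(2049);  // e.g. the NFS port in your case
>>     tcpNioTransport.start();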
>>
>> Pls. let me know if this changes the benchmark results.
>>
>> Thanks.
>>
>> WBR,
>> Alexey.
>>
>>
>> On 09/10/2012 03:24 PM, Tigran Mkrtchyan wrote:
>>> Hi Oleksiy,
>>>
>>> finally I got some time to test 2.2.19.
>>>
>>> Here is what I get:
>>>
>>> org.glassfish.grizzly.PendingWriteQueueLimitExceededException: Max queued data limit exceeded: 2999452>47440
>>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.checkQueueSize(AbstractNIOAsyncQueueWriter.java:619) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.writeQueueRecord(AbstractNIOAsyncQueueWriter.java:279) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.write(AbstractNIOAsyncQueueWriter.java:219) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleWrite(TCPNIOTransportFilter.java:127) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.TransportFilter.handleWrite(TransportFilter.java:191) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:652) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:568) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>     at org.dcache.xdr.GrizzlyXdrTransport.send(GrizzlyXdrTransport.java:47) [xdr-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT]
>>>
>>>
>>> Shall I add a push-back handler to retry the write operation?
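>>>
>>> Something like this is what I have in mind (a rough sketch only; I'm
>>> assuming the 2.2 PushBackHandler/PushBackContext names, pls. correct me
>>> if I misread the API):
>>>
>>>     final PushBackHandler retryHandler = new PushBackHandler() {
>>>         @Override
>>>         public void onAccept(Connection connection, WritableMessage message) {
>>>             // message fit into the async write queue, nothing to do
>>>         }
>>>
>>>         @Override
>>>         public void onPushBack(Connection connection, WritableMessage message,
>>>                                PushBackContext pushBackContext) {
>>>             // queue over the limit: re-queue the message once space frees up
>>>             pushBackContext.retryWhenPossible();
>>>         }
>>>     };
>>>
>>> and pass it to the ctx.write(...) overload that takes a PushBackHandler.
>>> Or, if simply raising the per-connection limit is acceptable, I guess
>>> transport.getAsyncQueueIO().getWriter().setMaxPendingBytesPerConnection(...)
>>> would avoid the exception in the first place.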
>>>
>>> Tigran
>>>
>>> On Thu, Aug 23, 2012 at 3:37 PM, Oleksiy Stashok
>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>> Hi Tigran,
>>>>
>>>>> Here is a fragment of the handler code (the complete code can be
>>>>> found at
>>>>> http://code.google.com/p/nio-jrpc/source/browse/oncrpc4j-core/src/main/java/org/dcache/xdr/RpcMessageParserTCP.java):
>>>>>
>>>>>     @Override
>>>>>     public NextAction handleRead(FilterChainContext ctx) throws IOException {
>>>>>         Buffer messageBuffer = ctx.getMessage();
>>>>>         if (messageBuffer == null) {
>>>>>             return ctx.getStopAction();
>>>>>         }
>>>>>
>>>>>         // wait until the full RPC record (all fragments) has arrived
>>>>>         if (!isAllFragmentsArrived(messageBuffer)) {
>>>>>             return ctx.getStopAction(messageBuffer);
>>>>>         }
>>>>>
>>>>>         ctx.setMessage(assembleXdr(messageBuffer));
>>>>>
>>>>>         // keep any bytes of the next message for the next filterchain run
>>>>>         final Buffer reminder = messageBuffer.hasRemaining()
>>>>>                 ? messageBuffer.split(messageBuffer.position()) : null;
>>>>>
>>>>>         return ctx.getInvokeAction(reminder);
>>>>>     }
>>>>>
>>>>>
>>>>> Up to now I was sure that if there is more data to process (reminder
>>>>> != null) or more incoming data available, Grizzly will process it.
>>>> Right, it will, but only after the current filterchain processing has
>>>> finished.
>>>>
>>>>
>>>>
>>>>> The difference between the SameThread and WorkerThread strategies is
>>>>> only in the way of processing.
>>>> The diff. is that SameThreadStrategy will do FilterChain processing in
>>>> the Grizzly core thread (the selector thread for NIO), and
>>>> WorkerThreadStrategy will run FilterChain processing in the Transport's
>>>> worker thread.
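>>>>
>>>> In code the choice is just one setter on the transport (sketch):
>>>>
>>>>     import org.glassfish.grizzly.strategies.SameThreadIOStrategy;
>>>>     import org.glassfish.grizzly.strategies.WorkerThreadIOStrategy;
>>>>
>>>>     // run the FilterChain directly in the selector thread:
>>>>     transport.setIOStrategy(SameThreadIOStrategy.getInstance());
>>>>     // ...or hand each I/O event off to the worker thread pool:
>>>>     transport.setIOStrategy(WorkerThreadIOStrategy.getInstance());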
>>>>
>>>>
>>>>> For example, if for some reason processing takes too long, then I can
>>>>> read the message and drop it without processing, as the client will
>>>>> retry and will ignore a late reply.
>>>> If that's the case, then pls. try the approach I suggested in the prev.
>>>> email.
>>>>
>>>> WBR,
>>>> Alexey.
>>>>
>>>>
>>>>> Anyway, now I know that this was a wrong assumption.
>>>>>
>>>>> Tigran.
>>>>>
>>>>>
>>>>>> I'm not saying this will work faster, but it will really parallelize
>>>>>> request
>>>>>> processing.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> WBR,
>>>>>> Alexey.
>>>>>>
>>>>>>> Regards,
>>>>>>> Tigran.
>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> WBR,
>>>>>>>> Alexey.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/22/2012 02:16 PM, Tigran Mkrtchyan wrote:
>>>>>>>>> Hi Alexey,
>>>>>>>>>
>>>>>>>>> On Wed, Aug 22, 2012 at 11:37 AM, Oleksiy Stashok
>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>> Hi Tigran,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 08/22/2012 08:03 AM, Tigran Mkrtchyan wrote:
>>>>>>>>>>> The result is: 2.1 performs ~15% better than 2.2 and 2.3:
>>>>>>>>>>>
>>>>>>>>>>> grizzly 2.2 http://hammercloud.cern.ch/hc/app/atlas/test/20009667/
>>>>>>>>>>> grizzly 2.3 http://hammercloud.cern.ch/hc/app/atlas/test/20009668/
>>>>>>>>>>> grizzly 2.1 http://hammercloud.cern.ch/hc/app/atlas/test/20009669/
>>>>>>>>>>>
>>>>>>>>>>> (look at the mean on the rightmost graph)
>>>>>>>>>>>
>>>>>>>>>>> The only difference in the code is attached; maybe the problem is
>>>>>>>>>>> there.
>>>>>>>>>> Thank you for the info. I don't see any problem in your code. Just
>>>>>>>>>> FYI, we're deprecating the PushBack mechanism in Grizzly 2.3 (it
>>>>>>>>>> will be removed in Grizzly 3). It would still be possible to check
>>>>>>>>>> the async write queue status (whether it's overloaded or not) and
>>>>>>>>>> to register a listener, which will be notified once you can
>>>>>>>>>> write.... But the important difference is that the async write
>>>>>>>>>> queue will keep accepting data even if it's overloaded (no
>>>>>>>>>> exception thrown). We think this behavior is easier to implement
>>>>>>>>>> on the Grizzly side (it will perform better), and it actually
>>>>>>>>>> offers the same/similar functionality as the push-back mechanism.
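>>>>>>>>>>
>>>>>>>>>> The new style will look roughly like this (a sketch only, the
>>>>>>>>>> names are not final yet):
>>>>>>>>>>
>>>>>>>>>>     if (connection.canWrite()) {
>>>>>>>>>>         ctx.write(reply);
>>>>>>>>>>     } else {
>>>>>>>>>>         // queue is overloaded: park the reply, resume once it drains
>>>>>>>>>>         connection.notifyCanWrite(new WriteHandler() {
>>>>>>>>>>             @Override
>>>>>>>>>>             public void onWritePossible() throws Exception {
>>>>>>>>>>                 ctx.write(reply);
>>>>>>>>>>             }
>>>>>>>>>>
>>>>>>>>>>             @Override
>>>>>>>>>>             public void onError(Throwable t) {
>>>>>>>>>>                 // connection broke while waiting, drop the reply
>>>>>>>>>>             }
>>>>>>>>>>         });
>>>>>>>>>>     }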
>>>>>>>>> I have added a push-back handler as we started to see rejections.
>>>>>>>>> It's good to hear that you will push a default implementation into
>>>>>>>>> the async write queue.
>>>>>>>>>
>>>>>>>>>> As for the tests you ran, I wanted to check:
>>>>>>>>>>
>>>>>>>>>> 1) All three runs use the same I/O strategy (WorkerThreadStrategy)?
>>>>>>>>> Yes, the only difference is the patch which was attached.
>>>>>>>>>
>>>>>>>>>> 2) Are you configuring the Grizzly worker thread pool in your code?
>>>>>>>>> No, we use whatever Grizzly has by default.
>>>>>>>>>
>>>>>>>>>> I'll run a simple echo test and try to reproduce the problem; I'll
>>>>>>>>>> let you know.
>>>>>>>>> Just to remind you: the clients are 16 physical hosts doing NFS IO
>>>>>>>>> to a single server.
>>>>>>>>> Each client may (and does) send 16 requests (just a coincidence with
>>>>>>>>> the number of hosts). In total the server has to process 256
>>>>>>>>> requests at any point and reply with 16x1MB messages.
>>>>>>>>> The server host has 24 cores and 32 GB of RAM.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tigran.
>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> WBR,
>>>>>>>>>> Alexey.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Tigran.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:19 PM, Tigran Mkrtchyan
>>>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>>
>>>>>>>>>>>> We had SameThreadStrategy. Now I switched to
>>>>>>>>>>>> WorkerThreadStrategy.
>>>>>>>>>>>>
>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:12 PM, Oleksiy Stashok
>>>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks a lot for the info! It would be great if you could
>>>>>>>>>>>>> confirm these results next week.
>>>>>>>>>>>>> Just curious: are you using SameThreadStrategy in your tests?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 08/17/2012 03:31 PM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the 2.3-SNAPSHOT is comparable with 2.1.11.
>>>>>>>>>>>>>> 2.2.9 is ~5% slower in my simple test. We can run more
>>>>>>>>>>>>>> production-level tests next week, as they take ~5-6 hours per
>>>>>>>>>>>>>> run and require a special setup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 2:10 PM, Tigran Mkrtchyan
>>>>>>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>>>>>>> NP. give me an hour.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 12:46 PM, Oleksiy Stashok
>>>>>>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>>>>>> Tigran, before you test the releases (2.1.7; 2.2.9), if you
>>>>>>>>>>>>>>>> had planned to, I wanted to ask if you could try Grizzly
>>>>>>>>>>>>>>>> 2.3-SNAPSHOT first?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 08/17/2012 11:51 AM, Oleksiy Stashok wrote:
>>>>>>>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> thank you for the info.
>>>>>>>>>>>>>>>>> We'll investigate that!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We'd appreciate any help that lets us narrow down the
>>>>>>>>>>>>>>>>> problem; for example, if you have some time to try other
>>>>>>>>>>>>>>>>> releases (2.1.7 < release < 2.2.9), it would help a lot.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 08/17/2012 11:36 AM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> After a lot of time spent on debugging, we found that the
>>>>>>>>>>>>>>>>>> change from grizzly-2.1.7 to grizzly-2.2.9 dropped the
>>>>>>>>>>>>>>>>>> performance of our server by 10%.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the profiling results can be found at:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-1.xml
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-2.xml
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We ran the same application against the server (just to
>>>>>>>>>>>>>>>>>> remind you, this is an NFSv4.1 server written in Java).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me know if you need more info.
>>>>>>>>>>>>>>>>>> For now we will roll back to version 2.1.7.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>>>>