dev@grizzly.java.net

Re: grizzly 2.2 performance issue

From: Tigran Mkrtchyan <tigran.mkrtchyan_at_desy.de>
Date: Tue, 11 Sep 2012 15:37:36 +0200

On Tue, Sep 11, 2012 at 3:24 PM, Oleksiy Stashok
<oleksiy.stashok_at_oracle.com> wrote:
> Hi Tigran,
>
>
> On 09/11/2012 03:18 PM, Tigran Mkrtchyan wrote:
>>
>> BTW, are you going to JavaOne? I will be in Sunnyvale the same week
>> (October 1-5) and can show up.
>
> Unfortunately not this year :(
> But as I understand Prague is not far from you, so it might be considered as
> an option :)

Cool! Then you can visit us!

Tigran.
>
>
> WBR,
> Alexey.
>
>
>>
>> Tigran.
>>
>> On Tue, Sep 11, 2012 at 2:59 PM, Oleksiy Stashok
>> <oleksiy.stashok_at_oracle.com> wrote:
>>>
>>> Hi Tigran,
>>>
>>> we've just released Grizzly 2.3-beta3 w/ some perf. fixes related to
>>> WorkerThread strategy.
>>> We actually changed the thread-pool queue implementation from
>>> LinkedTransferQueue to LinkedBlockingQueue (it looks like
>>> LinkedTransferQueue is spinning too much).
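>>>
>>> If you want to try the same change on 2.2 without upgrading, you can
>>> override the worker pool queue yourself. Something along these lines
>>> (an untested sketch, method names from memory, so double-check them
>>> against the javadocs of the version you use):
>>>
>>>     import java.util.concurrent.LinkedBlockingQueue;
>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>>     import org.glassfish.grizzly.threadpool.ThreadPoolConfig;
>>>
>>>     // use a LinkedBlockingQueue instead of the default queue impl
>>>     ThreadPoolConfig workerConfig = ThreadPoolConfig.defaultConfig()
>>>             .setQueue(new LinkedBlockingQueue<Runnable>());
>>>
>>>     TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance()
>>>             .setWorkerThreadPoolConfig(workerConfig)
>>>             .build();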
>>>
>>> I will send you a patch for 2.3-beta3 separately, which has a bit
>>> different impl. of socket.read(...); it would be interesting to see
>>> if it makes any difference for you.
>>>
>>> One more thing you can try (once you use WorkerThread) - use
>>> TCPNIOTransport's optimizeForMultiplexing mode, which you can set via
>>> the TCPNIOTransport instance like:
>>> tcpNioTransport.setOptimizeForMultiplexing(true);
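>>>
>>> In context that would look roughly like this (a sketch, assuming the
>>> standard builder and that your setup code may throw IOException):
>>>
>>>     TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance().build();
>>>     // route writes through the async queue so they can be aggregated
>>>     // into fewer socket writes - useful when many requests share a connection
>>>     transport.setOptimizeForMultiplexing(true);
>>>     transport.start();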
>>>
>>> Pls. let me know if this changes the benchmark results.
>>>
>>> Thanks.
>>>
>>> WBR,
>>> Alexey.
>>>
>>>
>>> On 09/10/2012 03:24 PM, Tigran Mkrtchyan wrote:
>>>>
>>>> Hi Oleksiy,
>>>>
>>>> finally I got some time to test 2.2.19.
>>>>
>>>> Here is what I get:
>>>>
>>>> org.glassfish.grizzly.PendingWriteQueueLimitExceededException: Max queued data limit exceeded: 2999452>47440
>>>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.checkQueueSize(AbstractNIOAsyncQueueWriter.java:619) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.writeQueueRecord(AbstractNIOAsyncQueueWriter.java:279) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.write(AbstractNIOAsyncQueueWriter.java:219) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleWrite(TCPNIOTransportFilter.java:127) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.TransportFilter.handleWrite(TransportFilter.java:191) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:652) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:568) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>>>     at org.dcache.xdr.GrizzlyXdrTransport.send(GrizzlyXdrTransport.java:47) [xdr-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT]
>>>>
>>>>
>>>> Shall I add a push-back handler to retry the write operation?
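>>>>
>>>> Or should I just raise the limit that triggers the exception? If I read
>>>> the API right, something like this sketch (the 4MB value is only a
>>>> guess, sized for our 16x1MB replies):
>>>>
>>>>     // raise the per-connection pending-write limit (the default is
>>>>     // clearly too small for our reply bursts)
>>>>     transport.getAsyncQueueIO().getWriter()
>>>>             .setMaxPendingBytesPerConnection(4 * 1024 * 1024);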
>>>>
>>>> Tigran
>>>>
>>>> On Thu, Aug 23, 2012 at 3:37 PM, Oleksiy Stashok
>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>
>>>>> Hi Tigran,
>>>>>
>>>>>> Here is a fragment of the handler code (the complete code can be
>>>>>> found at
>>>>>> http://code.google.com/p/nio-jrpc/source/browse/oncrpc4j-core/src/main/java/org/dcache/xdr/RpcMessageParserTCP.java):
>>>>>>
>>>>>> @Override
>>>>>> public NextAction handleRead(FilterChainContext ctx) throws IOException {
>>>>>>
>>>>>>     Buffer messageBuffer = ctx.getMessage();
>>>>>>     if (messageBuffer == null) {
>>>>>>         return ctx.getStopAction();
>>>>>>     }
>>>>>>
>>>>>>     if (!isAllFragmentsArrived(messageBuffer)) {
>>>>>>         return ctx.getStopAction(messageBuffer);
>>>>>>     }
>>>>>>
>>>>>>     ctx.setMessage(assembleXdr(messageBuffer));
>>>>>>
>>>>>>     final Buffer reminder = messageBuffer.hasRemaining()
>>>>>>             ? messageBuffer.split(messageBuffer.position()) : null;
>>>>>>
>>>>>>     return ctx.getInvokeAction(reminder);
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Up to now I was sure that if there is more data to process (reminder
>>>>>> != null) or more incoming data available, Grizzly will process it.
>>>>>
>>>>> Right, it will, but only after current filterchain processing is
>>>>> finished.
>>>>>
>>>>>
>>>>>
>>>>>> The difference between SameThread and WorkerThread strategies is only
>>>>>> in the way of processing.
>>>>>
>>>>> The diff. is that SameThreadStrategy will do FilterChain processing in
>>>>> the Grizzly core thread (selector thread for NIO), and
>>>>> WorkerThreadStrategy will run FilterChain processing in Transport's
>>>>> worker thread.
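>>>>>
>>>>> The strategy is selected on the transport, e.g. (a sketch, given your
>>>>> TCPNIOTransport instance "transport"):
>>>>>
>>>>>     // run the filter chain directly in the selector thread...
>>>>>     transport.setIOStrategy(SameThreadIOStrategy.getInstance());
>>>>>     // ...or hand each event off to the worker thread pool
>>>>>     transport.setIOStrategy(WorkerThreadIOStrategy.getInstance());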
>>>>>
>>>>>
>>>>>> For example, if for some reason processing takes
>>>>>> too long, then I can read the message and drop it without processing,
>>>>>> as the client will retry and will ignore a late reply.
>>>>>
>>>>> If that's the case - then pls. try the approach I suggested in the
>>>>> prev. email.
>>>>>
>>>>> WBR,
>>>>> Alexey.
>>>>>
>>>>>
>>>>>> Anyway, now I know that this was a wrong assumption.
>>>>>>
>>>>>> Tigran.
>>>>>>
>>>>>>
>>>>>>> I'm not saying this will work faster, but it will really parallelize
>>>>>>> request
>>>>>>> processing.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> WBR,
>>>>>>> Alexey.
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tigran.
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> WBR,
>>>>>>>>> Alexey.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/22/2012 02:16 PM, Tigran Mkrtchyan wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Alexey,
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 22, 2012 at 11:37 AM, Oleksiy Stashok
>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 08/22/2012 08:03 AM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> The result: 2.1 performs ~15% better than 2.2 and 2.3:
>>>>>>>>>>>>
>>>>>>>>>>>> grizzly 2.2
>>>>>>>>>>>> http://hammercloud.cern.ch/hc/app/atlas/test/20009667/
>>>>>>>>>>>> grizzly 2.3
>>>>>>>>>>>> http://hammercloud.cern.ch/hc/app/atlas/test/20009668/
>>>>>>>>>>>> grizzly 2.1
>>>>>>>>>>>> http://hammercloud.cern.ch/hc/app/atlas/test/20009669/
>>>>>>>>>>>>
>>>>>>>>>>>> (look at the mean of the rightmost graph)
>>>>>>>>>>>>
>>>>>>>>>>>> The only difference in the code is attached. Maybe the problem
>>>>>>>>>>>> is there.
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the info. I don't see any problem in your code.
>>>>>>>>>>> Just FYI, we're deprecating the PushBack mechanism in Grizzly 2.3
>>>>>>>>>>> (it will be removed in Grizzly 3). It will still be possible to
>>>>>>>>>>> check the async write queue status (whether it's overloaded or
>>>>>>>>>>> not) and register a listener, which will be notified once you can
>>>>>>>>>>> write... But the important difference is that the async write
>>>>>>>>>>> queue will keep accepting data even if it's overloaded (no
>>>>>>>>>>> exception thrown). We think this behavior is easier to implement
>>>>>>>>>>> on the Grizzly side (it will perform better), and it actually
>>>>>>>>>>> offers the same/similar functionality as the push-back mechanism.
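>>>>>>>>>>>
>>>>>>>>>>> In 2.3 terms the check would look roughly like this (a sketch of
>>>>>>>>>>> the new API, so the details may still change; "connection" is
>>>>>>>>>>> your Grizzly Connection):
>>>>>>>>>>>
>>>>>>>>>>>     if (!connection.canWrite()) {
>>>>>>>>>>>         // queue is overloaded - get notified when it drains
>>>>>>>>>>>         connection.notifyCanWrite(new WriteHandler() {
>>>>>>>>>>>             @Override
>>>>>>>>>>>             public void onWritePossible() throws Exception {
>>>>>>>>>>>                 // safe to resume writing here
>>>>>>>>>>>             }
>>>>>>>>>>>
>>>>>>>>>>>             @Override
>>>>>>>>>>>             public void onError(Throwable t) {
>>>>>>>>>>>                 // connection failed; give up on pending writes
>>>>>>>>>>>             }
>>>>>>>>>>>         });
>>>>>>>>>>>     }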
>>>>>>>>>>
>>>>>>>>>> I have added a push-back handler as we started to see rejections.
>>>>>>>>>> It's good to hear that you will push a default implementation into
>>>>>>>>>> the async write queue.
>>>>>>>>>>
>>>>>>>>>>> As for the results you ran, wanted to check
>>>>>>>>>>>
>>>>>>>>>>> 1) All three runs use the same I/O strategy
>>>>>>>>>>> (WorkerThreadStrategy)?
>>>>>>>>>>
>>>>>>>>>> Yes, the only difference is the patch which was attached.
>>>>>>>>>>
>>>>>>>>>>> 2) Are you configuring Grizzly worker thread pool in your code?
>>>>>>>>>>
>>>>>>>>>> No, we use whatever Grizzly has by default.
>>>>>>>>>>
>>>>>>>>>>> I'll run a simple echo test and will try to reproduce the
>>>>>>>>>>> problem, will let you know.
>>>>>>>>>>
>>>>>>>>>> Just to remind you: the clients are 16 physical hosts doing NFS IO
>>>>>>>>>> to a single server. Each client may (and does) send 16 requests
>>>>>>>>>> (just a coincidence with the number of hosts). In total the server
>>>>>>>>>> has to process 256 requests at any point and reply with 16x1MB
>>>>>>>>>> messages. The server host has 24 cores and 32 GB of RAM.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Tigran.
>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> WBR,
>>>>>>>>>>> Alexey.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:19 PM, Tigran Mkrtchyan
>>>>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>>>
>>>>>>>>>>>>> We had SameThreadStrategy. Now I switched to
>>>>>>>>>>>>> WorkerThreadStrategy.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:12 PM, Oleksiy Stashok
>>>>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks a lot for the info! Would be great if you can confirm
>>>>>>>>>>>>>> these
>>>>>>>>>>>>>> results
>>>>>>>>>>>>>> next week.
>>>>>>>>>>>>>> Just interesting, are you using SameThreadStrategy in your
>>>>>>>>>>>>>> tests?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 08/17/2012 03:31 PM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> the 2.3-SNAPSHOT is comparable with 2.1.11.
>>>>>>>>>>>>>>> The 2.2.9 is ~5% slower in my simple test. We can run more
>>>>>>>>>>>>>>> production-level tests next week, as they take ~5-6 hours per
>>>>>>>>>>>>>>> run and require special setup.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 2:10 PM, Tigran Mkrtchyan
>>>>>>>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> NP. give me an hour.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 12:46 PM, Oleksiy Stashok
>>>>>>>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Tigran, before you test the releases (2.1.7; 2.2.9), if you
>>>>>>>>>>>>>>>>> planned to, I wanted to ask if you can try Grizzly
>>>>>>>>>>>>>>>>> 2.3-SNAPSHOT first?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 08/17/2012 11:51 AM, Oleksiy Stashok wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> thank you for the info.
>>>>>>>>>>>>>>>>>> We'll investigate that!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We'd appreciate any help that lets us narrow down the
>>>>>>>>>>>>>>>>>> problem; for example, if you have some time to try other
>>>>>>>>>>>>>>>>>> releases (2.1.7 < release < 2.2.9), it would help a lot.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 08/17/2012 11:36 AM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> After a lot of time spent on debugging, we found that the
>>>>>>>>>>>>>>>>>>> change from grizzly-2.1.7 to grizzly-2.2.9 drops the
>>>>>>>>>>>>>>>>>>> performance of our server by 10%.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> the profiling results can be found at:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-1.xml
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-2.xml
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> we ran the same application against the server (just to
>>>>>>>>>>>>>>>>>>> remind you, this is an NFSv4.1 server written in Java).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me know if you need more info.
>>>>>>>>>>>>>>>>>>> For now we will rollback to 2.1.7 version.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>