dev@grizzly.java.net

Re: grizzly 2.2 performance issue

From: Tigran Mkrtchyan <tigran.mkrtchyan_at_desy.de>
Date: Tue, 11 Sep 2012 15:18:51 +0200

Hi Oleksiy,

thanks, I will try the new code and the option.

BTW, are you going to JavaOne? I will be in Sunnyvale the same week
(October 1-5) and could stop by.

Tigran.

On Tue, Sep 11, 2012 at 2:59 PM, Oleksiy Stashok
<oleksiy.stashok_at_oracle.com> wrote:
> Hi Tigran,
>
> we've just released Grizzly 2.3-beta3 w/ some perf. fixes related to the
> WorkerThread strategy.
> We actually changed the thread-pool queue implementation from
> LinkedTransferQueue to LinkedBlockingQueue (it looks like
> LinkedTransferQueue is spinning too much).
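>
> Purely for illustration (a plain-JDK sketch of the queue swap, not
> Grizzly's actual worker-pool internals; the pool sizes are arbitrary):
>
>     import java.util.concurrent.LinkedBlockingQueue;
>     import java.util.concurrent.ThreadPoolExecutor;
>     import java.util.concurrent.TimeUnit;
>
>     // A fixed-size pool fed by a LinkedBlockingQueue - the queue type
>     // Grizzly 2.3-beta3 switched to. Idle threads block on the queue
>     // instead of spinning the way LinkedTransferQueue tends to.
>     ThreadPoolExecutor workerPool = new ThreadPoolExecutor(
>             24, 24, 60L, TimeUnit.SECONDS,
>             new LinkedBlockingQueue<Runnable>());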
>
> I will send you a patch for 2.3-beta3 separately, which has a slightly
> different impl. of socket.read(...); it would be interesting to see
> whether it makes any difference for you.
>
> One more thing you can try (since you use WorkerThread) - use
> TCPNIOTransport's optimizeForMultiplexing mode, which you can set via the
> tcpNioTransport instance like:
> tcpNioTransport.setOptimizeForMultiplexing(true);
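>
> A minimal sketch of where that flag fits, assuming a transport built via
> the standard TCPNIOTransportBuilder (the port and the filter list below
> are placeholders):
>
>     final FilterChain chain = FilterChainBuilder.stateless()
>             .add(new TransportFilter())
>             // protocol filters (e.g. the RPC message parser) go here
>             .build();
>
>     final TCPNIOTransport transport =
>             TCPNIOTransportBuilder.newInstance().build();
>     transport.setIOStrategy(WorkerThreadIOStrategy.getInstance());
>     transport.setOptimizeForMultiplexing(true); // the mode suggested above
>     transport.setProcessor(chain);
>     transport.bind(2049);
>     transport.start();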
>
> Pls. let me know if this changes the benchmark results.
>
> Thanks.
>
> WBR,
> Alexey.
>
>
> On 09/10/2012 03:24 PM, Tigran Mkrtchyan wrote:
>>
>> Hi Oleksiy,
>>
>> finally I got some time to test 2.2.19.
>>
>> Here is what I get:
>>
>> org.glassfish.grizzly.PendingWriteQueueLimitExceededException: Max queued data limit exceeded: 2999452>47440
>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.checkQueueSize(AbstractNIOAsyncQueueWriter.java:619) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.writeQueueRecord(AbstractNIOAsyncQueueWriter.java:279) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.nio.AbstractNIOAsyncQueueWriter.write(AbstractNIOAsyncQueueWriter.java:219) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransportFilter.handleWrite(TCPNIOTransportFilter.java:127) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.TransportFilter.handleWrite(TransportFilter.java:191) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:265) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:200) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:134) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:78) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:652) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:568) [grizzly-framework-2.2.19-SNAPSHOT.jar:2.2.19-SNAPSHOT]
>>     at org.dcache.xdr.GrizzlyXdrTransport.send(GrizzlyXdrTransport.java:47) [xdr-2.4.0-SNAPSHOT.jar:2.4.0-SNAPSHOT]
>>
>>
>> Shall I add a push-back handler to retry the write operation?
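>>
>> For reference, a minimal sketch of one workaround - raising the
>> per-connection async-write limit - assuming the standard AsyncQueueWriter
>> API (the transport variable and the 4 MB value are placeholders):
>>
>>     // Allow more pending bytes per connection before the
>>     // PendingWriteQueueLimitExceededException above gets thrown.
>>     transport.getAsyncQueueIO().getWriter()
>>             .setMaxPendingBytesPerConnection(4 * 1024 * 1024);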
>>
>> Tigran
>>
>> On Thu, Aug 23, 2012 at 3:37 PM, Oleksiy Stashok
>> <oleksiy.stashok_at_oracle.com> wrote:
>>>
>>> Hi Tigran,
>>>
>>>> Here is a fragment of the handler code (the complete code can be
>>>> found at
>>>> http://code.google.com/p/nio-jrpc/source/browse/oncrpc4j-core/src/main/java/org/dcache/xdr/RpcMessageParserTCP.java):
>>>>
>>>> @Override
>>>> public NextAction handleRead(FilterChainContext ctx) throws IOException {
>>>>
>>>>     Buffer messageBuffer = ctx.getMessage();
>>>>     if (messageBuffer == null) {
>>>>         return ctx.getStopAction();
>>>>     }
>>>>
>>>>     // not all RPC fragments are here yet: stop and keep the partial buffer
>>>>     if (!isAllFragmentsArrived(messageBuffer)) {
>>>>         return ctx.getStopAction(messageBuffer);
>>>>     }
>>>>
>>>>     ctx.setMessage(assembleXdr(messageBuffer));
>>>>
>>>>     // hand any bytes beyond the current message back for the next pass
>>>>     final Buffer reminder = messageBuffer.hasRemaining()
>>>>             ? messageBuffer.split(messageBuffer.position())
>>>>             : null;
>>>>
>>>>     return ctx.getInvokeAction(reminder);
>>>> }
>>>>
>>>>
>>>> Up to now I was sure that if there is more data to process (reminder
>>>> != null) or more incoming data available, Grizzly will process it.
>>>
>>> Right, it will, but only after the current FilterChain processing has
>>> finished.
>>>
>>>
>>>
>>>> The difference between SameThread and WorkerThread strategies is only
>>>> in the way of processing.
>>>
>>> The diff. is that SameThreadStrategy will do FilterChain processing in
>>> the Grizzly core thread (the selector thread for NIO), while
>>> WorkerThreadStrategy will run FilterChain processing in the Transport's
>>> worker thread.
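>>>
>>> A two-line sketch of how the strategies are selected, assuming a
>>> TCPNIOTransport instance named transport:
>>>
>>>     transport.setIOStrategy(SameThreadIOStrategy.getInstance());   // selector thread
>>>     transport.setIOStrategy(WorkerThreadIOStrategy.getInstance()); // worker pool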
>>>
>>>
>>>> For example, if for some reason processing takes too long, then I can
>>>> read the message and drop it without processing, as the client will
>>>> retry and will ignore a late reply.
>>>
>>> If that's the case - then pls. try the approach I suggested in the
>>> prev. email.
>>>
>>> WBR,
>>> Alexey.
>>>
>>>
>>>> Anyway, now I know that this was a wrong assumption.
>>>>
>>>> Tigran.
>>>>
>>>>
>>>>> I'm not saying this will work faster, but it will really parallelize
>>>>> request
>>>>> processing.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> WBR,
>>>>> Alexey.
>>>>>
>>>>>> Regards,
>>>>>> Tigran.
>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> WBR,
>>>>>>> Alexey.
>>>>>>>
>>>>>>>
>>>>>>> On 08/22/2012 02:16 PM, Tigran Mkrtchyan wrote:
>>>>>>>>
>>>>>>>> Hi Alexey,
>>>>>>>>
>>>>>>>> On Wed, Aug 22, 2012 at 11:37 AM, Oleksiy Stashok
>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Tigran,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/22/2012 08:03 AM, Tigran Mkrtchyan wrote:
>>>>>>>>>>
>>>>>>>>>> The result: 2.1 performs ~15% better than 2.2 and 2.3:
>>>>>>>>>>
>>>>>>>>>> grizzly 2.2 http://hammercloud.cern.ch/hc/app/atlas/test/20009667/
>>>>>>>>>> grizzly 2.3 http://hammercloud.cern.ch/hc/app/atlas/test/20009668/
>>>>>>>>>> grizzly 2.1 http://hammercloud.cern.ch/hc/app/atlas/test/20009669/
>>>>>>>>>>
>>>>>>>>>> (look at the mean of the rightmost graph)
>>>>>>>>>>
>>>>>>>>>> The only difference in the code is attached. Maybe the problem is
>>>>>>>>>> there.
>>>>>>>>>
>>>>>>>>> Thank you for the info. I don't see any problem in your code. Just
>>>>>>>>> FYI, we're deprecating the PushBack mechanism in Grizzly 2.3 (it
>>>>>>>>> will be removed in Grizzly 3). It will still be possible to check
>>>>>>>>> the async write queue status (whether it's overloaded or not) and
>>>>>>>>> to register a listener, which will be notified once you can
>>>>>>>>> write.... But the important difference is that the async write
>>>>>>>>> queue will keep accepting data even if it's overloaded (no
>>>>>>>>> exception thrown). We think this behavior is easier to implement on
>>>>>>>>> the Grizzly side (it will perform better), and it actually offers
>>>>>>>>> the same/similar functionality as the push-back mechanism.
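>>>>>>>>>
>>>>>>>>> A sketch of that listener mechanism, assuming the Grizzly 2.3
>>>>>>>>> Connection.canWrite()/notifyCanWrite(WriteHandler) API (connection
>>>>>>>>> and message are placeholders):
>>>>>>>>>
>>>>>>>>>     if (connection.canWrite()) {
>>>>>>>>>         connection.write(message);
>>>>>>>>>     } else {
>>>>>>>>>         // the queue is overloaded: defer until notified
>>>>>>>>>         connection.notifyCanWrite(new WriteHandler() {
>>>>>>>>>             @Override
>>>>>>>>>             public void onWritePossible() throws Exception {
>>>>>>>>>                 connection.write(message);
>>>>>>>>>             }
>>>>>>>>>
>>>>>>>>>             @Override
>>>>>>>>>             public void onError(Throwable t) {
>>>>>>>>>                 // connection broke: drop the reply
>>>>>>>>>             }
>>>>>>>>>         });
>>>>>>>>>     }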
>>>>>>>>
>>>>>>>> I have added a push-back handler as we started to see rejections.
>>>>>>>> It's good to hear that you will push a default implementation into
>>>>>>>> the async write queue.
>>>>>>>>
>>>>>>>>> As for the results you got, I wanted to check:
>>>>>>>>>
>>>>>>>>> 1) Do all three runs use the same I/O strategy
>>>>>>>>> (WorkerThreadStrategy)?
>>>>>>>>
>>>>>>>> Yes, the only difference is the patch which was attached.
>>>>>>>>
>>>>>>>>> 2) Are you configuring Grizzly worker thread pool in your code?
>>>>>>>>
>>>>>>>> No, we use whatever Grizzly has by default.
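>>>>>>>>
>>>>>>>> For comparison, a sketch of what explicit configuration would look
>>>>>>>> like, assuming the standard ThreadPoolConfig API (the pool sizes are
>>>>>>>> placeholders):
>>>>>>>>
>>>>>>>>     transport.setWorkerThreadPoolConfig(
>>>>>>>>             ThreadPoolConfig.defaultConfig()
>>>>>>>>                     .setCorePoolSize(24)   // e.g. match the 24 cores
>>>>>>>>                     .setMaxPoolSize(24));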
>>>>>>>>
>>>>>>>>> I'll run a simple echo test and will try to reproduce the problem;
>>>>>>>>> will let you know.
>>>>>>>>
>>>>>>>> Just to remind you: the clients are 16 physical hosts doing NFS IO
>>>>>>>> to a single server.
>>>>>>>> Each client may (and does) send 16 requests (just a coincidence with
>>>>>>>> the number of hosts). In total the server has to process 256
>>>>>>>> requests at any point and reply with 16x1MB messages.
>>>>>>>> The server host has 24 cores and 32 GB of RAM.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tigran.
>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> WBR,
>>>>>>>>> Alexey.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Tigran.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 17, 2012 at 4:19 PM, Tigran Mkrtchyan
>>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>
>>>>>>>>>>> We had SameThreadStrategy. Now I switched to
>>>>>>>>>>> WorkerThreadStrategy.
>>>>>>>>>>>
>>>>>>>>>>> Tigran.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 17, 2012 at 4:12 PM, Oleksiy Stashok
>>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>>
>>>>>>>>>>>> thanks a lot for the info! It would be great if you could
>>>>>>>>>>>> confirm these results next week.
>>>>>>>>>>>> Just interesting, are you using SameThreadStrategy in your
>>>>>>>>>>>> tests?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> WBR,
>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 08/17/2012 03:31 PM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Alexey,
>>>>>>>>>>>>>
>>>>>>>>>>>>> the 2.3-SNAPSHOT is comparable with 2.1.11.
>>>>>>>>>>>>> The 2.2.9 is ~5% slower in my simple test. We can run more
>>>>>>>>>>>>> production-level tests next week, as they take ~5-6 hours per
>>>>>>>>>>>>> run and require a special setup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 2:10 PM, Tigran Mkrtchyan
>>>>>>>>>>>>> <tigran.mkrtchyan_at_desy.de> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> NP. give me an hour.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Aug 17, 2012 at 12:46 PM, Oleksiy Stashok
>>>>>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tigran, before you test the releases (2.1.7; 2.2.9), if you
>>>>>>>>>>>>>>> planned to, I wanted to ask if you could try Grizzly
>>>>>>>>>>>>>>> 2.3-SNAPSHOT first?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 08/17/2012 11:51 AM, Oleksiy Stashok wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Tigran,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thank you for the info.
>>>>>>>>>>>>>>>> We'll investigate that!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We'd appreciate any help that lets us narrow down the
>>>>>>>>>>>>>>>> problem; for example, if you have some time to try other
>>>>>>>>>>>>>>>> releases (2.1.7 < release < 2.2.9), it would help a lot.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> WBR,
>>>>>>>>>>>>>>>> Alexey.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 08/17/2012 11:36 AM, Tigran Mkrtchyan wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> After a lot of time spent on debugging, we found that the
>>>>>>>>>>>>>>>>> change from grizzly-2.1.7 to grizzly-2.2.9 drops the
>>>>>>>>>>>>>>>>> performance of our server by 10%.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> the profiling results can be found at:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-1.xml
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> http://www.dcache.org/grizzly-2-2.xml
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> we ran the same application against the server (just a
>>>>>>>>>>>>>>>>> reminder: this is an NFSv4.1 server written in Java).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know if you need more info.
>>>>>>>>>>>>>>>>> For now we will roll back to version 2.1.7.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Tigran.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>