On Thu, Jan 19, 2012 at 3:11 PM, Oleksiy Stashok <oleksiy.stashok_at_oracle.com> wrote:
> On 01/19/2012 11:40 AM, Igor Skornyakov wrote:
>
> Well, actually it is a stress test for a pretty big commercial
> application (the server uses Jersey 1.10 with an embedded Grizzly servlet
> container). I'll try to reproduce it independently, but it may take some
> time.
>
> About the server's behavior: currently I run all the components (client,
> server, database, and Terracotta) on a single 4-core machine, so it doesn't
> surprise me that the server doesn't wait for a slow client to read the
> response before closing the connection.
>
> The following example illustrates the timing:
> 2012-01-19 12:37:26,930 - the client sends a POST request
> 2012-01-19 12:37:37,634 - the client gets an IOException
> 2012-01-19 12:37:37,634 - the client resends the request
> 2012-01-19 12:37:41,601 - the server (more precisely, the logging filter on
> the server side) receives the first request
> 2012-01-19 12:37:41,772 - the server sends the response to the first
> request (200)
> 2012-01-19 12:38:14,217 - the server receives the second request
> 2012-01-19 12:38:14,339 - the server rejects the second request (409)
>
> According to the sample above, it looks like the bottleneck is the server's
> thread pool (at least that's my guess). It looks like the client request sits
> in the server's thread-pool queue for 15 seconds.
> Is it possible that the server thread pool (all the available threads) is
> getting blocked by some requests? For example during app<->DB
> communication, so newly arriving tasks are getting queued... and get
> processed with a 15-second delay?
>
>
That's a good idea - thanks a lot! I will try to increase the size of the
listener's thread pool and see how it affects the behaviour. On the other
hand, the average waiting time for a DB connection is ~4ms (as BoneCP reports
it), so the database doesn't look like a bottleneck. According to my
internal metrics, >80% of REST requests are processed in less than 1ms (of
course the time is measured between invocations of the corresponding
request/response filters) and only <5% take more than 100ms. Unfortunately
the current version of the test does not measure latency/throughput
from the client side - I plan to add that in the near future.
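
Roughly, what I have in mind is something like the following (just a sketch,
assuming the Grizzly 2.x embedded HttpServer API; the listener name, port and
pool sizes are placeholder values, and our Jersey handler registration is
elided):

    import java.io.IOException;

    import org.glassfish.grizzly.http.server.HttpServer;
    import org.glassfish.grizzly.http.server.NetworkListener;
    import org.glassfish.grizzly.threadpool.ThreadPoolConfig;

    public class ListenerPoolSizing {

        public static HttpServer createServer() throws IOException {
            HttpServer server = new HttpServer();

            // "grizzly", port 8080 and pool size 128 are illustrative values only.
            NetworkListener listener = new NetworkListener("grizzly", "0.0.0.0", 8080);

            // Enlarge the worker pool so queued requests are not stuck behind
            // the few tasks that block on app<->DB communication.
            ThreadPoolConfig workerPool = ThreadPoolConfig.defaultConfig()
                    .setCorePoolSize(128)
                    .setMaxPoolSize(128);
            listener.getTransport().setWorkerThreadPoolConfig(workerPool);

            server.addListener(listener);
            // ... register the Jersey servlet adapter/handler here, as in the existing setup ...
            server.start();
            return server;
        }
    }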
>
> In fact it was my objective to produce a really heavy load (the previous
> version of the test, which used a synchronous client, experienced network
> problems very rarely - once a week or so).
>
> Hmm, were you sending the same number of simultaneous requests as you do
> using the async HTTP client?
>
Well, as I wrote before, I do not have exact numbers at the moment. My
impression is that the new client sends more requests over a substantial period
(a few minutes) than its synchronous version, but not by an order of
magnitude (maybe 2-3 times more, at least with my current deployment).
On the other hand, the load is much more bursty: the test
simulates the activity of ~1200 users who are busy
creating/starting/stopping/deleting their VMs in the cloud, and with the
asynchronous API this activity can be made much more intensive (with the sync
version the simulated delays between steps could not really be enforced, as
all worker threads spent a lot of time waiting for responses).
Thanks again,
Igor.
>
> Thanks.
>
> WBR,
> Alexey.
>
>
>
> On the other hand, I do agree with you - I have to think more about the
> server side of the drama.
> Thank you,
> Igor.
>
>
> On Thu, Jan 19, 2012 at 2:14 PM, Oleksiy Stashok <oleksiy.stashok_at_oracle.com> wrote:
>
>> Hi Igor,
>>
>>
>>
>> I've tried 1.12-SNAPSHOT again and encountered a lot of IOExceptions with
>> the "An existing connection was forcibly closed by the remote host"
>> message. I do have such exceptions with version 1.11 as well, but far fewer
>> of them. It is not a showstopper in my case - just irritating. It also seems
>> that the performance is not as good as with 1.11 (not sure about this - just
>> an impression).
>>
>> Hmm, interesting - in 1.12-SNAPSHOT Jersey started to use the Grizzly Async
>> HTTP provider. Though I'd not compare performance at this point; let's fix
>> the problem first :)
>>
>>
>> BTW: I've discovered that AHC-1.65 indeed performs request retries - see
>> the NettyAsyncHttpProvider.remotelyClosed() method. It is of course possible
>> to use the "com.ning.http.client.providers.netty.maxRetry" Java system property
>> or AsyncHttpClientConfigBean to suppress this behavior, but this will
>> affect all HTTP methods (including idempotent ones where automatic retry
>> can be acceptable or even desirable). Maybe it makes sense just to change
>> the NettyResponseFuture.canRetry() method and suppress retrying for
>> non-idempotent requests regardless of the value of "maxRetry"?
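
For completeness, the global suppression I mentioned boils down to something
like this (just a sketch; the property name is the one quoted above, everything
else assumes the stock AHC/Netty provider and would be done before the Jersey
client is built):

    import com.ning.http.client.AsyncHttpClient;

    public class NoRetryClient {
        public static void main(String[] args) {
            // Set early, before the Netty provider is initialized:
            // 0 retries suppresses the automatic re-send for all HTTP methods,
            // idempotent or not.
            System.setProperty("com.ning.http.client.providers.netty.maxRetry", "0");

            AsyncHttpClient client = new AsyncHttpClient(); // picks up the property
            // ... hand the client to the test / Jersey as usual ...
            client.close();
        }
    }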
>>
>> I'd rather check why the connection is getting prematurely closed on the
>> server side.
>> Can you please send some test case?
>>
>> Thanks.
>>
>> WBR,
>> Alexey.
>>
>>
>>
>>
>> On Wed, Jan 18, 2012 at 5:09 PM, Igor Skornyakov <igor.skornyakov_at_gmail.com> wrote:
>>
>>> Thank you, Alexey,
>>> I will try again. Please note, however, that the same error message
>>> appears with JDK 1.7.0_02 as well.
>>> Regards,
>>> Igor.
>>>
>>>
>>>
>>> On Wed, Jan 18, 2012 at 5:04 PM, Oleksiy Stashok <oleksiy.stashok_at_oracle.com> wrote:
>>>
>>>> Hi Igor,
>>>>
>>>> don't be scared - it's probably related to a JDK issue [1].
>>>> We should probably change the log level to FINE(R) so it doesn't scare people :)
>>>>
>>>> Otherwise this warning shouldn't impact functionality.
>>>>
>>>> WBR,
>>>> Alexey.
>>>>
>>>> [1] http://bugs.sun.com/view_bug.do?bug_id=6799574
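
In the meantime the warning can be hidden on the client side with plain
java.util.logging - a sketch only, assuming Grizzly logs through JUL with the
class name as the logger name (as the log excerpt below suggests):

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class QuietKeepAliveWarning {
        // Keep a strong reference so the level isn't lost when the logger is GC'd.
        private static final Logger TCP_TRANSPORT_LOGGER =
                Logger.getLogger("org.glassfish.grizzly.nio.transport.TCPNIOTransport");

        public static void quiet() {
            // Hide the WARNING-level "Can not set keepAlive to false" records.
            TCP_TRANSPORT_LOGGER.setLevel(Level.SEVERE);
        }
    }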
>>>>
>>>>
>>>> On 01/18/2012 01:48 PM, Igor Skornyakov wrote:
>>>>
>>>> Thank you Pavel,
>>>> I've already tried 1.12-SNAPSHOT, but I was scared by a lot of messages
>>>> in the log like the following one:
>>>>
>>>> 18.01.2012 16:45:58 org.glassfish.grizzly.nio.transport.TCPNIOTransport configureChannel
>>>> WARNING: Can not set keepAlive to false
>>>> java.net.SocketException: Invalid argument: sun.nio.ch.Net.setIntOption
>>>>     at sun.nio.ch.Net.setIntOption0(Native Method)
>>>>     at sun.nio.ch.Net.setIntOption(Net.java:157)
>>>>     at sun.nio.ch.SocketChannelImpl$1.setInt(SocketChannelImpl.java:399)
>>>>     at sun.nio.ch.SocketOptsImpl.setBoolean(SocketOptsImpl.java:38)
>>>>     at sun.nio.ch.SocketOptsImpl.keepAlive(SocketOptsImpl.java:92)
>>>>     at sun.nio.ch.SocketOptsImpl$IP$TCP.keepAlive(SocketOptsImpl.java:268)
>>>>     at sun.nio.ch.OptionAdaptor.setKeepAlive(OptionAdaptor.java:139)
>>>>     at sun.nio.ch.SocketAdaptor.setKeepAlive(SocketAdaptor.java:322)
>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransport.configureChannel(TCPNIOTransport.java:640)
>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler.onConnectedAsync(TCPNIOConnectorHandler.java:195)
>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler$1.completed(TCPNIOConnectorHandler.java:140)
>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOConnectorHandler$1.completed(TCPNIOConnectorHandler.java:136)
>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOConnection.onConnect(TCPNIOConnection.java:200)
>>>>     at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:805)
>>>>     at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:111)
>>>>     at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:115)
>>>>     at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.executeIoEvent(WorkerThreadIOStrategy.java:101)
>>>>     at org.glassfish.grizzly.strategies.AbstractIOStrategy.executeIoEvent(AbstractIOStrategy.java:87)
>>>>     at org.glassfish.grizzly.nio.SelectorRunner.iterateKeyEvents(SelectorRunner.java:401)
>>>>     at org.glassfish.grizzly.nio.SelectorRunner.iterateKeys(SelectorRunner.java:371)
>>>>     at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:337)
>>>>     at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:267)
>>>>     at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:566)
>>>>     at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:546)
>>>>     at java.lang.Thread.run(Thread.java:662)
>>>> (this was with JDK 1.6.0_29).
>>>> Regards,
>>>> Igor.
>>>>
>>>>
>>>> On Wed, Jan 18, 2012 at 4:37 PM, Pavel Bucek <pavel.bucek_at_oracle.com> wrote:
>>>>
>>>>> Hi Igor,
>>>>>
>>>>> I'm not aware of any retrying logic there. We recently changed the
>>>>> underlying implementation, which might fix this issue - can you please try
>>>>> a recent build (version 1.12-SNAPSHOT)?
>>>>>
>>>>> Regards,
>>>>> Pavel
>>>>>
>>>>>
>>>>> On 1/18/12 12:56 PM, Igor Skornyakov wrote:
>>>>>
>>>>>> Hello,
>>>>>> I'm migrating a stress test for my REST application to the Jersey
>>>>>> non-blocking client, version 1.11. Everything works fine, with one problem -
>>>>>> some requests are sent twice, and I'm pretty sure that my code does not do
>>>>>> that (I have some retry logic, but in such cases there is a corresponding
>>>>>> message in the log). If it is a feature of the client API, how can I disable
>>>>>> it (at least for POSTs)? I see both requests in the server log - the first is
>>>>>> accepted and a valid response is returned, the second one is
>>>>>> rejected (which is correct in my case). However, at the client side I
>>>>>> receive only the second (error) response.
>>>>>>
>>>>>> Thank you,
>>>>>> Igor.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>