What I forgot to add is that I see the same issue with timeouts
between jmeter and the proxy even when "jmeter threads < selectors",
which kind of invalidates all of my ideas about the selectors all
potentially being busy.
Wow, even with 1 thread it's occurring... must be something stupid... I
don't think it's related to persistent connections; maxKeepAlive on the
target service is 100, which wouldn't explain roughly 1 in 2000
client-side requests timing out, especially given that no errors are
being logged.
On Thu, Apr 16, 2015 at 7:36 PM, Daniel Feist <dfeist_at_gmail.com> wrote:
> Hi,
>
> Nothing different really, just that the blocking version returns the
> response when the stack returns after waiting on the outbound future
> returned by AHC, while the non-blocking version returns the response
> when the completion handler passed to AHC is invoked. Ah, also, the
> blocking version uses WorkerThreadIOStrategy while the non-blocking
> version uses SameThreadIOStrategy for inbound.
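>
> To make sure we're talking about the same thing, here is a minimal
> sketch of the two call styles I mean (assuming the com.ning
> AsyncHttpClient API; the URL and handler body are illustrative, not my
> actual code):
>
> import com.ning.http.client.AsyncCompletionHandler;
> import com.ning.http.client.AsyncHttpClient;
> import com.ning.http.client.Response;
> import java.util.concurrent.Future;
>
> public class BlockingVsNonBlocking {
>     public static void main(String[] args) throws Exception {
>         AsyncHttpClient client = new AsyncHttpClient();
>
>         // Blocking style: the worker thread waits on the future returned
>         // by AHC, and the inbound response is written when the stack
>         // unwinds. (URL is a hypothetical placeholder.)
>         Future<Response> future =
>                 client.prepareGet("http://target:8080/service").execute();
>         Response response = future.get();
>         System.out.println("blocking: " + response.getStatusCode());
>
>         // Non-blocking style: the inbound response is written from the
>         // completion handler, which AHC invokes on one of its threads.
>         client.prepareGet("http://target:8080/service").execute(
>                 new AsyncCompletionHandler<Response>() {
>                     @Override
>                     public Response onCompleted(Response r) {
>                         // headers/status would be copied to the inbound
>                         // Grizzly response here
>                         System.out.println("non-blocking: " + r.getStatusCode());
>                         return r;
>                     }
>
>                     @Override
>                     public void onThrowable(Throwable t) {
>                         t.printStackTrace();
>                     }
>                 });
>
>         Thread.sleep(1000); // crude wait for the callback in this sketch
>         client.close();
>     }
> }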
>
> I didn't reply earlier because I've been trying to get my head round
> what's going on. The errors are all timeout errors. Most of the
> timeout errors are between jmeter and the proxy, but there are also
> some timeout errors between the proxy and the target service, whereas
> with the blocking version there are no errors at all.
>
> Everything seems to be ok, and there are no exceptions being thrown
> (other than timeouts) by grizzly/ahc. So my only hypothesis is that
> there is an issue with the selectors, either:
>
> i) for some reason the selectors are blocking (I see no evidence of this
> though; the only thing I have between inbound and outbound is some
> copying of headers)
> ii) a different number of inbound/outbound selectors could generate more
> inbound messages than can be handled by outbound (I've ensured both
> have the same number of selectors, as in the sketch after this list, and
> it doesn't help; giving outbound more selectors than inbound seemed to
> improve things, but didn't solve the problem). BTW this thought is what
> provoked my original email about shared transports/selectors.
> iii) by using dedicatedAcceptor the proxy is accepting all connection
> attempts immediately, but a selector doesn't manage to handle the read
> event before the timeout is reached (although changing this back to
> false didn't seem to help).
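>
> In case it helps to see concrete numbers, this is roughly how I'm
> sizing the two transports (a sketch against the Grizzly
> TCPNIOTransportBuilder API; the class name and the way the transports
> are wired into the proxy are simplified for illustration):
>
> import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
> import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
> import org.glassfish.grizzly.strategies.SameThreadIOStrategy;
>
> public class TransportSizing {
>
>     static TCPNIOTransport buildTransport(int selectorCount) {
>         // Same IO strategy and selector runner count on both sides.
>         return TCPNIOTransportBuilder.newInstance()
>                 .setIOStrategy(SameThreadIOStrategy.getInstance())
>                 .setSelectorRunnersCount(selectorCount)
>                 .build();
>     }
>
>     public static void main(String[] args) throws Exception {
>         int cores = Runtime.getRuntime().availableProcessors();
>         TCPNIOTransport inbound = buildTransport(cores);   // serves jmeter
>         TCPNIOTransport outbound = buildTransport(cores);  // towards target
>         inbound.start();
>         outbound.start();
>     }
> }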
>
> I was initially testing with 4000 client threads, hitting a proxy on a
> 24-core machine which in turn hits a simple service with 5ms latency
> on another 24-core machine. But if I run with just 200 client threads
> I'm seeing the same :-(
>
> The last run I just did with a concurrency of 200 gave 1159 errors (6
> outbound timeouts and 1152 jmeter timeouts) in a total of 4,154,978
> requests. It's only 0.03%, but that's a lot more than blocking, and
> there's no reason they should be happening.
>
> Any hints on where to look next would be greatly appreciated...
>
> thanks!
>
>
> On Wed, Apr 15, 2015 at 2:16 AM, Oleksiy Stashok
> <oleksiy.stashok_at_oracle.com> wrote:
>> What's the implementation diff of blocking vs. non-blocking? I mean is there
>> any change in your code?
>>
>> Thanks.
>>
>> WBR,
>> Alexey.
>>
>>
>> On 14.04.15 18:01, Daniel Feist wrote:
>>
>> Very interesting. My previous tests had been with a simple inbound echo.
>> When testing with a non-blocking proxy (1Kb payload, 5ms target service
>> latency), optimizedForMultiplexing=false appears to give better TPS and
>> latency :-)
>>
>> Having some issues with the non-blocking proxy in general though; I'm
>> getting a decent number of errors, whereas in blocking mode I get zero. Is
>> it possible that stale connections aren't handled in the same way, or is
>> there something else that might be causing this? I'll do some more digging
>> around, but what I'm seeing right now is 0.05% of jmeter client requests
>> timing out after 60s.
>>
>> Dan
>>
>>
>>
>> On Tue, Apr 14, 2015 at 9:25 PM, Oleksiy Stashok
>> <oleksiy.stashok_at_oracle.com> wrote:
>>>
>>> Hi Dan,
>>>
>>> yeah, there is no silver bullet solution for all kinds of use cases.
>>> optimizedForMultiplexing is useful for concurrent writes, because the
>>> outbound messages are always added to the queue and written from the
>>> selector/nio thread, and at write time Grizzly packs all the available
>>> outbound messages (up to some limit) and sends them as one chunk, which
>>> reduces the number of I/O operations. When optimizedForMultiplexing is
>>> disabled (the default), Grizzly first tries to send the outbound message
>>> right away in the same thread (if the output queue is empty).
>>> So I'd say that when optimizedForMultiplexing is disabled we potentially
>>> reduce latency, and when it is enabled we increase throughput. But that's
>>> a very simple way to look at this config parameter; I bet in practice you
>>> can experience the opposite :))
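>>>
>>> To illustrate, a minimal sketch of flipping the flag (assuming the
>>> transport is built via TCPNIOTransportBuilder and that the builder
>>> exposes the optimizedForMultiplexing property; adapt to however the
>>> transports are actually created):
>>>
>>> import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>> import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>>
>>> public class MultiplexingToggle {
>>>     public static void main(String[] args) throws Exception {
>>>         // false (the default): if the output queue is empty, try the
>>>         // write directly in the calling thread - tends to favour latency.
>>>         // true: always enqueue and let the selector thread batch queued
>>>         // messages into one write - tends to favour throughput.
>>>         TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance()
>>>                 .setOptimizedForMultiplexing(false)
>>>                 .build();
>>>         transport.start();
>>>         transport.shutdownNow();
>>>     }
>>> }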
>>>
>>> Thanks.
>>>
>>> WBR,
>>> Alexey.
>>>
>>>
>>> On 13.04.15 23:40, Daniel Feist wrote:
>>>
>>> Interestingly, I saw a performance improvement using
>>> optimizedForMultiplexing with HTTP, although this potentially only affected
>>> my specific test scenario (a simple low-latency echo). Also note that this
>>> was when using worker threads, so not straight through using selectors.
>>>
>>> Let me turn off optimizedForMultiplexing, give inbound 1 selector per
>>> core and outbound 1 selector per core, and see how this runs...
>>>
>>> Dan
>>>
>>> On Mon, Apr 13, 2015 at 11:44 PM, Oleksiy Stashok
>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>
>>>>
>>>>
>>>>>
>>>>> - Even if the same selector pool is configured for inbound and outbound,
>>>>> during response processing Grizzly will still do a thread handover
>>>>> before sending the response to the client, because of the use of
>>>>> AsyncQueueIO. Is this right?
>>>>>
>>>>> Not sure I understand this, IMO there won't be any extra thread handover
>>>>> involved.
>>>>
>>>>
>>>> I was referring to the AsyncWriteQueue. Currently I have
>>>> 'optimizedForMultiplexing' set to true, which I thought I'd seen previously
>>>> disables the direct writing as you described further on in your email.
>>>> Perhaps I should try without this flag though.
>>>>
>>>> Right, optimizedForMultiplexing is useful when you concurrently write
>>>> packets to the connection, which is not the case with HTTP, unless it's
>>>> HTTP 2.0 :)
>>>>
>>>> Thanks.
>>>>
>>>> WBR,
>>>> Alexey.
>>>>
>>>>
>>>
>>>
>>
>>