users@grizzly.java.net

Re: Optimal IOStrategy/ThreadPool configuration for proxy

From: Daniel Feist <dfeist_at_gmail.com>
Date: Thu, 16 Apr 2015 19:36:30 +0100

Hi,

Nothing different really. The blocking version returns the response
when the stack returns after waiting on the outbound future returned by
AHC, while the non-blocking version returns the response when the
completion handler passed to AHC is invoked. Ah, also the blocking
version uses WorkerThreadIOStrategy while the non-blocking version uses
SameThreadIOStrategy for inbound.
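
For illustration, the two paths look roughly like this (a simplified
sketch against the AHC API; ahcClient, outboundRequest and the two
helper methods are placeholders, not the actual proxy code):

    import com.ning.http.client.AsyncCompletionHandler;
    import com.ning.http.client.Response;

    // Blocking version: the worker thread parks on the outbound future
    Response response = ahcClient.executeRequest(outboundRequest).get();
    writeResponseToClient(response);

    // Non-blocking version: the calling thread returns immediately and
    // the response is written when AHC invokes the completion handler
    ahcClient.executeRequest(outboundRequest, new AsyncCompletionHandler<Response>() {
        @Override
        public Response onCompleted(Response response) {
            writeResponseToClient(response);
            return response;
        }

        @Override
        public void onThrowable(Throwable t) {
            failInboundRequest(t);
        }
    });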

I didn't reply earlier because I've been trying to get my head round
what's going on. The errors are all timeout errors. Most of the
timeout errors are between JMeter and the proxy, but there are also some
timeout errors between the proxy and the target service, whereas with
the blocking version there are no errors at all.

Everything seems to be ok, and there are no exceptions being thrown
(other than timeouts) by grizzly/ahc. So my only hypothesis is that
there is an issue with the selectors, one of:

i) For some reason selectors are blocking. (I see no evidence of this
though; the only thing I have between inbound and outbound is some
copying of headers.)
ii) A different number of inbound/outbound selectors could generate more
inbound messages than can be handled by outbound. (I've ensured both
have the same number of selectors, and it doesn't help; giving outbound
more selectors than inbound seemed to improve things, but not solve the
problem. See the sketch below.) BTW, this is what provoked my original
email about shared transports/selectors.
iii) By using dedicatedAcceptor the proxy is accepting all connection
attempts immediately, but a selector doesn't manage to handle the read
event before the timeout is reached. (Although changing this back to
false didn't seem to help.)
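
Since hypothesis ii) keeps coming up, for reference this is how I'm
sizing the two selector pools (a minimal sketch of the Grizzly 2.x
builder calls; the inbound/outbound names are just placeholders):

    import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
    import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
    import org.glassfish.grizzly.strategies.SameThreadIOStrategy;

    int cores = Runtime.getRuntime().availableProcessors();

    // Inbound transport: one selector runner per core, no worker pool
    TCPNIOTransport inbound = TCPNIOTransportBuilder.newInstance()
            .setSelectorRunnersCount(cores)
            .setIOStrategy(SameThreadIOStrategy.getInstance())
            .build();

    // Outbound transport: matched selector count so outbound can keep
    // up with what inbound accepts
    TCPNIOTransport outbound = TCPNIOTransportBuilder.newInstance()
            .setSelectorRunnersCount(cores)
            .setIOStrategy(SameThreadIOStrategy.getInstance())
            .build();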

I was initially testing with 4000 client threads, hitting the proxy on
a 24-core machine which in turn hits a simple service with 5ms latency
on another 24-core machine. But if I run with just 200 client threads
I'm seeing the same :-(

The last run I did with a concurrency of 200 gave 1159 errors (6
outbound timeouts and 1152 JMeter timeouts) in a total of 4,154,978
requests. It's only ~0.03%, but a lot more than blocking, and there's
no reason they should be happening.

Any hints on where to look next would be greatly appreciated...

thanks!


On Wed, Apr 15, 2015 at 2:16 AM, Oleksiy Stashok
<oleksiy.stashok_at_oracle.com> wrote:
> What's the implementation diff of blocking vs. non-blocking? I mean, is there
> any change in your code?
>
> Thanks.
>
> WBR,
> Alexey.
>
>
> On 14.04.15 18:01, Daniel Feist wrote:
>
> Very interesting. My previous tests had been with a simple inbound echo.
> When testing with a non-blocking proxy (1KB payload, 5ms target service
> latency), optimizedForMultiplexing=false appears to give better TPS and
> latency :-)
>
> Having some issues with the non-blocking proxy in general though; I'm
> getting a decent number of errors whereas in blocking mode I get zero. Is
> it possible that stale connections aren't handled in the same way, or is
> there something else that might be causing this? I'll do some more digging
> around, but what I'm seeing right now is 0.05% of JMeter client requests
> timing out after 60s.
>
> Dan
>
>
>
> On Tue, Apr 14, 2015 at 9:25 PM, Oleksiy Stashok
> <oleksiy.stashok_at_oracle.com> wrote:
>>
>> Hi Dan,
>>
>> yeah, there is no silver bullet solution for all kinds of use cases.
>> optimizedForMultiplexing is useful for concurrent writes, because the
>> outbound messages are always added to the queue and written from the
>> selector/nio thread, and at write time Grizzly packs all the available
>> outbound messages (up to some limit) and sends them as one chunk, which
>> reduces the number of I/O operations. When optimizedForMultiplexing is
>> disabled (the default), Grizzly first tries to send the outbound message
>> right away on the same thread (if the output queue is empty).
>> So I'd say when optimizedForMultiplexing is disabled we potentially reduce
>> latency, and when it is enabled we increase throughput.
>> But that's a very simplistic way to look at this config parameter; I bet
>> in practice you can experience the opposite :))
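>>
>> For illustration, it's a single flag on the transport builder (a
>> sketch, assuming the Grizzly 2.x TCPNIOTransportBuilder API):
>>
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>
>>     // true: queue outbound messages and batch them into one write
>>     // (throughput); false (default): try a direct write first (latency)
>>     TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance()
>>             .setOptimizedForMultiplexing(true)
>>             .build();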
>>
>> Thanks.
>>
>> WBR,
>> Alexey.
>>
>>
>> On 13.04.15 23:40, Daniel Feist wrote:
>>
>> Interestingly I saw a performance improvement using
>> optimizedForMultiplexing with HTTP, although this potentially only affected
>> my specific test scenario (simple low latency echo). Also note that this was
>> when using worker threads, so not straight through using selectors.
>>
>> Let me turn off optimizedForMultiplexing, give inbound 1 selector per
>> core and outbound 1 selector per core, and see how this runs...
>>
>> Dan
>>
>> On Mon, Apr 13, 2015 at 11:44 PM, Oleksiy Stashok
>> <oleksiy.stashok_at_oracle.com> wrote:
>>>
>>>
>>>
>>>>
>>>> - Even if the same selector pool is configured for inbound and outbound,
>>>> during response processing Grizzly will still do a thread handover
>>>> before sending the response to the client, because of the use of
>>>> AsyncQueueIO. Is this right?
>>>>
>>>> Not sure I understand this; IMO there won't be any extra thread handover
>>>> involved.
>>>
>>>
>>> I was referring to the AsyncWriteQueue. Currently I have
>>> 'optimizedForMultiplexing' set to true, which I thought I'd seen
>>> previously disables the direct writing, as you described further on in
>>> your email. Perhaps I should try without this flag though.
>>>
>>> Right, optimizedForMultiplexing is useful when you concurrently write
>>> packets to the connection, which is not the case with HTTP, unless it's
>>> HTTP 2.0 :)
>>>
>>> Thanks.
>>>
>>> WBR,
>>> Alexey.
>>>
>>>
>>
>>
>
>