users@grizzly.java.net

Re: Optimal IOStrategy/ThreadPool configuration for proxy

From: Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
Date: Thu, 16 Apr 2015 23:00:05 -0700

Dan, can you also share the proxy code, or at least part of it, that
shows how you receive the client request and proxy it to a back-end server?

Thanks.

WBR,
Alexey.

On 16.04.15 17:24, Daniel Feist wrote:
> Ignore my last email about this affecting low concurrency; it doesn't.
> I was only seeing some errors at low concurrency due to side effects
> of the previous test run, I think. I need 2000+ JMeter client threads to
> reproduce this consistently.
>
> I stripped everything out as much as possible so I'm not doing
> anything in between and AHC is invoking the inbound Grizzly response as
> directly as possible, but no difference. The exact error in JMeter is
> "java.net.SocketTimeoutException,Non HTTP response message: Read timed
> out".
>
> Question: this might sound stupid, but couldn't it simply be that the
> proxy, with the number of selectors it has (and not using worker
> threads), cannot handle the load? And that we don't see errors
> with blocking because back-pressure is applied more directly, whereas
> with non-blocking the same type of back-pressure doesn't occur and so
> we get this type of error instead?
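>
> To make the back-pressure idea concrete, a minimal illustrative config
> sketch (class name and values are made up, not the real proxy setup):
> with WorkerThreadIOStrategy the bounded worker pool is what caps how many
> outbound calls can be in flight, while SameThreadIOStrategy has no
> equivalent cap.
>
>     import org.glassfish.grizzly.http.server.HttpServer;
>     import org.glassfish.grizzly.http.server.NetworkListener;
>     import org.glassfish.grizzly.strategies.WorkerThreadIOStrategy;
>     import org.glassfish.grizzly.threadpool.ThreadPoolConfig;
>
>     public class BlockingBackPressureConfig {
>         public static void main(String[] args) throws Exception {
>             HttpServer server = new HttpServer();
>             NetworkListener listener = new NetworkListener("inbound", "0.0.0.0", 8080);
>
>             // Blocking variant: each request holds a worker thread while it
>             // waits for the back end, so no more than 200 outbound calls can
>             // ever be in flight -- the pool itself is the back-pressure.
>             ThreadPoolConfig workers = ThreadPoolConfig.defaultConfig()
>                     .setCorePoolSize(200)
>                     .setMaxPoolSize(200);
>             listener.getTransport().setWorkerThreadPoolConfig(workers);
>             listener.getTransport().setIOStrategy(WorkerThreadIOStrategy.getInstance());
>
>             // Non-blocking variant: SameThreadIOStrategy plus async outbound
>             // calls has no such cap, so overload shows up as client-side read
>             // timeouts instead of slower accepts.
>
>             server.addListener(listener);
>             server.start();
>             Thread.currentThread().join();
>         }
>     }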
>
> Dan
>
> On Thu, Apr 16, 2015 at 10:16 PM, Daniel Feist <dfeist_at_gmail.com> wrote:
>> The thing is, if I remove the outbound call then it ceases to be a
>> proxy, and as such I don't have a separate thread processing the
>> response callback; instead it behaves as blocking (which works).
>>
>> Anyway, I'll try to simplify as much as possible in other ways and see
>> where that leads me...
>>
>> Dan
>>
>> On Thu, Apr 16, 2015 at 9:00 PM, Oleksiy Stashok
>> <oleksiy.stashok_at_oracle.com> wrote:
>>> Hi Dan,
>>>
>>> let's try to simplify the test: what happens if the proxy sends the response
>>> right away (no outbound calls)? Do you still see the timeouts?
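>>>
>>> Something like this would do (just an illustrative handler, not your
>>> code) -- keep the inbound transport configuration exactly as it is and
>>> only drop the outbound call:
>>>
>>>     import java.io.IOException;
>>>     import org.glassfish.grizzly.http.server.HttpHandler;
>>>     import org.glassfish.grizzly.http.server.Request;
>>>     import org.glassfish.grizzly.http.server.Response;
>>>
>>>     // Responds immediately on the selector thread: no AHC call, no
>>>     // suspend/resume. If the client-side timeouts disappear with this
>>>     // handler, the problem is on the outbound leg or in the hand-off
>>>     // between the two transports.
>>>     public class ImmediateResponseHandler extends HttpHandler {
>>>         @Override
>>>         public void service(Request request, Response response) throws IOException {
>>>             response.setContentType("text/plain");
>>>             response.getWriter().write("OK");
>>>         }
>>>     }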
>>>
>>> Thanks.
>>>
>>> WBR,
>>> Alexey.
>>>
>>>
>>> On 16.04.15 12:17, Daniel Feist wrote:
>>>> What I forgot to add is that I see the same issue with timeouts
>>>> between JMeter and the proxy even when "jmeter threads < selectors",
>>>> which kind of invalidates all of my ideas about the selectors all
>>>> potentially being busy.
>>>>
>>>> Wow, even with 1 thread it's occurring... must be something stupid... I
>>>> don't think it's related to persistent connections; maxKeepAlive on the
>>>> target service is 100, which wouldn't explain roughly 1 in 2000
>>>> client-side requests timing out, especially given no errors are being logged.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 16, 2015 at 7:36 PM, Daniel Feist <dfeist_at_gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Nothing different really; the blocking version returns the response
>>>>> when the stack returns after waiting on the outbound future returned by
>>>>> AHC, while the non-blocking version returns the response when the
>>>>> completion handler passed to AHC is invoked. Ah, also, the blocking version
>>>>> uses WorkerThreadIOStrategy while the non-blocking version uses
>>>>> SameThreadIOStrategy for inbound.
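>>>>>
>>>>> In heavily simplified form the two variants look roughly like this
>>>>> (class names, the target URL and the error handling are illustrative,
>>>>> not the actual code; the real handler also copies headers etc.):
>>>>>
>>>>>     import com.ning.http.client.AsyncCompletionHandler;
>>>>>     import com.ning.http.client.AsyncHttpClient;
>>>>>     import org.glassfish.grizzly.http.server.HttpHandler;
>>>>>     import org.glassfish.grizzly.http.server.Request;
>>>>>     import org.glassfish.grizzly.http.server.Response;
>>>>>
>>>>>     public class ProxyHandlers {
>>>>>
>>>>>         static final String TARGET = "http://backend:8081/service";  // illustrative
>>>>>
>>>>>         // Blocking variant (inbound uses WorkerThreadIOStrategy): the
>>>>>         // worker thread parks on the AHC future until the back end answers.
>>>>>         static class BlockingProxyHandler extends HttpHandler {
>>>>>             private final AsyncHttpClient client;
>>>>>             BlockingProxyHandler(AsyncHttpClient client) { this.client = client; }
>>>>>
>>>>>             @Override
>>>>>             public void service(Request request, Response response) throws Exception {
>>>>>                 com.ning.http.client.Response backendResponse =
>>>>>                         client.prepareGet(TARGET).execute().get();   // blocks here
>>>>>                 response.setStatus(backendResponse.getStatusCode());
>>>>>                 response.getWriter().write(backendResponse.getResponseBody());
>>>>>             }
>>>>>         }
>>>>>
>>>>>         // Non-blocking variant (inbound uses SameThreadIOStrategy): the
>>>>>         // selector thread suspends the response and returns; the response
>>>>>         // is completed later from AHC's completion handler.
>>>>>         static class NonBlockingProxyHandler extends HttpHandler {
>>>>>             private final AsyncHttpClient client;
>>>>>             NonBlockingProxyHandler(AsyncHttpClient client) { this.client = client; }
>>>>>
>>>>>             @Override
>>>>>             public void service(Request request, final Response response) throws Exception {
>>>>>                 response.suspend();
>>>>>                 client.prepareGet(TARGET).execute(
>>>>>                         new AsyncCompletionHandler<com.ning.http.client.Response>() {
>>>>>                             @Override
>>>>>                             public com.ning.http.client.Response onCompleted(
>>>>>                                     com.ning.http.client.Response backendResponse) throws Exception {
>>>>>                                 response.setStatus(backendResponse.getStatusCode());
>>>>>                                 response.getWriter().write(backendResponse.getResponseBody());
>>>>>                                 response.resume();
>>>>>                                 return backendResponse;
>>>>>                             }
>>>>>
>>>>>                             @Override
>>>>>                             public void onThrowable(Throwable t) {
>>>>>                                 response.setStatus(502);
>>>>>                                 response.resume();
>>>>>                             }
>>>>>                         });
>>>>>             }
>>>>>         }
>>>>>     }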
>>>>>
>>>>> I didn't reply earlier because I've been trying to get my head around
>>>>> what's going on. The errors are all timeout errors: most of them are
>>>>> between JMeter and the proxy, but there are also some timeout
>>>>> errors between the proxy and the target service, whereas with the blocking
>>>>> version there are no errors at all.
>>>>>
>>>>> Everything seems to be OK, and there are no exceptions being thrown
>>>>> (other than timeouts) by Grizzly/AHC. So my only hypothesis is that
>>>>> there is an issue with the selectors, either:
>>>>>
>>>>> i) For some reason the selectors are blocking. (I see no evidence of this
>>>>> though; the only thing I have between inbound and outbound is some copying
>>>>> of headers.)
>>>>> ii) A different number of inbound/outbound selectors could generate more
>>>>> inbound messages than can be handled by outbound (I've ensured both
>>>>> have the same number of selectors, which doesn't help; giving outbound more
>>>>> selectors than inbound seemed to improve things, but not solve the
>>>>> problem). BTW, that thought is what provoked my original email about shared
>>>>> transports/selectors (see the config sketch after this list).
>>>>> iii) By using dedicatedAcceptor the proxy is accepting all connection
>>>>> attempts immediately, but a selector doesn't manage to handle the read
>>>>> event before the timeout is reached (although changing this back to false
>>>>> didn't seem to help).
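>>>>>
>>>>> The config sketch referenced in (ii), with illustrative values. This only
>>>>> shows the inbound side; the outbound AHC transport would need the
>>>>> equivalent selector count through the Grizzly provider's configuration,
>>>>> and the dedicatedAcceptor flag from (iii) is a separate transport property
>>>>> not shown here:
>>>>>
>>>>>     import java.io.IOException;
>>>>>     import org.glassfish.grizzly.http.server.HttpServer;
>>>>>     import org.glassfish.grizzly.http.server.NetworkListener;
>>>>>     import org.glassfish.grizzly.strategies.SameThreadIOStrategy;
>>>>>
>>>>>     public class InboundTransportConfig {
>>>>>         public static void main(String[] args) throws IOException {
>>>>>             int cores = Runtime.getRuntime().availableProcessors();
>>>>>
>>>>>             HttpServer server = new HttpServer();
>>>>>             NetworkListener listener = new NetworkListener("inbound", "0.0.0.0", 8080);
>>>>>
>>>>>             // One selector runner per core and no worker pool: requests are
>>>>>             // read, proxied and (eventually) completed on selector threads only.
>>>>>             listener.getTransport().setSelectorRunnersCount(cores);
>>>>>             listener.getTransport().setIOStrategy(SameThreadIOStrategy.getInstance());
>>>>>
>>>>>             server.addListener(listener);
>>>>>             server.start();
>>>>>             System.in.read();  // keep the server running
>>>>>         }
>>>>>     }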
>>>>>
>>>>> I was initially testing with 4000 client threads, hitting the proxy on a
>>>>> 24-core machine which in turn hits a simple service with 5ms latency
>>>>> on another 24-core machine. But if I run with just 200 client threads
>>>>> I'm seeing the same :-(
>>>>>
>>>>> The last run I just did with a concurrency of 200 gave 1159 errors (6
>>>>> outbound timeouts and 1152 JMeter timeouts) in a total of 4,154,978
>>>>> requests. It's only 0.03%, but a lot more than blocking, and there's no
>>>>> reason they should be happening.
>>>>>
>>>>> Any hints on where to look next would be greatly appreciated...
>>>>>
>>>>> thanks!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 15, 2015 at 2:16 AM, Oleksiy Stashok
>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>> What's the implementation diff of blocking vs. non-blocking? I mean,
>>>>>> is there any change in your code?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> WBR,
>>>>>> Alexey.
>>>>>>
>>>>>>
>>>>>> On 14.04.15 18:01, Daniel Feist wrote:
>>>>>>
>>>>>> Very interesting. My previous tests had been with a simple inbound
>>>>>> echo.
>>>>>> When testing with a non-blocking proxy (1KB payload, 5ms target service
>>>>>> latency), optimizedForMultiplexing=false appears to give better TPS and
>>>>>> latency :-)
>>>>>>
>>>>>> Having some issues with the non-blocking proxy in general though; I'm
>>>>>> getting a decent number of errors, whereas in blocking mode I get zero.
>>>>>> Is it possible that stale connections aren't handled in the same way,
>>>>>> or is there something else that might be causing this? I'll do some
>>>>>> more digging around, but what I'm seeing right now is 0.05% of JMeter
>>>>>> client requests timing out after 60s.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 14, 2015 at 9:25 PM, Oleksiy Stashok
>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> yeah, there is no silver bullet solution for all kinds of use cases.
>>>>>>> optimizedForMultiplexing is useful for concurrent writes, because the
>>>>>>> outbound messages are always added to the queue and written from the
>>>>>>> selector/NIO thread, and at write time Grizzly packs all the available
>>>>>>> outbound messages (up to some limit) and sends them as one chunk, which
>>>>>>> reduces the number of I/O operations. When optimizedForMultiplexing is
>>>>>>> disabled (the default), Grizzly first tries to send the outbound message
>>>>>>> right away in the same thread (if the output queue is empty).
>>>>>>> So I'd say when optimizedForMultiplexing is disabled we potentially
>>>>>>> reduce latency, and when it is enabled we increase throughput.
>>>>>>> But that's a very simplistic way to look at this config parameter; I bet
>>>>>>> in practice you can experience the opposite :))
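>>>>>>>
>>>>>>> In very rough pseudo-Java (not the actual Grizzly source, just a
>>>>>>> self-contained model of the decision described above):
>>>>>>>
>>>>>>>     import java.util.ArrayDeque;
>>>>>>>     import java.util.Queue;
>>>>>>>
>>>>>>>     // Toy model of the write path: with optimizedForMultiplexing the
>>>>>>>     // message always goes through the queue and is flushed (batched) by
>>>>>>>     // the selector thread; without it, an empty queue lets the caller
>>>>>>>     // write directly and skip the queue entirely.
>>>>>>>     public class WritePathModel {
>>>>>>>
>>>>>>>         private final boolean optimizedForMultiplexing;
>>>>>>>         private final Queue<String> writeQueue = new ArrayDeque<>();
>>>>>>>
>>>>>>>         WritePathModel(boolean optimizedForMultiplexing) {
>>>>>>>             this.optimizedForMultiplexing = optimizedForMultiplexing;
>>>>>>>         }
>>>>>>>
>>>>>>>         void write(String message) {
>>>>>>>             if (!optimizedForMultiplexing && writeQueue.isEmpty()) {
>>>>>>>                 // Direct write from the calling thread: lowest latency
>>>>>>>                 // for the single-stream HTTP case.
>>>>>>>                 send(message);
>>>>>>>             } else {
>>>>>>>                 // Queue it; the selector thread will flush later,
>>>>>>>                 // batching whatever has accumulated into one I/O op.
>>>>>>>                 writeQueue.add(message);
>>>>>>>             }
>>>>>>>         }
>>>>>>>
>>>>>>>         // Called from the selector thread when the connection is writable.
>>>>>>>         void onWritable() {
>>>>>>>             StringBuilder batch = new StringBuilder();
>>>>>>>             while (!writeQueue.isEmpty()) {   // "up to some limit" in reality
>>>>>>>                 batch.append(writeQueue.poll());
>>>>>>>             }
>>>>>>>             if (batch.length() > 0) {
>>>>>>>                 send(batch.toString());
>>>>>>>             }
>>>>>>>         }
>>>>>>>
>>>>>>>         private void send(String data) {
>>>>>>>             System.out.println("write(" + data.length() + " chars)");
>>>>>>>         }
>>>>>>>
>>>>>>>         public static void main(String[] args) {
>>>>>>>             WritePathModel direct = new WritePathModel(false);
>>>>>>>             direct.write("response-1");       // written immediately
>>>>>>>
>>>>>>>             WritePathModel multiplexed = new WritePathModel(true);
>>>>>>>             multiplexed.write("frame-1");
>>>>>>>             multiplexed.write("frame-2");
>>>>>>>             multiplexed.onWritable();         // both flushed as one batch
>>>>>>>         }
>>>>>>>     }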
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> WBR,
>>>>>>> Alexey.
>>>>>>>
>>>>>>>
>>>>>>> On 13.04.15 23:40, Daniel Feist wrote:
>>>>>>>
>>>>>>> Interestingly I saw a performance improvement using
>>>>>>> optimizedForMultiplexing with HTTP, although this potentially only
>>>>>>> affected my specific test scenario (a simple low-latency echo). Also
>>>>>>> note that this was when using worker threads, so not straight through
>>>>>>> using selectors.
>>>>>>>
>>>>>>> Let me turn off optimizedForMultiplexing, give inbound 1 selector per
>>>>>>> core and outbound 1 selector per core, and see how this runs...
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 11:44 PM, Oleksiy Stashok
>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> - Even if the same selector pool is configured for inbound and outbound,
>>>>>>>>> during response processing Grizzly will still do a thread handover
>>>>>>>>> before sending the response to the client because of the use of
>>>>>>>>> AsyncQueueIO. Is this right?
>>>>>>>>>
>>>>>>>>> Not sure I understand this; IMO there won't be any extra thread
>>>>>>>>> handover involved.
>>>>>>>>
>>>>>>>> I was referring to the AsyncWriteQueue. Currently I have
>>>>>>>> 'optimizedForMultiplexing' set to true which, I thought I'd seen
>>>>>>>> previously, disables the direct writing you described further on in
>>>>>>>> your email. Perhaps I should try without this flag though.
>>>>>>>>
>>>>>>>> Right, optimizedForMultiplexing is useful when you concurrently write
>>>>>>>> packets to the connection, which is not the case with HTTP, unless
>>>>>>>> it's HTTP 2.0 :)
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> WBR,
>>>>>>>> Alexey.
>>>>>>>>
>>>>>>>>