users@jersey.java.net

[Jersey] Re: Async example misleading?

From: Adam Zell <zellster_at_gmail.com>
Date: Fri, 5 Sep 2014 11:45:10 -0700

In case someone is interested in async operations but at a higher
abstraction level, there is https://github.com/puniverse/comsat.
Unfortunately for me JDBC operations are still limited to a thread pool.
I guess there is little interest in a non-blocking JDBC API.
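
To make the "limited to a thread pool" point concrete, here is a minimal
sketch using only java.util.concurrent. It is not Comsat or Jersey code.
blockingQuery() is a hypothetical stand-in for a blocking JDBC call
(simulated with Thread.sleep), and the timer-based variant only imitates
what a non-blocking driver might do. The class name, constants and method
names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class BlockingVsNonBlocking {
    static final int CALLS = 8;          // number of simulated queries
    static final long LATENCY_MS = 100;  // simulated latency of each query

    // Hypothetical blocking call: the calling thread is parked for the whole wait.
    static String blockingQuery(int id) {
        try {
            Thread.sleep(LATENCY_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row-" + id;
    }

    // Blocking calls wrapped in futures: concurrency is capped by the pool size.
    static long runBlocking(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            long start = System.nanoTime();
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < CALLS; i++) {
                final int id = i;
                futures.add(CompletableFuture.supplyAsync(() -> blockingQuery(id), pool));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    // Simulated non-blocking I/O: a single timer thread completes each future,
    // and no thread is parked while the "query" is in flight.
    static long runNonBlocking() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int i = 0; i < CALLS; i++) {
                final int id = i;
                CompletableFuture<String> f = new CompletableFuture<>();
                timer.schedule(() -> { f.complete("row-" + id); }, LATENCY_MS, TimeUnit.MILLISECONDS);
                futures.add(f);
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking, 2 threads:    " + runBlocking(2) + " ms");
        System.out.println("non-blocking, 1 thread: " + runNonBlocking() + " ms");
    }
}
```

With 8 calls and 2 pool threads the blocking variant needs about four
rounds of waiting, while the timer-based variant finishes in roughly one
latency period on a single thread. A real non-blocking driver would behave
differently in detail. The sketch only shows why wrapping blocking calls in
futures does not by itself remove the thread-pool limit.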


On Fri, Sep 5, 2014 at 11:16 AM, Kishore Senji <ksenji_at_gmail.com> wrote:

> Thank you Marek.
>
> I agree with all your points, and I did not say there is no advantage to
> Async. I'm only saying that the example might lead users to think that
> throughput would increase just by moving the processing to a different
> thread. Throughput would only increase when that veryExpensiveOperation()
> method is actually doing async IO. If it is only cpu bound, then yes the
> container threads can take more requests (and they can also serve other
> resource methods) but the requests to this resource method will still be
> queued up and the worker threads are all busy working (spiking cpu) which
> will impact the overall system performance. [Typically we have a few methods
> related to a domain deployed to a pool. For the client they can all be
> under one end point, internally routed to the appropriate pool via ESB].
> Even if the veryExpensiveOperation() is IO bound, the worker threads are
> blocked waiting for the IO. This will queue up the tasks and the worker
> threads cannot do any more work as they are blocked waiting for IO. This
> pool of workers cannot be used for other resource methods (let us say they
> also do async but have a different profile of relatively short cpu bound
> tasks or quick IO) and they may have to be configured to use a different
> thread pool etc.
>
> In short, we get throughput benefits (and can support the same volume of
> traffic with fewer VMs) only when each and every operation in the call
> stack is async: the servlet needs to be async capable, and the database
> driver needs to support async, or the service call this service makes
> needs to be done over async IO. Otherwise, having async at one layer
> (Jersey/servlet) will not help when the actual database/service call is
> blocking.
>
> Thanks,
> Kishore.
>
>
> On Fri, Sep 5, 2014 at 9:30 AM, Marek Potociar <marek.potociar_at_oracle.com>
> wrote:
>
>>
>> On 04 Sep 2014, at 21:27, Kishore Senji <ksenji_at_gmail.com> wrote:
>>
>> Hi All,
>>
>> The Async example is given at
>> https://jersey.java.net/documentation/latest/async.html
>>
>> "However, in cases where a resource method execution is known to take a
>> long time to compute the result, server-side asynchronous processing model
>> should be used. In this model, the association between a request processing
>> thread and client connection is broken. I/O container that handles incoming
>> request may no longer assume that a client connection can be safely closed
>> when a request processing thread returns. Instead a facility for explicitly
>> suspending, resuming and closing client connections needs to be exposed.
>> Note that the use of server-side asynchronous processing model will not
>> improve the request processing time perceived by the client. *It will
>> however increase the throughput of the server, by releasing the initial
>> request processing thread back to the I/O container while the request may
>> still be waiting in a queue for processing or the processing may still be
>> running on another dedicated thread*. The released I/O container thread
>> can be used to accept and process new incoming request connections."
>>
>> If veryExpensiveOperation() is expensive and takes a long time, how would
>> running it in a different thread and releasing the request processing
>> thread back to the I/O container improve the throughput?
>>
>>
>> You are off-loading the I/O container threads, which are typically taken
>> from a limited thread pool. If an I/O processing thread is blocked waiting,
>> it cannot process new connections.
>>
>>
>> If that is the case, we might as well increase the number of request
>> processing threads of the I/O container by the number of worker threads
>> that we would use in the case of the example and not worry about Async at
>> all.
>>
>>
>> Please note that different resource methods may have different
>> requirements. You typically want to configure your I/O thread pool size to
>> match the number of CPU cores (or sometimes CPU cores + c, where c is a
>> constant less than the number of cores). Then you want to make sure that only
>> short computations are performed on these threads, so e.g. typically
>> anything that may involve any I/O operation (disk, db, network) should
>> better be coded as async, where thread context switch cost is offset by the
>> overall operation cost (see also here
>> <http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html>).
>> These operations also typically have specific execution
>> characteristics, so a dedicated thread pool with a separately
>> tuned pool size is required to fine-tune the performance of the system.
>>
>> So the advantage of using the async API is that it gives you much more
>> fine-grained control over when an operation is delegated to a different
>> thread pool, as well as over which thread pool it is delegated to. This
>> is in contrast with your "one size fits all" approach, which does nothing
>> other than introduce a high probability of L1, L2 and L3 cache misses
>> with every new request.
>>
>> We can take more and more connections and have them queue up (or would
>> end up with creating many worker threads), but it would not necessarily
>> increase throughput. It would increase throughput if the
>> veryExpensiveOperation() is doing I/O over a Socket and if we use Async IO
>> for that operation, then we can use minimal request threads and very small
>> worker thread pool to do Async handling of the IO (or combine logic across
>> multiple Service calls doing non-blocking IO, similar to Akka futures).
>> This will improve the throughput as more work is done. But without
>> non-blocking IO, if the veryExpensiveOperation() is either CPU bound or
>> using blocking IO, then the worker thread would in fact be blocked for that
>> time and we would end up with a huge thread pool or a big queue of tasks
>> waiting. A huge thread pool would not scale, and a big queue would also
>> reduce the throughput.
>>
>>
>> If you have an application, where the only service is the
>> veryExpensiveOperation() resource method, then use of async is not likely
>> to help. But frankly, how typical is that case? Often you have other
>> services that would starve unnecessarily if you did not off-load the
>> veryExpensiveOperation() to another thread pool.
>>
>>
>> Nevertheless we definitely need a thread to take the processing to a
>> different thread so that the container thread can be returned quickly. But
>> is my understanding correct that it depends on what
>> veryExpensiveOperation() does (blocking or non-blocking IO, or totally CPU
>> bound computation etc) to actually improve the throughput?
>>
>>
>> See above. I would say it does not depend on it. Obviously, in some cases
>> (I/O) you would probably see better results than in others (CPU-intensive
>> computation), and again it also depends on the overall context - other
>> resources you need to serve, etc.
>>
>> Marek
>>
>> P.S. Interestingly, I've just been involved in a discussion where the
>> problem is that in some complex distributed systems you may start seeing
>> cycles in the call graph. And if such a system is implemented using
>> synchronous APIs, a high system load can lead to thread pool exhaustion,
>> which then leads to an inevitable system deadlock. This is another reason
>> why, especially with any remote IO, the use of async code is your best bet.
>>
>>
>> Thanks,
>> Kishore.
>>
>>
>>
>
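
The starvation argument in the quoted reply (other services suffering when
veryExpensiveOperation() occupies the container threads) can be sketched
with plain executors. This is an illustrative simulation under stated
assumptions, not Jersey code. The "container" pool of 2 threads, the
"workers" pool of 4 threads, the 200 ms cost and the class and method names
are all made up for the example. Only the name veryExpensiveOperation()
comes from the thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OffloadDemo {
    static final long EXPENSIVE_MS = 200; // assumed cost of the slow operation

    // Stand-in for the slow resource method body (simulated with sleep).
    static void veryExpensiveOperation() {
        try {
            Thread.sleep(EXPENSIVE_MS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Latency (ms) seen by a cheap request that arrives right after four
    // expensive requests, with or without off-loading to a dedicated pool.
    static long cheapRequestLatency(boolean offload) {
        ExecutorService container = Executors.newFixedThreadPool(2); // simulated I/O container threads
        ExecutorService workers = Executors.newFixedThreadPool(4);   // simulated dedicated worker pool
        try {
            for (int i = 0; i < 4; i++) {
                if (offload) {
                    // The container thread only hands the work off and returns at once.
                    container.execute(() -> { workers.execute(OffloadDemo::veryExpensiveOperation); });
                } else {
                    // The container thread itself runs the slow operation.
                    container.execute(OffloadDemo::veryExpensiveOperation);
                }
            }
            long start = System.nanoTime();
            container.submit(() -> { }).get(); // the cheap request: no real work
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            container.shutdown();
            try {
                // Let pending hand-offs finish before the worker pool stops accepting tasks.
                container.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("cheap request, no off-load:   " + cheapRequestLatency(false) + " ms");
        System.out.println("cheap request, with off-load: " + cheapRequestLatency(true) + " ms");
    }
}
```

Without off-loading, the cheap request waits behind the expensive ones on
the two container threads. With off-loading it is served almost
immediately, even though the expensive requests themselves finish no
sooner.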


-- 
Adam
zellster_at_gmail.com