users@jersey.java.net

[Jersey] Re: Async example misleading?

From: Kishore Senji <ksenji_at_gmail.com>
Date: Fri, 5 Sep 2014 11:16:55 -0700

Thank you Marek.

I agree with all your points, and I did not say there is no advantage to
Async. I'm only saying that the example may lead users to think that
throughput would increase just by moving the processing to a different
thread. Throughput would only increase when the veryExpensiveOperation()
method is actually doing async IO. If it is only CPU bound, then yes, the
container threads can take more requests (and they can also serve other
resource methods), but the requests to this resource method will still be
queued up and the worker threads will all be busy (spiking the CPU), which
will impact overall system performance. [Typically we have a few methods
related to a domain deployed to a pool. For the client they can all be
under one endpoint, internally routed to the appropriate pool via an ESB.]
Even if veryExpensiveOperation() is IO bound, with blocking IO the worker
threads are blocked waiting for it. Tasks queue up, and the workers cannot
do any other work in the meantime. Nor can this pool of workers be used for
other resource methods (say they also use async but have a different
profile of relatively short CPU-bound tasks or quick IO), so those may have
to be configured to use a different thread pool, etc.
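
To illustrate the point about blocked workers, here is a minimal sketch
using only java.util.concurrent. It is not the Jersey example itself:
blockingOperation() is a hypothetical stand-in for
veryExpensiveOperation(), with Thread.sleep() simulating blocking IO, and
the pool size and task count are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingWorkerPoolSketch {

    /** Hypothetical stand-in for veryExpensiveOperation(); sleep simulates blocking IO. */
    static void blockingOperation(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    /**
     * Submits `tasks` blocking operations to a fixed pool of `workers` threads and
     * returns the highest number of operations observed running at the same time.
     */
    static int maxConcurrent(int workers, int tasks, long millis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> {
                    // Record how many operations are in progress right now.
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        blockingOperation(millis); // the worker thread is held here
                    } finally {
                        running.decrementAndGet();
                    }
                    return null;
                }));
            }
            // Wait for every queued task to finish.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        int peak = maxConcurrent(2, 10, 50);
        System.out.println("peak concurrent operations: " + peak);
    }
}
```

The observed peak can never exceed the number of worker threads, so the
remaining tasks simply wait in the executor's queue no matter how quickly
the container threads were released.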

In short, we get throughput benefits (and can support the same volume of
traffic with fewer VMs) only when each and every operation in the call
stack is async: the servlet needs to be async capable, and the database
driver needs to support async, or the service call this service makes
needs to be done over async IO. Otherwise, having async at one layer
(Jersey/servlet) will not help when the actual database/service call is
blocking.
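
For contrast, a rough sketch of the fully async case, again with plain
java.util.concurrent. The asyncCall() method is hypothetical: a timer
stands in for a non-blocking database or service call, which is an
assumption made only to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NonBlockingCallSketch {

    /**
     * Hypothetical async backend call: completes after `millis` without holding a
     * thread in the meantime (the timer stands in for a non-blocking socket).
     */
    static CompletableFuture<Integer> asyncCall(ScheduledExecutorService timer,
                                                int id, long millis) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        timer.schedule(() -> { result.complete(id); }, millis, TimeUnit.MILLISECONDS);
        return result;
    }

    /** Starts `requests` calls at once on a single thread and returns how many completed. */
    static int completedCalls(int requests, long millis) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<Integer>> calls = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                // Follow-up work is attached as a callback instead of blocking a worker.
                calls.add(asyncCall(timer, i, millis).thenApply(id -> id * 2));
            }
            // Wait for all in-flight calls; the timeout only guards against hangs.
            CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
                    .get(10, TimeUnit.SECONDS);
            int done = 0;
            for (CompletableFuture<Integer> call : calls) {
                if (call.isDone() && !call.isCompletedExceptionally()) {
                    done++;
                }
            }
            return done;
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("completed calls: " + completedCalls(100, 50));
    }
}
```

Here a single thread keeps all the requests in flight at once because
nothing in the chain blocks; that is the situation in which async at the
Jersey/servlet layer can translate into more throughput.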

Thanks,
Kishore.


On Fri, Sep 5, 2014 at 9:30 AM, Marek Potociar <marek.potociar_at_oracle.com>
wrote:

>
> On 04 Sep 2014, at 21:27, Kishore Senji <ksenji_at_gmail.com> wrote:
>
> Hi All,
>
> The Async example is given at
> https://jersey.java.net/documentation/latest/async.html
>
> "However, in cases where a resource method execution is known to take a
> long time to compute the result, server-side asynchronous processing model
> should be used. In this model, the association between a request processing
> thread and client connection is broken. I/O container that handles incoming
> request may no longer assume that a client connection can be safely closed
> when a request processing thread returns. Instead a facility for explicitly
> suspending, resuming and closing client connections needs to be exposed.
> Note that the use of server-side asynchronous processing model will not
> improve the request processing time perceived by the client. *It will
> however increase the throughput of the server, by releasing the initial
> request processing thread back to the I/O container while the request may
> still be waiting in a queue for processing or the processing may still be
> running on another dedicated thread*. The released I/O container thread
> can be used to accept and process new incoming request connections."
>
> If veryExpensiveOperation() is expensive and takes a long time, how would
> running it in a different thread and releasing the request processing
> thread back to the I/O container improve the throughput?
>
>
> You are off-loading the I/O container threads, which are typically taken
> from a limited thread pool. If an I/O processing thread is blocked waiting,
> it cannot process new connections.
>
>
> If that is the case we can as well increase the number of request
> processing threads of the I/O container by the number of worker threads
> that we would use in the case of the example and not worry about Async at
> all.
>
>
> Please note that different resource methods may have different
> requirements. You typically want to configure your I/O thread pool size to
> match the number of CPU cores (or sometimes CPU cores + c, where c is a
> constant smaller than the number of cores). You then want to make sure
> that only short computations are performed on these threads, so typically
> anything that may involve an I/O operation (disk, db, network) is better
> coded as async, where the thread context switch cost is offset by the
> overall operation cost (see also here
> <http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html>).
> These operations also tend to have specific execution characteristics, so
> a dedicated thread pool with a separately tuned pool size is needed to
> fine-tune the performance of the system.
>
> So the advantage of using the async API is that it gives you much more
> fine-grained control over when an operation is delegated to a different
> thread pool, as well as which thread pool it is delegated to. This is in
> contrast with your "one size fits all" approach, which does nothing but
> introduce a high probability of L1, L2 and L3 cache misses with every new
> request.
>
> We can take more and more connections and have them queue up (or end up
> creating many worker threads), but that would not necessarily increase
> throughput. It would increase throughput if veryExpensiveOperation() is
> doing I/O over a Socket and we use Async IO for that operation; then we
> can use minimal request threads and a very small worker thread pool to do
> Async handling of the IO (or combine logic across multiple Service calls
> doing non-blocking IO, similar to Akka futures). This will improve the
> throughput as more work is done. But without non-blocking IO, if
> veryExpensiveOperation() is either CPU bound or using blocking IO, then
> the worker thread would in fact be blocked for that time, and we would end
> up with a huge thread pool or a big queue of waiting tasks. A huge thread
> pool would not scale, and a big queue would also reduce the throughput.
>
>
> If you have an application where the only service is the
> veryExpensiveOperation() resource method, then the use of async is not
> likely to help. But frankly, how typical is that case? Often you have
> other services that would starve unnecessarily if you did not off-load
> veryExpensiveOperation() to another thread pool.
>
>
> Nevertheless we definitely need a thread to take the processing to a
> different thread so that the container thread can be returned quickly. But
> is my understanding correct that it depends on what
> veryExpensiveOperation() does (blocking or non-blocking IO, or totally CPU
> bound computation etc) to actually improve the throughput?
>
>
> See above. I would say it does not depend on it. Obviously, in some cases
> (I/O) you would probably see better results than in others (CPU-intensive
> computation), and again it also depends on the overall context - other
> resources you need to serve, etc.
>
> Marek
>
> P.S. Interestingly, I've just been involved in a discussion where the
> problem is that in some complex distributed systems you may start seeing
> cycles in the call graph. If such a system is implemented using
> synchronous APIs, a high system load can lead to thread pool exhaustion,
> which then leads to an inevitable system deadlock. This is another reason
> why, especially with any remote IO, async code is your best bet.
>
>
> Thanks,
> Kishore.
>
>
>