users@glassfish.java.net

Re: HTTP-service thread pool exhaustion / web service invocation freeze

From: <glassfish_at_javadesktop.org>
Date: Mon, 08 Sep 2008 09:47:51 PDT

Jeanfrancois,

Thanks for the quick response, and the confirmation of my theory.

My test was a bit contrived, but it was designed to easily recreate a much more realistic scenario that we encountered using GF for the PoC I mentioned. In a particular use case an external client synchronously called a particular service (call it A) hosted by the GF instance. In the course of servicing that request, the service implementation for A synchronously called another service B, which was hosted in that same GF instance. Service B synchronously called service C, etc. There wasn't any type of recursion; it was a synchronous invocation "chain" that reached (or attempted to reach) a depth of 5, client -> A -> B -> C -> D -> E -> F, when it broke. (The client invocation of A obviously ties up one other thread.)

It was the combination of fairly deeply nested service composition (a natural result of building coarser-grained services out of finer-grained ones) and the fact that the services were all hosted on the same GF instance that caused the problem. This is a normal usage pattern (I agree that recursive service invocation is not realistic!).

When the problem is encountered, all threads on the invocation "chain" are frozen, until an eventual time-out. This impacts system availability, since no other external (or internal) service invocations can be started.
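
The freeze described above can be reproduced outside GF with a plain fixed-size thread pool. The following is only a minimal sketch, not GF or Grizzly code: the class and method names are hypothetical, the pool stands in for the HTTP worker pool, and the pool size of 5 is an assumption chosen to match the chain above. Each "service" runs on a pool thread and blocks until the next service in the chain returns.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration: nested synchronous calls sharing one fixed pool.
class NestedCallStarvation {

    // Returns true if a chain of `depth` services completes within the timeout.
    static boolean invokeChain(int poolSize, int depth, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            Future<String> first = pool.submit(() -> call(pool, 1, depth));
            try {
                first.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // every worker is blocked waiting for a free worker
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        } finally {
            pool.shutdownNow(); // interrupt any workers still blocked
        }
    }

    // One "service": hands the next call to the same pool, then blocks on it.
    static String call(ExecutorService pool, int level, int depth) throws Exception {
        if (level == depth) {
            return "done";
        }
        return pool.submit(() -> call(pool, level + 1, depth)).get();
    }

    public static void main(String[] args) throws InterruptedException {
        // A..F is six services; with five workers, F never gets a thread.
        System.out.println(invokeChain(5, 6, 500));  // false
        System.out.println(invokeChain(6, 6, 2000)); // true
    }
}
```

With five workers, the calls for A through E each hold a thread while waiting, so the task for F sits in the queue until the timeout fires.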

Expanding the thread pool size addresses the case of a single client accessing the service, but the problem returns when multiple clients are concurrently accessing the service. A sufficiently large thread pool would make the probability of exhausting the thread pool lower, but there is no way to reduce it to zero. This makes it quite easy to mount a fairly simple DoS attack. It also affects scalability, since the number of on-going service invocations is limited by the thread pool size, and threads are an inherently expensive resource. So I'm sure you can understand my concern.
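
To put rough numbers on this, here is a small sketch based on the standard resource-allocation bound, which is my own framing rather than anything GF documents: if at most c requests are in flight and each chain is d services deep, every blocked request holds at most d - 1 threads, so a pool of c * (d - 1) + 1 threads always leaves one thread able to make progress. The class and method names are hypothetical. Since the number of concurrent HTTP clients is not bounded, no fixed pool size satisfies the bound in general.

```java
// Hypothetical helper: sizing arithmetic for nested synchronous calls on one pool.
class PoolSizing {

    // Smallest pool size that cannot deadlock with at most `clients` requests
    // in flight, each needing `depth` threads at its deepest point.
    static int minSafePoolSize(int clients, int depth) {
        if (clients < 1 || depth < 1) {
            throw new IllegalArgumentException("clients and depth must be >= 1");
        }
        return Math.addExact(Math.multiplyExact(clients, depth - 1), 1);
    }

    // Largest number of in-flight requests a pool of `poolSize` threads can
    // carry without any risk of deadlock for chains of the given depth.
    static int maxSafeClients(int poolSize, int depth) {
        if (poolSize < 1 || depth < 1) {
            throw new IllegalArgumentException("poolSize and depth must be >= 1");
        }
        if (depth == 1) {
            return Integer.MAX_VALUE; // no nested calls, so no thread ever waits on the pool
        }
        return (poolSize - 1) / (depth - 1);
    }

    public static void main(String[] args) {
        System.out.println(minSafePoolSize(1, 6));   // 6
        System.out.println(maxSafeClients(5, 6));    // 0
        System.out.println(maxSafeClients(200, 6));  // 39
    }
}
```

Even a pool of 200 threads is only guaranteed safe for 39 concurrent clients of a six-service chain; a 40th concurrent client makes exhaustion possible.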

I'm sure Jetty and Tomcat share this same vulnerability, but I was surprised that Grizzly introduces the same problem. It sounds like there is an opportunity to improve performance within GF by detecting the self-hosted service invocation scenario, and avoiding some of the lower-level parts of the protocol stack, as well as avoiding the need for a new thread. Just a suggestion...

By the way, the Open ESB normalized message router (NMR) employs an interesting pattern that breaks synchronous service invocation MEPs into asynchronously handled messages (with MEP state associated with them). This is a much more scalable approach (the number of threads is constant, not a function of the number of concurrent service invocations), and might be worth a look as a pattern for improving message handling within GF / Grizzly. Just another thought...
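
To illustrate the general idea (this is not the NMR API, just a sketch using java.util.concurrent with hypothetical names), each hop in the chain can be scheduled as its own short task, with the pending reply carried by a CompletableFuture instead of a blocked thread. A two-thread pool, an assumption chosen to make the point, can then carry many concurrent six-service chains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: the same call chain, but no worker ever blocks.
class AsyncChain {

    // Each service does its work as one task and schedules the next hop as
    // another; the exchange state lives in the future, not on a thread stack.
    static CompletableFuture<Integer> call(Executor pool, int level, int depth) {
        if (level == depth) {
            return CompletableFuture.supplyAsync(() -> level, pool);
        }
        return CompletableFuture
                .supplyAsync(() -> level + 1, pool) // this service's own work
                .thenComposeAsync(next -> call(pool, next, depth), pool);
    }

    // Starts `clients` chains at once and returns how many completed.
    static int run(int poolSize, int depth, int clients) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                pending.add(call(pool, 1, depth));
            }
            int completed = 0;
            for (CompletableFuture<Integer> f : pending) {
                if (f.get(5, TimeUnit.SECONDS) == depth) {
                    completed++;
                }
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 100 concurrent clients, six services deep, two worker threads.
        System.out.println(run(2, 6, 100)); // 100
    }
}
```

Only the caller of run waits here; the pool threads return to the pool after every hop, so the thread count stays constant regardless of how many invocations are in flight.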

Best regards,
--Ron
[Message sent by forum member 'rtenhove' (rtenhove)]

http://forums.java.net/jive/thread.jspa?messageID=297853