users@jersey.java.net

[Jersey] What is the proper way of cleaning up asynchronous requests before servlet redeploy (Jersey, Tomcat)?

From: Bela Bujk <tersyxus_at_gmail.com>
Date: Mon, 27 Jun 2016 13:45:30 +0200

Hi,

I've submitted a Jersey-related question
<http://stackoverflow.com/questions/37934558/what-is-the-proper-way-of-cleaning-up-asynchronous-requests-before-servlet-redep>
on Stack Overflow. I'm posting it here hoping it will get answered by some
of the experts on this mailing list. :)

I'd appreciate any hints on this.

---
I have an asynchronous *JAX-RS* API for long-polling clients, put together
with *Jersey Container Servlet 2.22* and hosted on *Tomcat 7*.
It looks similar to the snippet shown below, and it works well in production.
On average, 150 long-polling requests are being executed at the same time,
which results in almost the *same number of live Tomcat HTTP connections*
(according to JMX metrics). For this low-traffic scenario the plain old
*HTTP-BIO* connector has been used without problems. No runtime connection
leak can be detected, provided you use only managed threads. :)
@POST_at_Path("/liveEvents")@ManagedAsyncpublic void getResult(@Suspended
final AsyncResponse asyncResponse, RequestPayload payload) {
    asyncResponse.setTimeout(longPollTimeoutMs, TimeUnit.MILLISECONDS);
    asyncResponse.setTimeoutHandler(new TimeoutHandler() {
        @Override
        public void handleTimeout(AsyncResponse asyncResponseArg) {
            try {
                asyncResponseArg.cancel();
            } finally {
                cleanupResources();
            }
        }
    });
    startListeningForExternalEventsAndReturn(payload);}
private void startListeningForExternalEventsAndReturn(RequestPayload payload) {
    externalResource.register(new Listener() {
        @Override
        public void onEvent(Event event) {
            respond(event);
        }
    });}
private void respond(Event event) {
    try {
        asyncResponse.resume(event);
    } catch (RuntimeException exception) {
        asyncResponse.resume(exception);
    } finally {
        cleanupResources();
    }}
The problem I'm facing is that after a successful *Tomcat* redeploy the
number of live connections increases to about 300, then to 450, and after
some further redeploys it hits the maxConnections limit configured for the
container.
The clients of the API handle the redeploy by waiting for a client-side
timeout (which is, of course, longer than the one set on the servlet side)
and then start polling the API again. But each client is guaranteed to have
only one request in flight at a time.
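(For clarity, a client's loop looks conceptually like this. This is a
simplified Java sketch, not the actual client code; the names and the
Jersey client API usage are purely illustrative:)

// Simplified sketch of one client's polling loop (illustrative only).
// The read timeout is deliberately longer than the server-side suspend
// timeout, and at most one request is ever in flight per client.
Client client = ClientBuilder.newClient()
        .property(ClientProperties.READ_TIMEOUT, clientTimeoutMs); // > longPollTimeoutMs
WebTarget target = client.target("http://example.com/api/liveEvents");
while (running) {
    try {
        Event event = target.request().post(Entity.json(payload), Event.class);
        handle(event);
    } catch (ProcessingException timedOutOrDisconnected) {
        // server-side timeout or redeploy in progress: start a new poll
    }
}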
The shape of the connection-count monitoring graph gives a hint. The
connection count remains constant after undeployment (connections are not
released back to the pool, not even by the TimeoutHandler) and starts to
increase (new connections are allocated) as clients start long-polling
again. *In fact, ongoing (suspended) async requests started in the previous
context are never released until JVM termination!*
After some digging around, it's not difficult to find out, by analyzing heap
dumps made after a few redeployments, that unreleased, suspended
AsyncResponse (AsyncResponder) instances remain in memory from previous web
application contexts (easily filterable with JQL queries grouped by
ClassLoader instance). It's also very suspicious that the same number of
outdated org.apache.coyote.Request instances are present in memory from
previous contexts.
I started to look around the undeployment-related source code of the *Jersey
Container*, hoping that some graceful shutdown process is implemented for
async requests, with cleanup actions executed at @PreDestroy time or in the
close() or dispose() methods of Providers.
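What I was hoping to find (or be able to plug in myself) was a shutdown hook
along these lines. This is only a sketch: ContainerLifecycleListener is
Jersey's SPI, but the SuspendedResponses registry is a hypothetical helper
of my own (sketched further below), not a Jersey facility:

@Provider
public class AsyncCleanupListener implements ContainerLifecycleListener {

    @Override
    public void onStartup(Container container) { /* nothing to do */ }

    @Override
    public void onReload(Container container) { /* nothing to do */ }

    @Override
    public void onShutdown(Container container) {
        // Resume every still-suspended AsyncResponse so that the servlet
        // container can complete the requests and release the connections.
        for (AsyncResponse response : SuspendedResponses.drain()) {
            if (response.isSuspended()) {
                response.resume(Response.status(Response.Status.SERVICE_UNAVAILABLE).build());
            }
        }
    }
}

(I would register such a listener via ResourceConfig.register().)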
I had an optimistic guess that running each scheduled TimeoutHandler right
before undeployment would solve the problem. But replacing the default
@BackgroundScheduler provider (DefaultBackgroundSchedulerProvider) with a
custom implementation, collecting all queued TimeoutHandlers of the
Executor, and eventually invoking AsyncResponse.resume() or
AsyncResponse.cancel() on them did not help. This stage might be too late
for the cleanup because the request scope has already been shut down.
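My next idea is to track the suspended responses myself, adding each one in
getResult() and removing it again on the respond()/timeout paths, then
draining the registry from whatever hook fires early enough. Something like
this hypothetical helper (my own naming, untested):

public final class SuspendedResponses {

    // Set backed by a concurrent map so that registration, removal and
    // draining are safe from multiple request threads.
    private static final Set<AsyncResponse> SUSPENDED =
            Collections.newSetFromMap(new ConcurrentHashMap<AsyncResponse, Boolean>());

    private SuspendedResponses() {}

    public static void add(AsyncResponse response) {
        SUSPENDED.add(response);
    }

    public static void remove(AsyncResponse response) {
        SUSPENDED.remove(response);
    }

    // Take a snapshot of everything still suspended and forget it.
    public static Set<AsyncResponse> drain() {
        Set<AsyncResponse> snapshot = new HashSet<AsyncResponse>(SUSPENDED);
        SUSPENDED.removeAll(snapshot);
        return snapshot;
    }
}

But I don't know which shutdown hook (if any) runs early enough, before the
request scope is torn down, for resume() to still reach the container.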
Any ideas on what my async setup is missing, or how *Jersey* can be
configured to release the *Servlet Container*'s connections that are still
suspended at redeploy time?