[jsr369-experts] Race between error page dispatch and async servlet

From: Greg Wilkins <gregw_at_webtide.com>
Date: Thu, 6 Apr 2017 13:37:09 +1000

We are trying to deal with a race condition in Jetty that we believe is
kind of inherent in the spec rather than our handling of it. It is more
apparent with HTTP/2 usage but I think that is can also happen with HTTP/1
is very specific circumstances. I'm wondering of other containers have
encountered this problem and perhaps if there is something we can do to
help avoid it.

The issue is that if the application is using async to generate a response
from a non container thread, then there is an intrinsic race condition if
an error occurs and an error page needs to be generated.

On HTTP/1 and error page would only need to be generated if a prior
dispatch/callback throws an exception. With HTTP/2 error pages dispatches
can occur at any time because of a packet that has been received over the
stream (which is continuously being read).

The problem in more detail is that to generate an error page the container:

   - checks if the response is committed - if so abort connection as best
   as possible (but may look like a normal EOF in HTTP/1.0)
   - clears the response buffer
   - sets the response status code/reason
   - sets some response attributes to describe the error
   - Either generates a page itself or does an DispatcherType.ERROR
   dispatch to the application to generate the error page.

However in parallel the application may have async mechanisms that can
mutate the response by:

   - write to the response
   - commit the response either by writing or by explicit flush
   - set the response status code/reason
   - call AsyncContext.complete() or other methods

Now the handling for the last of these is well defined, because we can have
an exact before/after relationship between the call to complete and the
decision to do an error dispatch.

However, any of the other mutating methods may happen concurrently with the
container/application generating an error response. So all sorts of nasty
outcomes are possible: async write content in the middle of the error page,
async status code sent instead of error page; application response written
with error status but normal content, illegal state exception when
resetting buffers because response committed asynchronously etc. etc. The
exact form of the disaster is container dependent as the order things are
done for the error page is not defined.

Now currently this is not too much of a huge issue, as the kinds of errors
that we may get are typically fatal to the connection/stream (eg receiving
a HTTP/2 stream reset), so an error page is unlikely to be successfully
generated, but we still need to make the effort to inform the application
that an error has occurred even if a page cannot be generated. Plus also
in future there may be other error conditions from a HTTP/2 connection
violating some server imposed constraints that do require a error page to
be generated.

We could try to handle this with some kind of mutual exclusion on the
response methods, but generating an error page is a multi operation process
that can call the application, so holding a lock while calling application
code is just asking for problems.

A better approach for Servlet 4.0 may be to discourage arbitrary threads
from being able to call the response API. Instead we could have a variant
of AsyncContext.start(Runnable), that guarantees that the Runnable will not
be run concurrently with any container dispatched thread on the associated
transaction. We already give that guarantee for container callback and
dispatches, but it is impossible for an async thread to enforce non
concurrency with any container action and/or container managed callbacks.

So lets call the method AsyncContext.run(Runnable) and if called when no
container managed thread is active, then the Runnable would be called
immediately by the calling thread (in context scope like start), otherwise
the Runnable would be queued to be called by a container managed thread
immediately after the container dispatch is complete and no other container
dispatch would occur until after the Runnable returns.

Thoughts

-- 
Greg Wilkins <gregw@webtide.com> CTO http://webtide.com