jsr369-experts@servlet-spec.java.net

[jsr369-experts] Re: Race between error page dispatch and async servlet

From: Stuart Douglas <sdouglas_at_redhat.com>
Date: Thu, 6 Apr 2017 14:05:23 +1000

On Thu, Apr 6, 2017 at 1:37 PM, Greg Wilkins <gregw_at_webtide.com> wrote:
>
> We are trying to deal with a race condition in Jetty that we believe is
> inherent in the spec rather than in our handling of it. It is more
> apparent with HTTP/2 usage, but I think that it can also happen with HTTP/1
> in very specific circumstances. I'm wondering if other containers have
> encountered this problem, and whether there is something we can do to help
> avoid it.
>
> The issue is that if the application is using async to generate a response
> from a non container thread, then there is an intrinsic race condition if an
> error occurs and an error page needs to be generated.
>
> On HTTP/1 an error page would only need to be generated if a prior
> dispatch/callback throws an exception. With HTTP/2, error page dispatches
> can occur at any time, triggered by a packet received over the stream
> (which is continuously being read).

Unless I have misunderstood what you mean, this is not possible in our
implementation. If a packet that results in an error (e.g. a
RST_STREAM) is read from the wire, the application will not know about
it until it attempts an IO operation, at which point it will receive an
IOException that it can handle however it wants. This is similar to the
situation in HTTP/1.1: if the remote peer has sent a RST packet, the
application will not know about it until it attempts IO.
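To make that concrete, the same "the error only surfaces when IO is attempted" behaviour can be demonstrated with plain JDK piped streams standing in for the connection (this is a sketch, not servlet API; the class and method names are invented):

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class DeferredIoError {
    // Returns null if the write succeeded, otherwise the IOException message.
    static String attemptWriteAfterPeerReset() throws IOException {
        PipedOutputStream appSide = new PipedOutputStream();
        PipedInputStream peerSide = new PipedInputStream(appSide);

        // The "peer" goes away (think: RST_STREAM or TCP RST already received).
        peerSide.close();

        // The application learns nothing until it actually attempts IO:
        try {
            appSide.write("response body".getBytes());
            return null;
        } catch (IOException e) {
            // Only here does the failure surface, and the application
            // can handle it however it wants.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("write failed with: " + attemptWriteAfterPeerReset());
    }
}
```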

I think the spec should prohibit the container from arbitrarily
calling sendError based on async IO events, as that way lies a
minefield of threading issues that would basically require all state
relating to the response to be thread-safe, with the associated
performance penalty.

Stuart

>
> The problem in more detail is that to generate an error page the container:
>
> 1. checks if the response is committed - if so, it aborts the connection as
>    best it can (though this may look like a normal EOF in HTTP/1.0)
> 2. clears the response buffer
> 3. sets the response status code/reason
> 4. sets some request attributes describing the error
> 5. either generates a page itself or does a DispatcherType.ERROR dispatch
>    to the application to generate the error page
>
> However, in parallel, the application may have async mechanisms that can
> mutate the response by:
>
> - writing to the response
> - committing the response, either by writing or by an explicit flush
> - setting the response status code/reason
> - calling AsyncContext.complete() or other methods
>
> Now the handling for the last of these is well defined, because we can have
> an exact before/after relationship between the call to complete and the
> decision to do an error dispatch.
>
> However, any of the other mutating methods may happen concurrently with the
> container/application generating an error response, so all sorts of nasty
> outcomes are possible: async content written in the middle of the error
> page; an async status code sent instead of the error page's; the
> application response written with the error status but normal content; an
> IllegalStateException when resetting buffers because the response was
> committed asynchronously; and so on. The exact form of the disaster is
> container dependent, as the order in which things are done for the error
> page is not defined.
>
> Currently this is not a huge issue, as the kinds of errors that we may get
> are typically fatal to the connection/stream (e.g. receiving an HTTP/2
> stream reset), so an error page is unlikely to be successfully generated;
> but we still need to make the effort to inform the application that an
> error has occurred even if a page cannot be generated. Also, in future
> there may be other error conditions - such as an HTTP/2 connection
> violating some server-imposed constraint - that do require an error page
> to be generated.
>
> We could try to handle this with some kind of mutual exclusion on the
> response methods, but generating an error page is a multi-step process
> that can call into the application, and holding a lock while calling
> application code is just asking for problems.
>
> A better approach for Servlet 4.0 may be to discourage arbitrary threads
> from calling the response API. Instead we could have a variant of
> AsyncContext.start(Runnable) that guarantees the Runnable will not be run
> concurrently with any container-dispatched thread on the associated
> transaction. We already give that guarantee for container callbacks and
> dispatches, but it is currently impossible for an async thread to enforce
> non-concurrency with container actions and/or container-managed callbacks.
>
> So let's call the method AsyncContext.run(Runnable). If it is called when
> no container-managed thread is active, the Runnable would be invoked
> immediately by the calling thread (in context scope, like start);
> otherwise the Runnable would be queued to be invoked by a
> container-managed thread immediately after the current container dispatch
> completes, and no other container dispatch would occur until the Runnable
> returns.
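
Ignoring the servlet plumbing, the core of that proposal is a serialized task gate. A minimal JDK-only sketch (SerializedGate and execute are invented names), in which the container would route its dispatches and the application its run() tasks through the same method, might look like:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SerializedGate {
    private final Deque<Runnable> pending = new ArrayDeque<>();
    private boolean active;

    // Both container dispatches and application run() tasks go through here,
    // so they can never execute concurrently with each other.
    public void execute(Runnable task) {
        synchronized (this) {
            if (active) {            // a dispatch/task is in progress:
                pending.add(task);   // defer until it returns
                return;
            }
            active = true;           // claim exclusivity for this thread
        }
        Runnable next = task;
        while (next != null) {
            try {
                next.run();
            } finally {
                synchronized (this) {
                    next = pending.poll();
                    if (next == null) active = false;
                }
            }
        }
    }

    public static void main(String[] args) {
        SerializedGate gate = new SerializedGate();
        StringBuilder log = new StringBuilder();
        gate.execute(() -> {
            log.append("dispatch-start ");
            // A task submitted during a dispatch is queued, not run inline:
            gate.execute(() -> log.append("async-task "));
            log.append("dispatch-end ");
        });
        System.out.println(log); // dispatch-start dispatch-end async-task
    }
}
```

The draining loop means a queued task runs on the thread that finished the dispatch, immediately after it, matching the "no other container dispatch until the Runnable returns" requirement in the proposal.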
>
> Thoughts?
>
> --
> Greg Wilkins <gregw@webtide.com> CTO http://webtide.com