jsr369-experts@servlet-spec.java.net

[jsr369-experts] Re: [servlet-spec users] Re: Re: Race between error page dispatch and async servlet

From: Greg Wilkins <gregw_at_webtide.com>
Date: Sat, 22 Apr 2017 10:53:33 +1000

All,

I know this is very late in the process to bring this one up again...
but while the particular race that I discussed is solvable, there is still
a wide range of very difficult races that can occur in fully async
applications.

The container provides mutual exclusion for all threads calling filters,
servlets, AsyncListeners, ReadListeners and WriteListeners. Only one thread
can ever be dispatched to one of these handlers at a time, and thus a single
ordering of events can be known and handled.
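
For illustration, a minimal sketch (a hypothetical class, not from the spec)
of a ReadListener that keeps its per-request state in plain fields, relying
on the guarantee that the container never invokes its callbacks concurrently
for the request:

    import java.io.IOException;
    import javax.servlet.ReadListener;
    import javax.servlet.ServletInputStream;

    class CountingReadListener implements ReadListener {
        private final ServletInputStream in;
        private long bytesRead; // no lock needed: the container serializes callbacks

        CountingReadListener(ServletInputStream in) {
            this.in = in;
        }

        @Override
        public void onDataAvailable() throws IOException {
            // Standard async-read loop: consume data while it is available
            // without blocking.
            byte[] buffer = new byte[4096];
            int n;
            while (in.isReady() && (n = in.read(buffer)) != -1) {
                bytesRead += n;
            }
        }

        @Override
        public void onAllDataRead() {
            // request body fully consumed
        }

        @Override
        public void onError(Throwable t) {
            // abort processing for this request
        }
    }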

However, there is no such requirement on application threads that call
AsyncContext.start(Runnable). Several of those can be active at once,
concurrently with container-managed threads. This puts a huge burden on
async applications to provide their own mutual exclusion, and often results
in too much locking and blocking (which is not what you want for async
apps).
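
Today that typically means something like the following (a minimal sketch
with hypothetical names): the application has to introduce its own lock, and
that same lock must also be taken by any servlet or listener code touching
the response:

    import java.io.IOException;
    import javax.servlet.AsyncContext;

    class LockingAsyncHandler {
        // Application-provided lock guarding the response; every container
        // callback that touches the response must also synchronize on it.
        private final Object responseLock = new Object();

        void handle(AsyncContext asyncContext) {
            asyncContext.start(() -> {
                String result = "hello"; // stands in for real application work
                synchronized (responseLock) {
                    // Without the lock this thread may run concurrently with
                    // container dispatches for the same request.
                    try {
                        asyncContext.getResponse().getWriter().print(result);
                    } catch (IOException e) {
                        // nothing useful to write; fall through and complete
                    }
                    asyncContext.complete();
                }
            });
        }
    }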

Thus I still think there is great value in a variant
of AsyncContext.start(Runnable) that will run the task in a container-managed
thread that is mutually excluded from all container dispatches associated
with the request.

Writing asynchronous applications is vastly simpler if you can know there is
only ever one thread active for a given request. So I'd like to propose that
we add the following to AsyncContext:


    /**
     * Causes the container to dispatch a thread, possibly from a managed
     * thread pool, to run the specified <tt>Runnable</tt>. The container may
     * propagate appropriate contextual information to the <tt>Runnable</tt>.
     * The container will guarantee that the thread will not be run
     * concurrently with any other thread dispatched by the container for the
     * associated request for any invocation of the standard API, including
     * other invocations of this method.
     *
     * @param run the asynchronous handler, which is assumed to be a
     * short-lived, preferably non-blocking, task.
     */
    public void startSingleThreaded(Runnable run);


OK, the name sucks.
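
For illustration, a usage sketch against the proposed method (whatever we
end up calling it): because the container serializes the task with all of
its own dispatches for the request, the application lock from the earlier
sketch is no longer needed:

    import java.io.IOException;
    import javax.servlet.AsyncContext;

    class SingleThreadedHandler {
        void handle(AsyncContext asyncContext) {
            // Proposed API, not yet in the spec: the task is guaranteed not to
            // run concurrently with any other container dispatch for this
            // request, so no application lock is required.
            asyncContext.startSingleThreaded(() -> {
                try {
                    asyncContext.getResponse().getWriter().print("hello");
                } catch (IOException e) {
                    // connection already failed; nothing useful to write
                }
                asyncContext.complete();
            });
        }
    }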





On 6 April 2017 at 15:43, Stuart Douglas <sdouglas_at_redhat.com> wrote:

> On Thu, Apr 6, 2017 at 3:14 PM, Greg Wilkins <gregw_at_webtide.com> wrote:
> > Stuart,
> >
> > Ah good idea to add such an IO event to the input queue so it occurs
> > once an IO operation is attempted. This would indeed help with the most
> > immediate manifestation of the problem we have.
> >
> > However, many of our HTTP/2 users are interested in the fail-fast
> > semantics that are available with HTTP/2, and we have discussed here how
> > such errors can be delivered... perhaps they can be delivered to any
> > registered AsyncListener onError, and then in any subsequent IO
> > operations.
>
> I think this may need to be a new listener, as this is a new semantic
> with regards to thread safety that may cause problems.
>
> In particular, at the moment onError is only called after a dispatch()
> call, so the Request/Response objects are basically 'owned' by the
> thread doing the dispatch, and onError will be invoked by that
> thread (there could still be other user threads active, but then
> it is the user's responsibility to synchronize access to the
> Request/Response objects).
>
> Invoking the onError method in a different thread due to connection
> failure could break existing listeners that are not coded in a thread
> safe manner.
>
> In addition to this, every mechanism by which we can get a 'fail fast'
> result implies that the underlying connection is broken and no
> response can be sent. As a result it is pointless to generate an error
> page, which I think most existing onError implementations will be
> trying to do.
>
> If users want fail fast support I think we need to explicitly add
> support for it, otherwise failures should be reported when the user
> attempts IO as normal (with the potential for container specific
> support for fail fast).
>
> Stuart
>
> >
> > But note that even if we handle such IO events that way, I think it is
> > still possible for the problem to occur when handling an exception thrown
> > from a dispatch to a servlet. The container still needs to generate an
> > error page, but the application may have already commenced an async write
> > mechanism that can race with the error page.
>
> In this case I do not start the async write process until the
> dispatch() call stack has returned, which side steps this problem.
>
> Stuart
>
> >
> > cheers
> >
> >
> >
> >
> > On 6 April 2017 at 14:05, Stuart Douglas <sdouglas_at_redhat.com> wrote:
> >>
> >> On Thu, Apr 6, 2017 at 1:37 PM, Greg Wilkins <gregw_at_webtide.com> wrote:
> >> >
> >> > We are trying to deal with a race condition in Jetty that we believe is
> >> > kind of inherent in the spec rather than our handling of it. It is more
> >> > apparent with HTTP/2 usage, but I think that it can also happen with
> >> > HTTP/1 in very specific circumstances. I'm wondering if other containers
> >> > have encountered this problem and perhaps if there is something we can
> >> > do to help avoid it.
> >> >
> >> > The issue is that if the application is using async to generate a
> >> > response from a non-container thread, then there is an intrinsic race
> >> > condition if an error occurs and an error page needs to be generated.
> >> >
> >> > On HTTP/1 an error page would only need to be generated if a prior
> >> > dispatch/callback throws an exception. With HTTP/2, error page
> >> > dispatches can occur at any time because of a packet that has been
> >> > received over the stream (which is continuously being read).
> >>
> >> Unless I have misunderstood what you mean this is not possible in our
> >> implementation. If a packet that results in an error (e.g. a
> >> RST_STREAM) is read from the wire the application will not know about
> >> it until it attempts an IO operation, where it will receive an
> >> IOException it can handle however it wants. This is similar to the
> >> situation in HTTP/1.1, if the remote peer has sent a RST packet then
> >> the application will not know about it until it attempts IO.
> >>
> >> I think the spec should prohibit the container from arbitrarily
> >> calling sendError based on async IO events, as that way lies a
> >> minefield of threading issues that will basically require all state
> >> relating to the response to be thread safe with the associated
> >> performance penalty.
> >>
> >> Stuart
> >>
> >> >
> >> > The problem in more detail is that to generate an error page the
> >> > container:
> >> >
> >> > - checks if the response is committed - if so, aborts the connection as
> >> >   best as possible (but it may look like a normal EOF in HTTP/1.0)
> >> > - clears the response buffer
> >> > - sets the response status code/reason
> >> > - sets some response attributes to describe the error
> >> > - either generates a page itself or does a DispatcherType.ERROR dispatch
> >> >   to the application to generate the error page.
> >> >
> >> > However, in parallel the application may have async mechanisms that can
> >> > mutate the response by:
> >> >
> >> > - writing to the response
> >> > - committing the response, either by writing or by an explicit flush
> >> > - setting the response status code/reason
> >> > - calling AsyncContext.complete() or other methods
> >> >
> >> > Now the handling for the last of these is well defined, because we can
> >> > have an exact before/after relationship between the call to complete
> >> > and the decision to do an error dispatch.
> >> >
> >> > However, any of the other mutating methods may happen concurrently with
> >> > the container/application generating an error response. So all sorts of
> >> > nasty outcomes are possible: async content written in the middle of the
> >> > error page, an async status code sent instead of the error page, the
> >> > application response written with the error status but normal content,
> >> > an IllegalStateException when resetting buffers because the response
> >> > was committed asynchronously, etc. The exact form of the disaster is
> >> > container dependent, as the order in which things are done for the
> >> > error page is not defined.
> >> >
> >> > Now currently this is not too much of an issue, as the kinds of errors
> >> > that we may get are typically fatal to the connection/stream (e.g.
> >> > receiving an HTTP/2 stream reset), so an error page is unlikely to be
> >> > successfully generated, but we still need to make the effort to inform
> >> > the application that an error has occurred even if a page cannot be
> >> > generated. Also, in future there may be other error conditions from an
> >> > HTTP/2 connection violating some server-imposed constraints that do
> >> > require an error page to be generated.
> >> >
> >> > We could try to handle this with some kind of mutual exclusion on the
> >> > response methods, but generating an error page is a multi-operation
> >> > process that can call the application, so holding a lock while calling
> >> > application code is just asking for problems.
> >> >
> >> > A better approach for Servlet 4.0 may be to discourage arbitrary threads
> >> > from being able to call the response API. Instead we could have a
> >> > variant of AsyncContext.start(Runnable) that guarantees that the
> >> > Runnable will not be run concurrently with any container-dispatched
> >> > thread on the associated transaction. We already give that guarantee
> >> > for container callbacks and dispatches, but it is impossible for an
> >> > async thread to enforce non-concurrency with any container action
> >> > and/or container-managed callbacks.
> >> >
> >> > So let's call the method AsyncContext.run(Runnable). If it is called
> >> > when no container-managed thread is active, then the Runnable would be
> >> > called immediately by the calling thread (in context scope, like start);
> >> > otherwise the Runnable would be queued to be called by a
> >> > container-managed thread immediately after the container dispatch is
> >> > complete, and no other container dispatch would occur until after the
> >> > Runnable returns.
> >> >
> >> > Thoughts?
> >> >
> >> > --
> >> > Greg Wilkins <gregw@webtide.com> CTO http://webtide.com
> >
> >
> >
> >
> > --
> > Greg Wilkins <gregw@webtide.com> CTO http://webtide.com
>



-- 
Greg Wilkins <gregw@webtide.com> CTO http://webtide.com