[jsr369-experts] Re: [servlet-spec users] Re: Re: Re: Race between error page dispatch and async servlet

From: Stuart Douglas <sdouglas_at_redhat.com>
Date: Wed, 26 Apr 2017 08:35:26 +1000

On Sat, Apr 22, 2017 at 10:53 AM, Greg Wilkins <gregw_at_webtide.com> wrote:
>
> All,
>
> I know this is very late in the process to bring up this one again..... but
> while the particular race that I discussed is solvable there are still a
> large range of very difficult races that can occur in fully async
> applications.
>
> The container provides mutual exclusion for all threads calling filters,
> servlets, AsyncListeners, ReadListener and WriteListeners. There can only
> ever be one thread dispatched to one of these handlers and thus a single
> order can be known and handled.
>
> However, there is no such requirement on applications threads that call
> AsyncContext.start(Runnable). There can be multiple of those active at
> once and concurrently with container managed threads. This puts a huge
> burden on async applications to provide mutual exclusion and can often
> result in too much locking and blocking (which is not what you want for
> async apps).
>
> Thus I still think there is great value in a variant of
> AsyncContext.start(Runnable) that will run the task in a container managed
> thread that is mutually excluded from all container dispatches associated
> with the request.
>
> Writing asynchronous applications is vastly simpler if you can know there is
> only ever 1 thread active for a given request. So I'd like to propose that
> we add to AsyncContext:
>
>
> /**
> * Causes the container to dispatch a thread, possibly from a managed
> * thread pool, to run the specified <tt>Runnable</tt>. The container
> may
> * propagate appropriate contextual information to the
> <tt>Runnable</tt>.
> * The container will guarantee that the thread will not be run
> concurrently
> * with any other thread dispatched by the container for the associated
> * request for any invocation of the standard API including other
> invocations
> * of this method.
> *
> * @param run the asynchronous handler, which is assumed to be a short
> lived,
> * preferably non blocking, method.
> */
> public void startSingleThreaded(Runnable run);
>
>
> OK the name sucks

I agree that this would be very useful, although I don't like the name
(but I can't think of a better one off the top of my head).
With regards to the 'which is assumed to be a short lived, preferably
non blocking, method.' part are you expecting that this will be run in
an IO thread? Otherwise I don't really see any need for this
restriction.

Stuart

>
>
>
>
>
> On 6 April 2017 at 15:43, Stuart Douglas <sdouglas_at_redhat.com> wrote:
>>
>> On Thu, Apr 6, 2017 at 3:14 PM, Greg Wilkins <gregw_at_webtide.com> wrote:
>> > Stuart,
>> >
>> > Ah good idea to add such an IO event to the input queue so it occurs
>> > once an
>> > IO operation is attempted. This would indeed help with the most
>> > immediate
>> > manifestation of the problem we have.
>> >
>> > However, many of our http/2 users are interested in fail fast semantics
>> > that
>> > are available with HTTP/2 and we have discussed here how such errors can
>> > be
>> > delivered.... perhaps the can be delivered to any AsyncListener onError
>> > if
>> > registered and then in any subsequent IO operations.
>>
>> I think this may need to be a new listener, as this is a new semantic
>> with regards to thread safety that may cause problems.
>>
>> In particular at the moment onError is only called after a dispatch()
>> call, so the Request/Response objects are basically 'owned' by the
>> thread doing the dispatch, and the onError call will be called by that
>> thread (there could still be other user threads active, however then
>> it is the users responsibility to synchronize access to the
>> Request/Response objects).
>>
>> Invoking the onError method in a different thread due to connection
>> failure could break existing listeners that are not coded in a thread
>> safe manner.
>>
>> In addition to this every mechanism by which we can get a 'fail fast'
>> result implies that the underlying connection is broken and no
>> response can be sent. As a result it is pointless to generate an error
>> page, which I think most existing onError implementation will be
>> trying to do.
>>
>> If users want fail fast support I think we need to explicitly add
>> support for it, otherwise failures should be reported when the user
>> attempts IO as normal (with the potential for container specific
>> support for fail fast).
>>
>> Stuart
>>
>> >
>> > But note that even if we so handle such IO events, I think it is still
>> > possible for the problem to occur when handling an exception thrown from
>> > a
>> > dispatch to a servlet. The container still needs to generate an error
>> > page,
>> > but the application may have already commenced an async write mechanism
>> > that
>> > can race with the error page.
>>
>> In this case I do not start the async write process until the
>> dispatch() call stack has returned, which side steps this problem.
>>
>> Stuart
>>
>> >
>> > cheers
>> >
>> >
>> >
>> >
>> > On 6 April 2017 at 14:05, Stuart Douglas <sdouglas_at_redhat.com> wrote:
>> >>
>> >> On Thu, Apr 6, 2017 at 1:37 PM, Greg Wilkins <gregw_at_webtide.com> wrote:
>> >> >
>> >> > We are trying to deal with a race condition in Jetty that we believe
>> >> > is
>> >> > kind
>> >> > of inherent in the spec rather than our handling of it. It is more
>> >> > apparent with HTTP/2 usage but I think that is can also happen with
>> >> > HTTP/1
>> >> > is very specific circumstances. I'm wondering of other containers
>> >> > have
>> >> > encountered this problem and perhaps if there is something we can do
>> >> > to
>> >> > help
>> >> > avoid it.
>> >> >
>> >> > The issue is that if the application is using async to generate a
>> >> > response
>> >> > from a non container thread, then there is an intrinsic race
>> >> > condition
>> >> > if an
>> >> > error occurs and an error page needs to be generated.
>> >> >
>> >> > On HTTP/1 and error page would only need to be generated if a prior
>> >> > dispatch/callback throws an exception. With HTTP/2 error pages
>> >> > dispatches
>> >> > can occur at any time because of a packet that has been received over
>> >> > the
>> >> > stream (which is continuously being read).
>> >>
>> >> Unless I have misunderstood what you mean this is not possible in our
>> >> implementation. If a packet that results in an error (e.g. a
>> >> RST_STREAM) is read from the wire the application will not know about
>> >> it until it attempts an IO operation, where it will receive an
>> >> IOException it can handle however it wants. This is similar to the
>> >> situation in HTTP/1.1, if the remote peer has sent a RST packet then
>> >> the application will not know about it until it attempts IO.
>> >>
>> >> I think the spec should prohibit the container from arbitrarily
>> >> calling sendError based on async IO events, as that way lies a
>> >> minefield of threading issues that will basically require all state
>> >> relating to the response to be thread safe with the associated
>> >> performance penalty.
>> >>
>> >> Stuart
>> >>
>> >> >
>> >> > The problem in more detail is that to generate an error page the
>> >> > container:
>> >> >
>> >> > checks if the response is committed - if so abort connection as best
>> >> > as
>> >> > possible (but may look like a normal EOF in HTTP/1.0)
>> >> > clears the response buffer
>> >> > sets the response status code/reason
>> >> > sets some response attributes to describe the error
>> >> > Either generates a page itself or does an DispatcherType.ERROR
>> >> > dispatch
>> >> > to
>> >> > the application to generate the error page.
>> >> >
>> >> > However in parallel the application may have async mechanisms that
>> >> > can
>> >> > mutate the response by:
>> >> >
>> >> > write to the response
>> >> > commit the response either by writing or by explicit flush
>> >> > set the response status code/reason
>> >> > call AsyncContext.complete() or other methods
>> >> >
>> >> > Now the handling for the last of these is well defined, because we
>> >> > can
>> >> > have
>> >> > an exact before/after relationship between the call to complete and
>> >> > the
>> >> > decision to do an error dispatch.
>> >> >
>> >> > However, any of the other mutating methods may happen concurrently
>> >> > with
>> >> > the
>> >> > container/application generating an error response. So all sorts of
>> >> > nasty
>> >> > outcomes are possible: async write content in the middle of the error
>> >> > page,
>> >> > async status code sent instead of error page; application response
>> >> > written
>> >> > with error status but normal content, illegal state exception when
>> >> > resetting
>> >> > buffers because response committed asynchronously etc. etc. The
>> >> > exact
>> >> > form
>> >> > of the disaster is container dependent as the order things are done
>> >> > for
>> >> > the
>> >> > error page is not defined.
>> >> >
>> >> > Now currently this is not too much of a huge issue, as the kinds of
>> >> > errors
>> >> > that we may get are typically fatal to the connection/stream (eg
>> >> > receiving
>> >> > a HTTP/2 stream reset), so an error page is unlikely to be
>> >> > successfully
>> >> > generated, but we still need to make the effort to inform the
>> >> > application
>> >> > that an error has occurred even if a page cannot be generated. Plus
>> >> > also in
>> >> > future there may be other error conditions from a HTTP/2 connection
>> >> > violating some server imposed constraints that do require a error
>> >> > page
>> >> > to be
>> >> > generated.
>> >> >
>> >> > We could try to handle this with some kind of mutual exclusion on the
>> >> > response methods, but generating an error page is a multi operation
>> >> > process
>> >> > that can call the application, so holding a lock while calling
>> >> > application
>> >> > code is just asking for problems.
>> >> >
>> >> > A better approach for Servlet 4.0 may be to discourage arbitrary
>> >> > threads
>> >> > from being able to call the response API. Instead we could have a
>> >> > variant
>> >> > of AsyncContext.start(Runnable), that guarantees that the Runnable
>> >> > will
>> >> > not
>> >> > be run concurrently with any container dispatched thread on the
>> >> > associated
>> >> > transaction. We already give that guarantee for container callback
>> >> > and
>> >> > dispatches, but it is impossible for an async thread to enforce non
>> >> > concurrency with any container action and/or container managed
>> >> > callbacks.
>> >> >
>> >> > So lets call the method AsyncContext.run(Runnable) and if called when
>> >> > no
>> >> > container managed thread is active, then the Runnable would be called
>> >> > immediately by the calling thread (in context scope like start),
>> >> > otherwise
>> >> > the Runnable would be queued to be called by a container managed
>> >> > thread
>> >> > immediately after the container dispatch is complete and no other
>> >> > container
>> >> > dispatch would occur until after the Runnable returns.
>> >> >
>> >> > Thoughts
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Greg Wilkins <gregw@webtide.com> CTO http://webtide.com
>> >
>> >
>> >
>> >
>> > --
>> > Greg Wilkins <gregw@webtide.com> CTO http://webtide.com
>
>
>
>
> --
> Greg Wilkins <gregw@webtide.com> CTO http://webtide.com