dev@grizzly.java.net

Proposal: Handling OP_CONNECT the same way we handle OP_ACCEPT (on the SelectorHandler thread)

From: Jeanfrancois Arcand <Jeanfrancois.Arcand_at_Sun.COM>
Date: Thu, 24 Jan 2008 16:38:39 -0500

Hi,

we recently faced a bug in Sailfin where performance significantly
decreased (cut & paste from Scott's update on the bug):

> This is not so much lock contention as it is a form of deadlock that happens
> under load. What happens is that all the SipContainer threads attempt to write
> to a socket (the sipp uas process); one of them will open the socket while the
> rest of them block waiting to enter the same code. The thread that opens a
> socket then waits for the socket connection to be completed.
>
> The completion of the socket connection is signaled by the TCPSelectorThread,
> which creates a callback task and executes it on the SipContainer thread pool.
> Since all of those threads are busy (blocked waiting to enter the socket-writing
> code), the callback isn't ever processed, and the stack trace below results.
> Eventually, the thread waiting for the callback will time out, and things will
> run for a few seconds until the same state is reached (and eventually, there
> will be sockets created and things will run okay).
>
> A possible fix for Sailfin is to override the TCPSelectorThread and execute the
> onConnect callback directly rather than putting that on another thread.
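
To make the failure mode concrete, here is a minimal, self-contained sketch
of that pattern in plain JDK code (no Grizzly or Sailfin classes; the class
name, pool size and timeout below are made up for illustration): every worker
thread blocks waiting for a "connection completed" callback, but the callback
itself is queued on the same, already saturated pool, so nobody can run it.

import java.util.concurrent.*;

// Not Grizzly code: a generic illustration of callbacks starving on a saturated pool.
public class CallbackStarvation {
    public static void main(String[] args) throws Exception {
        // Stands in for the SipContainer thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(2);

        for (int i = 0; i < 2; i++) {              // saturate every pool thread
            pool.submit(() -> {
                CountDownLatch connected = new CountDownLatch(1);
                // The "OP_CONNECT completed" notification is dispatched to the same pool...
                pool.submit(connected::countDown);
                try {
                    // ...so this wait can only complete once a pool thread frees up,
                    // which never happens while all of them sit here; we time out instead.
                    if (!connected.await(5, TimeUnit.SECONDS)) {
                        System.out.println(Thread.currentThread().getName()
                                + " timed out waiting for the connect callback");
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
    }
}

Running the OP_CONNECT callback on the SelectorHandler thread removes that
dependency on the busy pool, which is what the patch below does.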

I would like to propose that we change the current behavior: instead of
executing the OP_CONNECT callback on its own thread, we would execute it
on the SelectorHandler thread. This is what we already do for OP_ACCEPT,
based on performance measurements we did for Grizzly 1.0, and based on
Scott's investigation, I would like to make this the default behavior:

> Index: TCPSelectorHandler.java
> ===================================================================
> --- TCPSelectorHandler.java (revision 712)
> +++ TCPSelectorHandler.java (working copy)
> @@ -697,8 +697,13 @@
>          try {
>              CallbackHandlerContextTask task = CallbackHandlerContextTask.poll();
>              task.setCallBackHandler(callbackHandler);
> -            context.execute(task);
> -        } catch (PipelineFullException ex){
> +            if (context.getCurrentOpType() != Context.OpType.OP_CONNECT){
> +                context.execute(task);
> +            } else {
> +                task.setContext(context);
> +                task.call();
> +            }
> +        } catch (Exception ex){
>              throw new IOException(ex.getMessage());
>          }
>      }


If someone relies on the current behavior (executing the callback on a
separate thread), the TCPSelectorHandler can always be extended to act
like the previous version.
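
For anyone who needs that, such a subclass could look roughly like the sketch
below. It assumes the dispatch code from the patch lives in a protected method
that can be overridden; the method name dispatchConnectCallback and its exact
signature are hypothetical (only the calls inside it come from the patch
above), and the Grizzly imports are omitted.

import java.io.IOException;

// Hypothetical sketch: restore the pre-patch behavior of always executing the
// OP_CONNECT callback on a separate thread via the pipeline.
public class LegacyConnectSelectorHandler extends TCPSelectorHandler {

    // Assumed override point; not necessarily the real Grizzly method name/signature.
    @Override
    protected void dispatchConnectCallback(Context context, CallbackHandler callbackHandler)
            throws IOException {
        try {
            CallbackHandlerContextTask task = CallbackHandlerContextTask.poll();
            task.setCallBackHandler(callbackHandler);
            context.execute(task);   // always hand the task to the worker pipeline, as before
        } catch (Exception ex) {
            throw new IOException(ex.getMessage());
        }
    }
}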

I'm interested in learning from the Corba team how OP_CONNECT has
been handled (if it was used).

A+

-- Jeanfrancois