Hi Ken,
Ken Cavanaugh wrote:
>
> On Jan 24, 2008, at 1:38 PM, Jeanfrancois Arcand wrote:
>
>> Hi,
>>
>> we recently faced a bug in Sailfin where performance significantly
>> decreased (cut&paste from Scott's update on the bug):
>>
>>> This is not so much lock contention as it is a form of deadlock that
>>> happens
>>> under load. What happens is that all the SipContainer threads attempt
>>> to write
>>> to a socket (the sipp uas process); one of them will open the socket
>>> while the
>>> rest of them block waiting to enter the same code. The thread that
>>> opens a
>>> socket then waits for the socket connection to be completed.
>>> The completion of the socket connection is signaled by the
>>> TCPSelectorThread,
>>> which creates a callback task and executes it on the SipContainer
>>> thread pool.
>>> Since all of those threads are busy (blocked waiting to enter the
>>> socket-writing
>>> code), the callback isn't ever processed, and the stack trace below
>>> results.
>>> Eventually, the thread waiting for the callback will time out, and
>>> things will
>>> run for a few seconds until the same state is reached (and
>>> eventually, there
>>> will be sockets created and things will run okay).
>>> A possible fix for Sailfin is to override the TCPSelectorThread and
>>> execute the onConnect callback directly rather than putting it on
>>> another thread.
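The scenario Scott describes is a classic thread-pool self-deadlock: every worker in a fixed pool blocks waiting for a signal, but the signal itself is queued as a task on the same exhausted pool. Here is a minimal, hypothetical reduction of that pattern (no sockets or Grizzly types involved, just a saturated pool and a latch standing in for the connect-completed callback):

```java
import java.util.concurrent.*;

public class PoolDeadlockSketch {
    // Returns true only if the queued "callback" task manages to run.
    public static boolean signalDelivered(int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch connected = new CountDownLatch(1);
        // Saturate the pool: each worker blocks, like a SipContainer
        // thread waiting for the socket connection to complete.
        for (int i = 0; i < poolSize; i++) {
            pool.execute(() -> {
                try { connected.await(); } catch (InterruptedException ignored) {}
            });
        }
        // The selector's completion callback is queued on the same pool,
        // so it can never run while all the workers are blocked.
        pool.execute(connected::countDown);
        boolean delivered = connected.await(2, TimeUnit.SECONDS); // times out
        connected.countDown();          // unblock the workers so we can exit
        pool.shutdownNow();
        return delivered;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("callback ran: " + signalDelivered(4)); // false
    }
}
```

Running the callback inline on the selector thread (as the patch below does for OP_CONNECT) breaks this cycle, because delivery no longer depends on a free pool thread.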
>>
>> I would like to propose that we change the current behavior and avoid
>> executing the OP_CONNECT callback on its own thread, instead executing
>> it on the SelectorHandler thread. This is what we already do for
>> OP_ACCEPT, based on performance measurements we did for Grizzly 1.0,
>> and based on Scott's investigation, I would like to change the
>> default behavior:
>>
>>> Index: TCPSelectorHandler.java
>>> ===================================================================
>>> --- TCPSelectorHandler.java (revision 712)
>>> +++ TCPSelectorHandler.java (working copy)
>>> @@ -697,8 +697,13 @@
>>>          try {
>>>              CallbackHandlerContextTask task = CallbackHandlerContextTask.poll();
>>>              task.setCallBackHandler(callbackHandler);
>>> -            context.execute(task);
>>> -        } catch (PipelineFullException ex){
>>> +            if (context.getCurrentOpType() != Context.OpType.OP_CONNECT){
>>> +                context.execute(task);
>>> +            } else {
>>> +                task.setContext(context);
>>> +                task.call();
>>> +            }
>>> +        } catch (Exception ex){
>>>              throw new IOException(ex.getMessage());
>>>          }
>>>      }
>>
>>
>> If someone relies on the current behavior (spawning a thread), the
>> TCPSelectorHandler can always be extended to act like the previous
>> version.
>>
>> I'm interested in learning from the CORBA team how OP_CONNECT has
>> been handled (if it was used).
>>
>>
>
> We currently only use blocking open, but this should change once we
> start working on the new connection cache for the ORB (using the one
> in Grizzly). Note that the ConnectionCache code does not handle
> non-blocking open, and this is the biggest improvement we need to
> make in that code. What needs to happen is this: a request to get a
> connection from the connection cache needs to initiate connection
> establishment, block the requesting thread, and release the
> connection cache lock so that other connection requests may be
> processed by the cache (the current structure even blocks access to
> cached connections until a connection request has completed). Then,
> when an OP_CONNECT happens, the handler for the event needs to inform
> the connection cache that connection establishment has completed
> (successfully or not).
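The scheme Ken outlines (initiate the connect, wait outside the cache lock on a per-connection completion, and have the OP_CONNECT handler signal that completion) can be sketched with a per-endpoint `CompletableFuture`. All names here (`ConnectionCacheSketch`, `initiateConnect`, `connectFinished`) are hypothetical illustrations, not Grizzly or ORB API:

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Consumer;

class ConnectionCacheSketch<E, C> {
    private final Map<E, CompletableFuture<C>> pending = new HashMap<>();
    private final Consumer<E> initiateConnect; // kicks off a non-blocking connect()

    ConnectionCacheSketch(Consumer<E> initiateConnect) {
        this.initiateConnect = initiateConnect;
    }

    C getConnection(E endpoint) throws Exception {
        CompletableFuture<C> f;
        synchronized (this) {                  // cache lock held only briefly
            f = pending.computeIfAbsent(endpoint, e -> {
                initiateConnect.accept(e);     // start connection establishment
                return new CompletableFuture<>();
            });
        }                                      // lock released before waiting,
        return f.get(5, TimeUnit.SECONDS);     // so other requests can proceed
    }

    // Called from the selector's OP_CONNECT handler, success or failure.
    void connectFinished(E endpoint, C conn, Throwable failure) {
        CompletableFuture<C> f;
        synchronized (this) { f = pending.remove(endpoint); }
        if (f == null) return;
        if (failure != null) f.completeExceptionally(failure);
        else f.complete(conn);
    }
}
```

The key point is that the requesting thread blocks on the future, not on the cache's own lock, so concurrent requests (including for already-cached connections) are never held up by an in-flight connect.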
>
> I think we generally want to handle OP_CONNECT with a Selector thread
> callback and just handle it directly, without using another thread,
> since the amount of processing needed is probably small. Of course,
> the architecture is flexible, so we could either grab a thread to
> handle the OP_CONNECT or enqueue the event to be processed on a
> thread pool.
>
> The connection cache will allow for multiple parallel connections to the
> same endpoint, so
> it should also initiate multiple connection establishment attempts
> concurrently. This should
> not add any additional problems, once someone (me? Harsha? Alexy?
> Charlie?) gets
> around to working on the connection cache again.
OK, thanks for the feedback. It looks like we have a small problem with
TLS, where the operations might be quite expensive. But I think we
should treat TLS as an exception. So we could have:

TCP/UDP: OP_CONNECT executed on the same thread as the SelectorHandler
TLS: OP_CONNECT executed on its own thread
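The split above amounts to a per-protocol dispatch decision at the point where the selector delivers OP_CONNECT. A minimal sketch, with the `Protocol` enum and the worker pool as illustrative stand-ins rather than Grizzly's actual types:

```java
import java.util.concurrent.*;

class ConnectDispatchSketch {
    enum Protocol { TCP, UDP, TLS }

    // Returns true if the callback ran inline on the calling (selector) thread.
    static boolean dispatch(Protocol p, Runnable onConnect,
                            ExecutorService workers) {
        if (p == Protocol.TLS) {
            workers.execute(onConnect);  // expensive handshake work goes
            return false;                // to its own thread
        }
        onConnect.run();                 // cheap TCP/UDP case runs inline,
        return true;                     // avoiding the pool-starvation deadlock
    }
}
```

This keeps the deadlock fix for the common case while still paying the thread-handoff cost only where the callback is genuinely expensive.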
Thanks
-- Jeanfrancois
>
> Ken.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe_at_grizzly.dev.java.net
> For additional commands, e-mail: dev-help_at_grizzly.dev.java.net
>