On 2/6/12 8:46 AM, Matthew Swift wrote:
> Hi Grizzly devs,
>
> I'm currently using Grizzly 2.1.7 as the IO framework for our LDAP
> SDK. I've just noticed a serious problem in our client which occurs
> when multiple client threads attempt to write large requests
> simultaneously to the same connection (unlike my previous emails, I am
> not using SSL this time, just plain TCP).
>
> My client code looks a bit like the code below. I've simplified the
> encoding for readability, but the write to the connection is
> identical to what we are doing:
>
> // Serialize the LDAP request to a Grizzly Buffer.
> LDAPSearchRequest request = ...;
> LDAPWriter writer = new LDAPWriter();
> writer.writeSearchRequest(request);
> Buffer buffer = writer.getBuffer();
>
> // Write the buffer content to the org.glassfish.grizzly.Connection
> org.glassfish.grizzly.Connection connection = ...;
> connection.write(buffer, null);
>
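> The interleaving only shows up when several application threads run
> this pattern concurrently against the same connection. A minimal
> sketch of the usage (thread count and names are illustrative):
>
>     import java.util.concurrent.ExecutorService;
>     import java.util.concurrent.Executors;
>
>     import org.glassfish.grizzly.Buffer;
>     import org.glassfish.grizzly.Connection;
>
>     void writeConcurrently(final Connection<?> connection,
>                            final Buffer request) {
>         // Four threads each write a complete ~10KB protocol message.
>         final ExecutorService pool = Executors.newFixedThreadPool(4);
>         for (int i = 0; i < 4; i++) {
>             pool.execute(new Runnable() {
>                 public void run() {
>                     // duplicate() gives each thread its own position
>                     // and limit over the same message content.
>                     connection.write(request.duplicate(), null);
>                 }
>             });
>         }
>         pool.shutdown();
>     }
>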
> Our assumption was that the call to connection.write() does not need
> synchronization. However, we are seeing messages become interleaved
> when the buffer is relatively big (around 10KB). I enabled FINE
> logging, which shows what's going on (the client requests are always
> 10,732 bytes):
>
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,136 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 0 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 0 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 0 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 596 bytes
> Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
> FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
>
>
> You can see that the first request is sent in one go, but the second
> request requires several write attempts (10,136 bytes, then three
> 0-byte writes) before the remaining 596 bytes are written. Crucially,
> those remaining 596 bytes go out *after* two other complete requests,
> so the second message is interleaved with them on the wire.
>
> It's easy to see what is going wrong in the call-stack:
>
> TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write0(NIOConnection,
> SocketAddress, Buffer, WriteResult<Buffer,SocketAddress>, long,
> TimeUnit) line: 167
> TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write(Connection,
> SocketAddress, Buffer,
> CompletionHandler<WriteResult<Buffer,SocketAddress>>,
> Interceptor<WriteResult<Buffer,SocketAddress>>, long, TimeUnit)
> line: 130
> TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write(Connection,
> SocketAddress, Buffer,
> CompletionHandler<WriteResult<Buffer,SocketAddress>>,
> Interceptor<WriteResult<Buffer,SocketAddress>>) line: 87
> TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write(Connection,
> Object, Buffer, CompletionHandler, Interceptor) line: 66
> TCPNIOTemporarySelectorWriter(AbstractWriter<L>).write(Connection,
> L, Buffer, CompletionHandler<WriteResult<Buffer,L>>) line: 90
> TCPNIOTransportFilter.handleWrite(FilterChainContext) line: 151
> TransportFilter.handleWrite(FilterChainContext) line: 200
> ExecutorResolver$8.execute(Filter, FilterChainContext) line: 111
> DefaultFilterChain.executeFilter(FilterExecutor, Filter,
> FilterChainContext) line: 286
> DefaultFilterChain.executeChainPart(FilterChainContext,
> FilterExecutor, int, int) line: 223
> DefaultFilterChain.execute(FilterChainContext) line: 155
> DefaultFilterChain.process(Context) line: 134
> ProcessorExecutor.execute(Context) line: 78
> DefaultFilterChain.write(Connection, Object, Object,
> CompletionHandler) line: 408
> TCPNIOConnection(NIOConnection).write(SocketAddress, M,
> CompletionHandler<WriteResult<M,SocketAddress>>) line: 338
> TCPNIOConnection(NIOConnection).write(M,
> CompletionHandler<WriteResult<M,SocketAddress>>) line: 330
>
>
> Where TemporarySelectorWriter.write0() loops without any
> synchronization:
>
>     while (buffer.hasRemaining()) {
>         // writeNow0() may write only part of the buffer, or 0 bytes
>         // when the socket send buffer is full (the 0-byte writes in
>         // the log above). Nothing prevents another thread from
>         // entering this loop for the same connection in the meantime.
>         int len = writeNow0(connection, dstAddress, buffer,
>                             currentResult);
>
>         ...
>     }
>
>
> I have a couple of questions:
>
> * Is this a bug, or should we be synchronizing on the call to
> connection.write()? If we should synchronize, I find that a bit
> surprising, since the method's declaration suggests it is thread
> safe: it returns a Future and takes a CompletionHandler as a
> parameter.
>
This behavior is currently expected. However, since the non-blocking
and blocking writers share the same base interface, and having to care
about synchronization depending on the mode would be a pain, we should
fix this.
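
In the meantime, a possible interim workaround (a rough sketch, not an
official recommendation) is to serialize writes application-side.
Because the blocking writer loops until the whole buffer is flushed,
holding a per-connection lock across the call keeps each message
contiguous:

    import org.glassfish.grizzly.Buffer;
    import org.glassfish.grizzly.Connection;

    final class SerializedWriter {
        private final Object writeLock = new Object();

        // With a blocking connection, write() returns only after the
        // whole buffer has been flushed, so this lock serializes
        // complete messages.
        void writeMessage(final Connection<?> connection,
                          final Buffer buffer) {
            synchronized (writeLock) {
                connection.write(buffer, null);
            }
        }
    }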
>
> * If this is a bug, why hasn't anyone else hit it? Are we using
> unusual or deprecated APIs? I can understand why HTTP client
> implementations would not run into this issue since the protocol
> is synchronous, but I'd imagine other protocol implementations
> could hit this.
>
The connection you're using is blocking. That's the main difference
here, I believe; most users work with non-blocking connections.
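
If switching to non-blocking writes is an option for the SDK, that
would route writes through the async write queue rather than the
temporary-selector loop in your stack trace. A minimal sketch, assuming
you control the transport setup (non-blocking is the default mode for
the NIO transports):

    import java.io.IOException;

    import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
    import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;

    TCPNIOTransport transport =
            TCPNIOTransportBuilder.newInstance().build();
    // Non-blocking is the default; set it explicitly for clarity.
    transport.configureBlocking(false);
    transport.start(); // throws IOException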
>
> * I tried Grizzly 2.1.8 with the same results. I also ran into a
> separate minor issue with 2.1.8: it always creates a worker thread
> pool even if you're using the SameThreadIOStrategy. The builder is
> initialized with a WorkerThreadIOStrategy, and setting the
> IOStrategy does not clear the ThreadPoolConfig (see the sketch
> below). In addition, forcefully setting the thread pool size to 0
> causes an exception later on when the FixedThreadPool is created.
>
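> A sketch of what I mean (method names as I understand the 2.1.x
> builder API; treat it as illustrative):
>
>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>     import org.glassfish.grizzly.strategies.SameThreadIOStrategy;
>
>     TCPNIOTransportBuilder builder =
>             TCPNIOTransportBuilder.newInstance();
>     // The builder starts out configured for WorkerThreadIOStrategy ...
>     builder.setIOStrategy(SameThreadIOStrategy.getInstance());
>     // ... and switching the strategy does not clear the worker
>     // ThreadPoolConfig, so build() still creates a worker pool that
>     // SameThreadIOStrategy never uses.
>     TCPNIOTransport transport = builder.build();
>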
Ugh.
I've logged two issues to track this:
http://java.net/jira/browse/GRIZZLY-1190
http://java.net/jira/browse/GRIZZLY-1191
Thanks,
-rl
>
>
> Cheers :-)
>
> Matt
>