Hi Grizzly devs,
I'm currently using Grizzly 2.1.7 as the IO framework for our LDAP SDK.
I've just noticed a serious problem in our client which occurs when
multiple client threads attempt to write large requests simultaneously to
the same connection (unlike my previous emails, I am not using SSL this
time, just plain TCP).
My client code looks a bit like the code below. I've simplified the encoding
to keep the example short, but the write to the connection is identical to
what we are doing:
// Serialize the LDAP request to a Grizzly Buffer.
LDAPSearchRequest request = ...;
LDAPWriter writer = new LDAPWriter();
writer.writeSearchRequest(request);
Buffer buffer = writer.getBuffer();
// Write the buffer content to the org.glassfish.grizzly.Connection
org.glassfish.grizzly.Connection connection = ...;
connection.write(buffer, null);
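For context, several application threads call this code concurrently against
the same connection. A stripped-down sketch of the calling pattern (the
executor, the loop, and the request construction are simplified for
illustration; the write call is exactly the one above):
// Several threads each encode their own ~10KB request and write it
// to the shared connection without any external synchronization.
final org.glassfish.grizzly.Connection connection = ...;
ExecutorService executor = Executors.newFixedThreadPool(4);
for (int i = 0; i < 4; i++) {
    executor.execute(new Runnable() {
        public void run() {
            LDAPWriter writer = new LDAPWriter();
            writer.writeSearchRequest(...);
            connection.write(writer.getBuffer(), null);
        }
    });
}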
Our assumption was that calls to connection.write() do not need external
synchronization. However, we are seeing messages become interleaved when the
buffer is relatively large (around 10KB). I enabled FINE logging, which shows
what's going on (the client requests are always 10,732 bytes):
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,136 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 0 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 0 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 0 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 596 bytes
Feb 6, 2012 3:29:06 PM org.glassfish.grizzly.nio.transport.TCPNIOTransport write
FINE: TCPNIOConnection (TCPNIOConnection{localSocketAddress=/10.10.0.113:43007, peerSocketAddress=my-server.local/10.10.0.114:1389}) (plain) write 10,732 bytes
You can see that the first request is sent in one go, but the second request
requires several write attempts (three writes of 0 bytes) before the remaining
596 bytes go out. Worse, those remaining 596 bytes are written *after* two
other complete requests.
It's easy to see what is going wrong in the call-stack:
TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write0(NIOConnection,
SocketAddress, Buffer, WriteResult<Buffer,SocketAddress>, long, TimeUnit)
line: 167
TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write(Connection,
SocketAddress, Buffer,
CompletionHandler<WriteResult<Buffer,SocketAddress>>,
Interceptor<WriteResult<Buffer,SocketAddress>>, long, TimeUnit) line: 130
TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write(Connection,
SocketAddress, Buffer,
CompletionHandler<WriteResult<Buffer,SocketAddress>>,
Interceptor<WriteResult<Buffer,SocketAddress>>) line: 87
TCPNIOTemporarySelectorWriter(TemporarySelectorWriter).write(Connection,
Object, Buffer, CompletionHandler, Interceptor) line: 66
TCPNIOTemporarySelectorWriter(AbstractWriter<L>).write(Connection, L,
Buffer, CompletionHandler<WriteResult<Buffer,L>>) line: 90
TCPNIOTransportFilter.handleWrite(FilterChainContext) line: 151
TransportFilter.handleWrite(FilterChainContext) line: 200
ExecutorResolver$8.execute(Filter, FilterChainContext) line: 111
DefaultFilterChain.executeFilter(FilterExecutor, Filter,
FilterChainContext) line: 286
DefaultFilterChain.executeChainPart(FilterChainContext, FilterExecutor,
int, int) line: 223
DefaultFilterChain.execute(FilterChainContext) line: 155
DefaultFilterChain.process(Context) line: 134
ProcessorExecutor.execute(Context) line: 78
DefaultFilterChain.write(Connection, Object, Object, CompletionHandler)
line: 408
TCPNIOConnection(NIOConnection).write(SocketAddress, M,
CompletionHandler<WriteResult<M,SocketAddress>>) line: 338
TCPNIOConnection(NIOConnection).write(M,
CompletionHandler<WriteResult<M,SocketAddress>>) line: 330
Here TemporarySelectorWriter.write0() loops over the buffer without any
synchronization:
while (buffer.hasRemaining()) {
    int len = writeNow0(connection, dstAddress, buffer, currentResult);
    ...
}
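With two threads inside that loop at the same time, partial writes from
different requests can interleave on the socket, which matches the log above:
one thread gets 10,136 bytes of its request onto the wire, two complete
requests from other threads then go out, and only afterwards does the first
thread write its remaining 596 bytes. Naively I would have expected the loop
to be serialized per connection, something conceptually like this (purely
illustrative, not a proposed patch):
// Hypothetical per-connection guard around the existing loop.
synchronized (connection) {
    while (buffer.hasRemaining()) {
        int len = writeNow0(connection, dstAddress, buffer, currentResult);
        ...
    }
}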
I have a few questions:
- Is this a bug, or should we be synchronizing our calls to
connection.write()? If we are expected to synchronize, I find that a bit
surprising, since the method's signature looks thread safe: it returns a
Future and takes a CompletionHandler as a parameter. (A sketch of the
workaround we're considering is below, after these questions.)
- If this is a bug, why hasn't anyone else hit it? Are we using unusual
or deprecated APIs? I can understand why HTTP client implementations would
not run into this issue, since that protocol is synchronous, but I'd imagine
other protocol implementations could hit it.
- I tried Grizzly 2.1.8 and got the same results. I also ran into a
separate, minor issue with 2.1.8: it always creates a worker thread pool
even if you're using the SameThreadIOStrategy. The transport builder is
initialized with a WorkerThreadIOStrategy, and setting a different IOStrategy
does not clear the ThreadPoolConfig. In addition, forcibly setting the worker
thread pool size to 0 causes an exception later on when the FixedThreadPool
is created.
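In the meantime, the workaround we're considering (mentioned in the first
question above) is to serialize writes ourselves. Assuming the
temporary-selector writer finishes the whole write before connection.write()
returns, holding a per-connection lock around the call should stop the
interleaving. A rough sketch, using the same variables as the snippet at the
top:
// Client-side workaround sketch: one writer at a time per connection.
synchronized (connection) {
    connection.write(buffer, null);
}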
Cheers :-)
Matt