Thanks for the questions, Robert -- all good I think.
Robert Greig wrote:
> On 13/06/07, D. J. Hagberg (Sun) <Dj.Hagberg_at_sun.com> wrote:
>> It seems to me that tying up precious WorkerThread resources waiting for
>> synchronous writes to complete (where one is not bound by the
>> javax.servlet specification) is a bit wasteful and limiting in
>> scalability, especially if I have a client on the other end of a slow
>> connection, but I don't have any hard numbers to back me up on this.
>
> I work on the Apache Qpid project, which is an implementation of the
> AMQP protocol (for message oriented middleware). We don't use Grizzly
> yet but I have spent a lot of time looking at this area for Qpid.
It sounds like the protocols and goals may be similar. Basically the
Shared Shell server acts as a rendezvous point for messaging among
clients interested in a particular "topic" (in this case a VT100-type
shell session on a server).
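Concretely, the server keeps a map from topic to the set of subscribed
connections and fans each message out to the other subscribers. A rough
sketch of the shape (class names are placeholders, not our real code; this
is server-side, so 1.5-style generics are fine):

    import java.util.*;

    // Stand-in for our per-connection wrapper.
    interface Conn { void enqueue(Object msg); }

    class TopicRegistry {
        // topic id -> connections subscribed to that shell session
        private final Map<String, Set<Conn>> subscribers =
                new HashMap<String, Set<Conn>>();

        synchronized void subscribe(String topic, Conn c) {
            Set<Conn> set = subscribers.get(topic);
            if (set == null) {
                set = new HashSet<Conn>();
                subscribers.put(topic, set);
            }
            set.add(c);
        }

        synchronized void publish(String topic, Object msg, Conn from) {
            Set<Conn> set = subscribers.get(topic);
            if (set == null) return;
            for (Conn c : set) {
                if (c != from) c.enqueue(msg);  // hand off to that connection's outbound queue
            }
        }
    }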
> What is your primary goal? Is it to maximise the number of connections
> that can be serviced concurrently or to maximise the data throughput
> to clients? Also, is it important for clients to be able to read data
> as they are sending it or is it more like http where they send then
> receive?
The goals are:
1. To keep all the content un-snoopable and to prevent man-in-the-middle
attacks. So we're using SSL over TCP.
2. To scale to a "reasonable" number of concurrent connections
per-server, say 2000-4000.
3. To minimize round-trip latency -- since this is a terminal-style client
application, users expect to see keystroke results in sub-second time.
> Are you in control of the client code as well as the server? One thing
> we had to consider was memory usage - i.e. ensuring that the queues
> did not get too big either due to badly written or just slow clients.
Yes, we control both clients and servers -- the lower-level parts of the
code are shared between client and server, and our protocol has a
"revision" number at the start of every message. This rewrite is going
to bump the message revision number anyway, creating an incompatible
change. Clients will auto-update with Java Web Start so it's not a big
deal to keep the two in sync.
The only constraint is that our client code still needs to support JDK
1.4.2. Our server code, which is what I am primarily concerned with
right now, will be using 1.5 or 1.6.
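For the shared framing code (which has to stay 1.4.2-friendly, so plain NIO
ByteBuffers and no generics), I'm picturing something along these lines --
the field widths and names are only illustrative, not the final wire format:

    import java.nio.ByteBuffer;

    // Illustration only: [revision: 2 bytes][length: 4 bytes][payload].
    class Framer {
        static final short REVISION = 2;   // bumped for this rewrite

        static ByteBuffer encode(byte[] payload) {
            ByteBuffer buf = ByteBuffer.allocate(2 + 4 + payload.length);
            buf.putShort(REVISION);
            buf.putInt(payload.length);
            buf.put(payload);
            buf.flip();
            return buf;
        }

        // Returns null until a whole frame is available in 'in' (already flipped).
        static byte[] decode(ByteBuffer in) {
            if (in.remaining() < 6) return null;
            in.mark();
            short rev = in.getShort();
            if (rev != REVISION)
                throw new IllegalStateException("unsupported revision " + rev);
            int len = in.getInt();
            if (in.remaining() < len) { in.reset(); return null; }
            byte[] payload = new byte[len];
            in.get(payload);
            return payload;
        }
    }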
>> The other possibility would seem to be to *only* register an OP_WRITE
>> interest when there are messages in the outbound queue that need to be
>> written, triggered whenever a message is added to the outbound queue.
>> But in this case, there is an expense involved with waking up the
>> selector thread and registering/updating selection keys, etc.
>
> In our design, we have several I/O threads, half of which are
> responsible for reading and half for writing. Connections are assigned
> a thread on a round robin basis.
It looks like the Grizzly code has a single Selector thread for the
whole instance, then delegates to WorkerThreads to do the actual
reading and writing. I'm pretty sure I want to split read and write
work for the exact reasons you mention (slow clients, possibly better
throughput).
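Concretely, something like one selector per direction (each with its own
thread running a select() loop), with new connections handed out
round-robin. Placeholder code just to show the shape, error handling omitted:

    import java.io.IOException;
    import java.nio.channels.*;

    class IoPool {
        private final Selector[] readers;
        private final Selector[] writers;
        private int next = 0;

        IoPool(int perSide) throws IOException {
            readers = new Selector[perSide];
            writers = new Selector[perSide];
            for (int i = 0; i < perSide; i++) {
                readers[i] = Selector.open();   // one read-loop thread per selector (not shown)
                writers[i] = Selector.open();   // one write-loop thread per selector (not shown)
            }
        }

        synchronized void assign(SocketChannel ch) throws IOException {
            int i = next++ % readers.length;
            ch.configureBlocking(false);
            // In real code, registration has to be coordinated with the select()
            // loops (wakeup() or a pending-registration queue) to avoid blocking here.
            ch.register(readers[i], SelectionKey.OP_READ);
            ch.register(writers[i], 0);   // no OP_WRITE interest until the queue backs up
        }
    }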
> There is certainly an expense involved with waking up the thread.
> However, OP_WRITE interest only needs to be registered when the kernel
> buffer is full, and this is obviously partly dependent on how quickly
> your clients are processing data.
Makes sense.
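So the write path would try the write inline and only turn on OP_WRITE when
the socket buffer fills, then turn it back off once the queue drains. A
sketch (server-side, error handling omitted):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;
    import java.util.LinkedList;

    class OutboundQueue {
        private final SocketChannel ch;
        private final SelectionKey key;   // this channel's key on the write-side selector
        private final LinkedList<ByteBuffer> queue = new LinkedList<ByteBuffer>();

        OutboundQueue(SocketChannel ch, SelectionKey key) {
            this.ch = ch;
            this.key = key;
        }

        // Called by whoever produced the message.
        synchronized void enqueue(ByteBuffer buf) throws IOException {
            queue.addLast(buf);
            flush();
        }

        // Also called from the selector loop when OP_WRITE fires.
        synchronized void flush() throws IOException {
            while (!queue.isEmpty()) {
                ByteBuffer head = queue.getFirst();
                ch.write(head);
                if (head.hasRemaining()) {   // kernel buffer is full -- wait for OP_WRITE
                    key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                    key.selector().wakeup(); // so the new interest set takes effect
                    return;
                }
                queue.removeFirst();
            }
            // Queue drained -- stop asking for OP_WRITE notifications.
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }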
> The first design we had, threads were responsible for both reads and
> writes but we wanted to be able to read and write from a socket at the
> same time. We found it significantly reduced the memory usage of our
> app since the build-up in the queues was far lower.
Again, I think we are on similar tracks here -- our clients can read &
write at the same time and expect to process every message
asynchronously. The "old" design that I am chucking out here actually
had a TON of threads to handle this with old-school blocking I/O and
wait/notify on message and work queues.
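(For reference, those per-connection queues in the old design were basically
the classic pre-java.util.concurrent wait/notify pattern, which the 1.4.2
client is stuck with anyway:)

    import java.util.LinkedList;

    // Hand-rolled blocking queue, 1.4-compatible; one thread blocks in take()
    // while others put().
    class MessageQueue {
        private final LinkedList messages = new LinkedList();

        synchronized void put(Object msg) {
            messages.addLast(msg);
            notifyAll();   // wake any thread blocked in take()
        }

        synchronized Object take() throws InterruptedException {
            while (messages.isEmpty()) {
                wait();
            }
            return messages.removeFirst();
        }
    }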
> Sorry for providing more questions than answers but I think this is
> such a "delicate" area.
Agreed, and the nature of the protocol makes a significant difference here.
Plus, it sounds like both our applications behave very differently from
HTTP's typical request/response/close pattern. Interesting, though, that
AJAX is moving (abusing?) HTTP to
more of an asynch. messaging protocol...
Thanks for the discussion. I'll take a look at the ideas in Qpid/AMQP
when I get a chance.
-=- D. J.