jsr340-experts@servlet-spec.java.net

[jsr340-experts] Re: Async IO and Upgrade proposal updated

From: Remy Maucherat <rmaucher_at_redhat.com>
Date: Fri, 30 Mar 2012 12:35:34 +0200

On Fri, 2012-03-30 at 09:56 +1100, Greg Wilkins wrote:
> On 29 March 2012 20:18, Remy Maucherat <rmaucher_at_redhat.com> wrote:
> > Looking about this new reply makes me doubt your sincerity a bit.
>
> As always your attitude and inability (or unwillingness) to comprehend
> the point of view of others makes it very tiresome to deal with you.
> But in most previous cases you have eventually seen the light after
> many months of repetition, so I will persist.

The problem is that in this new email you have again skewed my example
in the direction that suits you, despite basic logic.

> > I keep the rest of the buffer, say it was written (so that the
> > application does not have to deal with it),
>
> Firstly you can't just keep the rest of the buffer, as you have no
> idea as to its volatility or otherwise after you return from the
> write method. You have to copy the rest of the buffer.

Yes, the rest of the buffer is copied. It is a real cost, there's no
denying that, but it allows the application buffer to be reused, which
mitigates it. With NIO 2, the buffer is handed to the container and the
application has to use a new one.
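
A minimal container-side sketch of what I describe, for clarity: try a
direct write, copy only the unwritten remainder, and flip the canWrite
flag. The class and field names here are illustrative only, not part of
any proposal:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

class NonBlockingOutput {
    private final SocketChannel channel;      // configured non-blocking
    private ByteBuffer leftover;              // copy of the unwritten bytes
    private volatile boolean canWrite = true;

    NonBlockingOutput(SocketChannel channel) {
        this.channel = channel;
    }

    // Returns immediately; the application may reuse 'buf' afterwards.
    void write(ByteBuffer buf) throws IOException {
        channel.write(buf);                   // writes what the socket accepts
        if (buf.hasRemaining()) {
            leftover = ByteBuffer.allocate(buf.remaining());
            leftover.put(buf).flip();         // copy the rest, report it written
            canWrite = false;                 // application waits for the callback
            // register write interest with the poller here; once the socket is
            // writable again, drain 'leftover', flip canWrite back to true and
            // notify the application
        }
    }

    boolean canWrite() {
        return canWrite;
    }
}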

> > and flip the canWrite flag
> > to false (so that the application knows it should stop writing until
> > told otherwise using a callback). When it is possible to write without
> > blocking (so when the socket polling says so) the remaining bytes will
> > be written, the canWrite flag is flipped back to true, and the
> > application will get its notification. There is NO internal backlogging.
>
> Sure you can try to write the passed buffer directly to the channel,
> and then only copy the remaining data. But producers and consumers
> rarely run at the same speed so either you will be producing faster
> than the network can consume, in which case the copy will be frequent,
> or the production is slower, in which case it is made even more
> inefficient by the need to pass in the data in smaller chunks than the
> network could otherwise consume.

NIO 2 will have to wait too when faced with slow clients. I'll get to
buffer size a little later.

> Besides, it is much better to just let the app pass the entire content
> buffer and then try to write as much as possible. The app can then
> keep the unwritten data and it knows the volatility of the buffer, so
> it can copy it or not as needed.

No, it's not better; it is actually much worse. An application using big
buffers is an easy DoS target. All a client needs to do is read slowly,
and the container has to sit on the buffer until the write is complete.

I thought it was well accepted that "small" buffers are the way to go.
Servlet has an 8KB buffer today; what is the problem with that kind of
size? Except, of course, that your proposed design is inefficient at
that size.

If you're using a 100KB (or bigger?) buffer to produce all the content
in a small number of writes that you expect to be sent asynchronously,
you're going to use that 100KB per connection just for writing data. If
the client is honest and fast, it will read quickly and the problem
might go away fast enough. OTOH, slow clients will cause memory use to
rise quickly, leading to a DoS situation.

> > I can give an example for better understanding:
> > while (os.canWriteCount() > 4KB) {
> >     os.write(my4KBbuffer);
> >     os.flush();
> > }
> > // write callback
> > while (os.canWriteCount() > 4KB) {
> >     os.write(my4KBbuffer);
> >     os.flush();
> > }
> > // write callback
> > while (os.canWriteCount() > 4KB) {
> >     os.write(my4KBbuffer);
> >     os.flush();
> > }
>
> Well that's ugly and inefficient! The application has to both break
> the data up into little consumable chunks and be prepared to write it
> either in a loop or via callback style.

You can increase the buffer size if you like, but evidently that uses
more resources per connection. I thought the point of advanced IO
techniques was to make per-connection memory use as low as possible,
and that bidirectional interactive protocols are best handled with
small buffers. With large buffers, how can you expect to have 100k
active connections, each with a 100KB (1MB?) buffer? (100,000
connections at 100KB each is on the order of 10GB of buffer memory
alone.) Besides that issue, small buffers are also easier to reuse:
the usual classic design simply reuses one "small" buffer.

Let's say my example used 20KB buffers then; that remains reasonable.

> It is a needless complication to limit the amount that can be written
> to the amount that the implementation is prepared to buffer in the
> event that the network is blocked. It is far more efficient and far
> simpler to do either:

There's no artificial limit in the API design. But the bigger the
buffer, the more data the container will have on its hands to keep
around, just like with NIO 2. Forcing the container to deal with that is
costly in both cases.

> os.write(my20KBuffer, completionhandler);
>
> or
>
> os.write(my20KBuffer); // 8k of the 20k is written
> // write callback
> os.write(my20KBuffer); // next 8k of the 20k is written
> // write callback
> os.write(my20KBuffer); // remaining 4k is written

I can do a single os.write(my20KBuffer); too ...

> > The flush will cause the actual write, since the Servlet layer has its
> > own buffer, I'm using it for clarity so that it is known when it occurs.
>
> But you were just saying that you would try to directly write the
> content passed in the write and only buffer it if it was unwriteable?
> So I don't see what the flush is doing - either the data was already
> written, or it was buffered because it could not be written.

The flush in the example is to indicate where the actual write occurs.
This is due to the servlet buffer, which is 8KB. BTW, Rajiv's proposal
still uses that infrastructure, I believe, while yours simply bypasses
it?

Let's redo the example then with the boolean flag:
while (os.canWrite()) {
   os.write(my16KBbuffer);
}
// Wait for callback
while (os.canWrite()) {
   os.write(my16KBbuffer);
}
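
With Rajiv's listeners, that loop would presumably end up inside the
callback itself. A rough sketch, where both interfaces and the
onWritePossible name are my guesses for illustration, not final API:

import java.io.IOException;

interface NonBlockingOutputStream {           // assumed for the sketch
    boolean canWrite();
    void write(byte[] b) throws IOException;
}

interface WriteListener {                     // assumed for the sketch
    void onWritePossible() throws IOException;
}

class ContentWriter implements WriteListener {
    private final NonBlockingOutputStream os;
    private final byte[] my16KBbuffer = new byte[16 * 1024];

    ContentWriter(NonBlockingOutputStream os) {
        this.os = os;
    }

    @Override
    public void onWritePossible() throws IOException {
        // The container re-invokes this method when writing is possible
        // again, replacing the "Wait for callback" comment above.
        while (os.canWrite() && moreContent()) {
            fillNextChunk(my16KBbuffer);
            os.write(my16KBbuffer);
        }
    }

    private boolean moreContent() { return false; }  // stub for the sketch
    private void fillNextChunk(byte[] buf) { }       // produce data here
}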

> > Apparently, your issue with Rajiv's proposal is that it is possible to
> > implement it inefficiently,
>
> No - my problem is that it requires the implementation to provide an
> un-knowable number - how many bytes can be written without blocking.
> Thus implementations will have to guess a number that is likely to be
> much less than what can be written (or if it is large will contribute
> to buffer bloat). This will constrain applications to chunk their
> own content and always write less than could actually be written.

So Rajiv has removed that now. The main problem is that it is hard to
give an exact number (in my implementation, it would be based on the
number of bytes remaining in the socket buffer, but since encoding such
as chunking happens after that, I can't give an exact number). OTOH,
forcing the application to use "small" buffers per connection is
necessary.

> > while your proposal is always inefficient.
>
> You have not established that. Both proposals that I have advocated
> have less data copying, fewer system calls, larger writes and fewer EE
> wrapped callbacks than the current proposal.

I did. Your proposal, to be efficient, will require large buffers,
putting a hard cap on scalability if every connection needs 10 times
the memory or more for its processing.

Sure, if you buffer 1MB and then use a NIO 2 call, there's one EE
callback. I can do the same thing too. The problem is, the server will
not scale and will be dead.

> > Of course, if both were equally inefficient, then the NIO 2 API style is
> > simpler and should be chosen. But that's really not the case.
> >
> > Last, and you may have missed it, it has been clarified that each of
> > these callbacks should have the full EE environment set. So in addition
> > to the trip to the thread pool, they will have a cost, and being able to
> > minimize them is good.
>
> Exactly! In this example the NIO2 style has the least callbacks.
> Both canwrite as boolean and canwrite as int can have the same number
> of callbacks, but canwrite as int requires the application to
> implement a complex loop inside a callback approach.

No, NIO 2 has one callback per write, so it has more callbacks; it is
simple logic.

> > Simple read/write boolean flags are probably enough, and that's what I
> > have right now. My problem with your argumentation is that using an int
> doesn't really change anything about efficiency (the boolean is actually
> still there anyway if the application prefers it!), it simply gives
> extra information to the application if, as in my implementation, it
> > wants to avoid using leftovers (the extra copy you didn't like). So what
> > is the problem ?
>
> The problem is that the int is an artificially created limit, above
> which the application cannot write. Instead of saying
> write(bigbuffer) and letting the impl take as much data as possible,
> the app has to loop doing while(canWrite)write(chunk) for absolutely
> no gain.

Well, that guarantees the lowest memory use and no extra operations in
the container. Maximized scalability. You cannot do write(bigbuffer) and
expect to scale or perform well. Why not simply use classic blocking IO?
It's only one thread after all, so probably less memory than the big
buffer.

> Finally, I have no idea why you feel the need to question my sincerity
> or motivations for discussing the various options. Even if I had been
> wrong in the points that I raised, perhaps I had misunderstood the
> proposed design and this could have been pointed out in polite dialog.
> We need to get this right and there is no harm taking a bit of time
> to discuss the actual usage and implementation of the proposed
> designs. Instead you appear extremely annoyed at any suggestion that
> might take the design away from what you have already implemented -
> perhaps it is you that has the agenda?

I am fine if Rajiv's design is not adopted; I will have no problem
implementing your proposed design from a technical standpoint. It will,
however, be less efficient (I doubt its usefulness over java.io in the
general case, actually), and I will have to advertise my proprietary
API instead. So, lots of hassle I would prefer to avoid.

Also, there is nothing preventing the addition of an extra NIO 2 write
method, since it has uses with some types of big buffers (the mapped
ones), as sketched below. But that's a lot of work for a specialty use.
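
For illustration, this is roughly what such a write looks like at the
plain java.nio.channels level with a mapped buffer; how it would surface
in the Servlet API is precisely the open question, so the method shape
here is not a proposal:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class MappedWrite {
    static void send(final AsynchronousSocketChannel socket, String file)
            throws IOException {
        try (FileChannel fc = FileChannel.open(Paths.get(file),
                StandardOpenOption.READ)) {
            // The mapping stays valid after the FileChannel is closed.
            MappedByteBuffer mapped = fc.map(FileChannel.MapMode.READ_ONLY,
                    0, fc.size());
            socket.write(mapped, mapped,
                    new CompletionHandler<Integer, MappedByteBuffer>() {
                public void completed(Integer written, MappedByteBuffer buf) {
                    if (buf.hasRemaining()) {
                        socket.write(buf, buf, this);  // write until drained
                    }
                }
                public void failed(Throwable exc, MappedByteBuffer buf) {
                    // error handling elided in this sketch
                }
            });
        }
    }
}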

> We can't have a process that is a) design proposed b) Remy says he has
> implemented it c) finished!

In this process, Rajiv apparently came up independently with the same
design I implemented a couple of years ago (in its final form,
integrated with Servlet 3.0). It's not my fault ... Should I lie and say
I don't have it implemented already? (Well, actually, I don't: although
the design ideas are the same, the API form is quite different, and I'll
have significant changes to make to switch to Rajiv's listeners.)

BTW, you seem most interested in pushing the solution that would require
the fewest changes in your container, but it has major caveats.

To summarize, I think the point is to allow very high scalability (100k
connections?) for interactive bidirectional protocols; it should not
simply be a way to send big content asynchronously. After testing it,
NIO 2 works well for that kind of use too, but it needs a fancier design
(scatter/gather; see the sketch below) and its callbacks need to be
cheap, which is not the case in our EE land.
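
For reference, a gathering write at that level coalesces several small
buffers (say, a chunk-size header plus a body) into a single call, so
one callback covers all of them. Again plain NIO 2, not a Servlet API
proposal:

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

class GatherWrite {
    static void send(final AsynchronousSocketChannel socket, ByteBuffer body) {
        // Hex chunk-size line, as in HTTP chunked encoding (illustrative).
        ByteBuffer header = ByteBuffer.wrap(
                (Integer.toHexString(body.remaining()) + "\r\n")
                        .getBytes(StandardCharsets.US_ASCII));
        final ByteBuffer[] parts = { header, body };
        socket.write(parts, 0, parts.length, 30, TimeUnit.SECONDS, parts,
                new CompletionHandler<Long, ByteBuffer[]>() {
            public void completed(Long written, ByteBuffer[] bufs) {
                // One callback for the whole gather; re-issue if not drained.
                if (bufs[0].hasRemaining() || bufs[1].hasRemaining()) {
                    socket.write(bufs, 0, bufs.length, 30, TimeUnit.SECONDS,
                            bufs, this);
                }
            }
            public void failed(Throwable exc, ByteBuffer[] bufs) {
                // error handling elided in this sketch
            }
        });
    }
}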

> I note that few others are joining the discussion in the EG and I
> suggest that it is your caustic attitude towards alternate points of
> view that is a disincentive to others asking questions or
> contributing.

Sorry for pointing out facts.

-- 
Remy Maucherat <rmaucher_at_redhat.com>
Red Hat Inc