jsr340-experts@servlet-spec.java.net

[jsr340-experts] Re: Async IO and Upgrade proposal updated

From: Greg Wilkins <gregw_at_intalio.com>
Date: Thu, 29 Mar 2012 01:23:31 +1100

On 27 March 2012 17:12, Rajiv Mordani <rajiv.mordani_at_oracle.com> wrote:
>
>
> On 3/25/2012 3:56 PM, Greg Wilkins wrote:
>>
>> On 23 March 2012 08:27, Shing Wai Chan <shing.wai.chan_at_oracle.com> wrote:
>>>
>>> In the WriteListener we have modified the onWritePossible method as
>>> follows:
>>> public void onWritePossible(int numOfBytes);
>>> Note that onWritePossible(int numOfBytes) may require buffering at
>>> the lower layers, closer to where the actual write happens. An
>>> alternative is to add the various write methods to
>>> ServletOutputStream that return an int with the number of bytes
>>> written, just like WritableByteChannel's write method.
>>
>> I don't understand how an implementation is meant to determine the
>> number of bytes that can be written without blocking.  The same goes
>> for the associated canWrite(int size) method. If these methods are
>> just really referring to space within the output stream buffer, then
>> I have some usability and efficiency concerns with this approach.
>
>
>
> That's why we had "may require buffering at the lower layers", and
> why the other suggestion was to have the write methods return an int
> just like WritableByteChannel.

I think the design will definitely need buffers, because there is no
other way we will be able to return a number of bytes that is
guaranteed to be writable without blocking.
It is this extra layer of buffering that I think is a bad idea.
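
For example (a hypothetical sketch of mine, nothing from the proposal
itself), the only place a guarantee like canWrite(n) can come from is
the implementation's own buffer, never the channel:

  // hypothetical implementation detail behind canWrite(int);
  // a non-blocking channel makes no such promise, so the answer
  // can only come from space reserved in an internal buffer
  private final ByteBuffer internalBuf = ByteBuffer.allocate(8192);

  public boolean canWrite(int size)
  {
      return internalBuf.remaining() >= size;
  }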


>> Consider the example of a producer that wants to send an Object as
>> JSON asynchronously.  It gets a callback saying 2048 bytes may be
>> written without blocking.  Does it try to convert the object or not?
>> It does not know if 2048 bytes is sufficient.  Let's say it knows
>> that its objects are never larger than 8192 bytes: does it wait for
>> a callback telling it >8192 bytes may be written, or does it have to
>> convert the object to a byte array and write it a chunk at a time?
>
>
> Yes, you would have to convert it to a byte array and write it a
> chunk at a time. Just like HTTP chunking, you have to do some
> buffering at the lower layers.

Ideally this is not the kind of thing an application should be doing.

We don't expect application programmers to do their own chunking
normally, so why must we require them to do so for async?
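
To make the dilemma concrete, here is roughly the chunking a JSON
producer's WriteListener would be forced into under the numOfBytes
proposal (pending, offset, out, toJsonBytes and nextObject are all
illustrative names of mine, not part of any proposal):

  public void onWritePossible(int numOfBytes)
  {
      if (pending == null)
      {
          // the serialized size is unknown in advance, so we must
          // convert to a byte[] first just to discover it
          pending = toJsonBytes(nextObject());
          offset = 0;
      }

      // we may only rely on numOfBytes, so write a chunk and then
      // wait for the next callback to write the next chunk
      int len = Math.min(numOfBytes, pending.length - offset);
      out.write(pending, offset, len);
      offset += len;
      if (offset == pending.length)
          pending = null;
  }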


>>
>> If the producer decides that it is best to write the content a chunk
>> at a time, then it will have to create a ByteArrayOutputStream,
>> convert the object to JSON and then to bytes to discover the true
>> size. It can then write 2048 bytes as a chunk and wait for another
>> callback to write another chunk.  However, I think this approach is
>> really inefficient as it a) forces an extra data copy in the
>> implementation;
>
>
> Not sure why it forces an extra data copy? If you use the ByteBuffer
> approach then you have the same issue, where the byte buffer is
> allocated till the write is complete. Not sure how that is any
> different.

This is not a ByteBuffer vs byte[] thing. It is about the extra layer
of buffering this design imposes.

If I have a 20k byte[] that I want to write, this design will require
that instead of doing

  write(buffer, 0, 20000);

I will have to do

  write(buffer, 0, 2000);
  // wait for callback
  write(buffer, 2000, 2000);
  // wait for callback
  ...
  write(buffer, 18000, 2000);

With the former style, the implementation can internally do something like

  len = channel.write(ByteBuffer.wrap(buffer, 0, 20000));

and so the user-supplied data is passed directly to the channel write
call. If we allow ByteBuffers in the API, then that can even be a
direct buffer for a really efficient write. If the full buffer cannot
be written, it is the implementation, not the application, that has
to arrange for the rest to be written when the channel is writeable.

But instead, this design means that we have an internal buffer that we
have to copy into (because we've guaranteed to take at least 2000
bytes and we can't get that guarantee from the channel). So we have
to do

  internalBuf.put(buffer, 0, 2000);   // the extra copy
  internalBuf.flip();
  channel.write(internalBuf);
  internalBuf.clear();
  // next callback
  internalBuf.put(buffer, 2000, 2000);
  internalBuf.flip();
  channel.write(internalBuf);
  internalBuf.clear();
  // ... and so on, 10 times in all

So we get an extra copy, we lose the ability to pass in file-mapped
buffers efficiently, we make 10 system calls instead of 1, and we
potentially get less efficient TCP/IP flow control because less data
is available to the transport at any one time.



>> b) could result in a very stop/start/stop/start data flow on the wire;
>> c) bypasses the normal TCP/IP flow control mechanisms.  It is far
>> better that once you have converted the object into an 8192 byte
>> array, you give the IO layer that byte array directly - without a
>> copy - and let the TCP windows/flow control etc. do their stuff and
>> write as much as possible of that buffer.
>
>
> I am not fully sure what you are suggesting here. We could use a
> ByteBuffer instead and provide a write(ByteBuffer) if that helps. Is
> that what you mean? But you are still buffering no matter what IMHO.

A ByteBuffer helps a little bit, because it gives the caller a
position index to help remember how much has been written, but I agree
it is not the solution.
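
For example, with a plain WritableByteChannel a partial write already
leaves the bookkeeping in the buffer itself:

  ByteBuffer buf = ByteBuffer.wrap(data);
  int written = channel.write(buf);  // may be a partial write
  // buf.position() has advanced by 'written', so buf.remaining()
  // is exactly what is left to write on the next attempt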


I see two ways to resolve this:

a) remove the concept of how many bytes can be written without
blocking from the API. Just have a callback that says more data can
be written, and have every write call return the number of bytes
actually written. This still requires the application to track the
partially written content, but at least it removes the requirement
for internal buffering in the implementation.

b) go to something like the Java 7 NIO2 style, where you call
write(buffer, handler) and the handler is called back once the entire
buffer is written. No chunks, no copying, direct buffers! (Rough
sketches of both shapes follow below.)  Any luck getting the guys
that designed NIO2 to come chat to us about this?
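
Roughly, the two shapes would be (hypothetical signatures of mine,
just to make the comparison concrete):

  // option a): writes report progress, modelled on
  // WritableByteChannel; no internal buffering is implied
  interface NonBlockingOutput
  {
      int write(byte[] b, int off, int len);  // returns bytes accepted
      void setWriteListener(WriteListener l); // "writeable again", no count
  }

  // option b): NIO2 style, modelled on Java 7's
  // AsynchronousByteChannel.write(src, attachment, handler);
  // the container owns all the partial-write bookkeeping
  interface AsyncOutput
  {
      void write(ByteBuffer src,
                 CompletionHandler<Integer, Void> handler);
  }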



>> I think we need to see some non-trivial examples of the API being
>> used to really judge its usability.  Examples that write objects as
>> JSON or XML DOM documents etc.
>
>
> Let me spend some more time trying to write one and get back to you on this.

that would be great!


cheers