[jsr340-experts] Re: Async IO and Upgrade proposal updated

From: Rajiv Mordani <rajiv.mordani_at_oracle.com>
Date: Thu, 29 Mar 2012 09:27:21 -0700

On 03/28/2012 07:23 AM, Greg Wilkins wrote:
> On 27 March 2012 17:12, Rajiv Mordani <rajiv.mordani_at_oracle.com> wrote:
>>
>> On 3/25/2012 3:56 PM, Greg Wilkins wrote:
>>> On 23 March 2012 08:27, Shing Wai Chan <shing.wai.chan_at_oracle.com> wrote:
>>>> In the WriteListener we have modified the onWritePossible method as
>>>> follows
>>>> -
>>>> public void onWritePossible(int numOfBytes); -
>>>> Note that the onWritePossible(int numOfBytes) may require buffering at
>>>> the
>>>> lower layers closer to where the
>>>> actual write happens. An alternative is to add write methods to
>>>> ServletOutputStream that return an int with the number
>>>> of bytes written, just like WritableByteChannel's write method.
>>> I don't understand how an implementation is meant to determine the
>>> number of bytes that can be written without blocking. The same goes
>>> for the associated canWrite(int size) method. If these methods are
>>> just really referring to space within the output stream buffer, then
>>> I have some usability and efficiency concerns with this approach.
>>
>>
>> That's why we had "may require buffering at the lower layers". That's why
>> the
>> other suggestion was to have the write methods return an int just like the
>> WritableByteChannel.
> I think the design will definitely need buffers, because there is no
> other way that we will be able to return a number that you can be
> guaranteed will be writeable without blocking.
> It is this extra layer of buffers that I think is a bad idea.
>
>
>>> Consider the example of a producer that wants to send an Object as
>>> JSON asynchronously. It gets a callback saying 2048 bytes may be
>>> written without blocking. Does it try to convert the object or not?
>>> It does not know if 2048 bytes is sufficient. Let's say it knows
>>> that its objects are never larger than 8192 bytes, does it wait for a
>>> callback telling it that >8192 bytes may be written, or does it have to
>>> convert the object to a byte array and write it a chunk at a time?
>>
>> Yes you would have to convert it to a byte array and write it a chunk at a
>> time. Just like
>> HTTP chunking, where you have to do some buffering at the lower layers.
> Ideally this is not the kind of thing an application should be doing.
>
> We don't expect application programmers to do their own chunking
> normally, so why must we require them to do so for async?
>
>
>>> If the producer decides that it is best to write the content a chunk
>>> at a time, then it will have to create a ByteArrayOutputStream and
>>> convert the object to JSON and then bytes to discover the true size.
>>> It can then write 2048 bytes as a chunk and wait for another callback
>>> to write another chunk. However, I think this approach is really
>>> inefficient as it a) forces an extra data copy in the implementation;
>>
>> Not sure why it forces an extra data copy? If you use the ByteBuffer
>> approach then you
>> have the same issue where the byte buffer is allocated till the write is
>> complete. Not
>> sure how that is any different
> This is not a ByteBuffer vs byte[] thing. This is related to the
> extra layer of buffering this design imposes.
>
> If I have a 20k byte[] that I want to write, this design will require
> that instead of doing
>
> write(buffer,0,20000);
>
> I will have to do
> write(buffer,0,2000);
> // wait for callback
> write(buffer,2000,2000);
> // wait for callback
> ...
> write(buffer,18000,2000);
>
> With the former style, the implementation can internally do something like
>
> len = channel.write(ByteBuffer.wrap(buffer,0,20000));
>
> and so the user supplied data is passed directly to the channel write
> call. If we allow ByteBuffers in the API, then that can even be a
> direct buffer for a really efficient write. If the full buffer cannot
> be written, it is the implementation not the application that has to
> arrange for the rest to be written when the channel is writeable.
>
> But instead, this design means that we have an internal buffer that we
> have to copy into (because we've guaranteed to take at least 2000
> bytes and we can't get that guarantee from the channel). So we have
> to do
>
> internalBuf.put(buffer,0,2000);
> internalBuf.flip();
> channel.write(internalBuf);
> internalBuf.compact(); // keep any unwritten bytes, switch back to filling
> // next call
> internalBuf.put(buffer,2000,2000);
> internalBuf.flip();
> channel.write(internalBuf);
> internalBuf.compact();
> // next call
> 10 times ...
>
> So we get an extra copy, lose the ability to pass in file mapped
> buffers efficiently, 10 system calls instead of 1, and potentially a
> less efficient TCP/IP flow control because of the smaller data
> available.
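
A minimal sketch of the first style described above, where the container wraps the caller's array without copying and is itself responsible for finishing the write. Everything here is an assumption made for illustration: ThrottledChannel is a fake channel that accepts at most a fixed number of bytes per call, and DirectWriteSketch with its onChannelWritable method is a hypothetical stand-in for container internals, not part of any proposed API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Fake channel (illustration only): accepts at most maxPerCall bytes per write. */
final class ThrottledChannel implements WritableByteChannel {
    private final int maxPerCall;
    int calls;   // number of write() calls, standing in for system calls
    int total;   // total bytes accepted

    ThrottledChannel(int maxPerCall) { this.maxPerCall = maxPerCall; }

    @Override public int write(ByteBuffer src) {
        calls++;
        int n = Math.min(src.remaining(), maxPerCall);
        src.position(src.position() + n); // consume n bytes, like a partial socket write
        total += n;
        return n;
    }
    @Override public boolean isOpen() { return true; }
    @Override public void close() { }
}

/** Hypothetical container-side logic: no internal copy buffer. */
final class DirectWriteSketch {
    private ByteBuffer pending; // a view over the caller's array, not a copy

    /** Returns true if everything was written immediately. */
    boolean write(byte[] buffer, int off, int len, WritableByteChannel ch) throws IOException {
        pending = ByteBuffer.wrap(buffer, off, len);
        ch.write(pending);
        return !pending.hasRemaining();
    }

    /** The container, not the application, calls this when the channel is writable again. */
    boolean onChannelWritable(WritableByteChannel ch) throws IOException {
        ch.write(pending);
        return !pending.hasRemaining();
    }

    /** Writes 20000 bytes through a channel that takes 16384 per call; returns {calls, total}. */
    static int[] demo() {
        ThrottledChannel ch = new ThrottledChannel(16384);
        DirectWriteSketch out = new DirectWriteSketch();
        try {
            boolean done = out.write(new byte[20000], 0, 20000, ch);
            while (!done) {
                done = out.onChannelWritable(ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { ch.calls, ch.total };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("channel writes: " + r[0] + ", bytes: " + r[1]);
    }
}
```

The number of channel writes here is set by what the channel accepts, not by a byte count promised to the application in advance.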
>
>
>
>>> b) could result in a very stop/start/stop/start data flow on the wire;
>>> c) bypasses the normal TCP/IP flow control mechanisms. It is far
>>> better that once you have converted the object into an 8192-byte array
>>> that you give the IO layer that byte array directly - without a copy -
>>> and to let the TCP windows/flow control etc. do their stuff and write as
>>> much as possible of that buffer.
>>
>> Am not fully sure what you are suggesting here. We could use a ByteBuffer
>> instead
>> and provide a write(ByteBuffer) if that helps. Is that what you mean? But
>> you are still
>> buffering no matter what IMHO.
> A ByteBuffer helps a little bit, because it gives the caller a
> position index to help remember how much has been written, but I agree
> it is not the solution.
>
>
> I see two ways to resolve this:
>
> a) remove the concept of how many bytes can be written without
> blocking from the API. Just have a call back that says more data can
> be written and return the bytes written from every call. This still
> requires the application to track the partially written content, but
> at least it removes the requirement to have internal buffering in the
> implementation.

OK, we have also been discussing this internally: we remove the parameter
from the canWrite method, so the method signature would be

boolean canWrite();

and of course I already suggested that we have the write methods return
an int which indicates the number of bytes written. So we will do that.
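
A minimal sketch of how application code might look with a parameterless canWrite() and write methods that return the number of bytes written. All names here are assumptions for illustration only: SketchOutput stands in for the eventual ServletOutputStream additions, FakeOutput is a fake that accepts a limited number of bytes per writable period, and Producer.onWritePossible is a hypothetical callback. The point is that the application only has to remember an offset.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for the proposed output stream methods. */
interface SketchOutput {
    boolean canWrite();                     // no size parameter
    int write(byte[] b, int off, int len);  // returns the number of bytes accepted
}

/** Fake output (illustration only): accepts at most `window` bytes per writable period. */
final class FakeOutput implements SketchOutput {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final int window;
    private boolean writable = true;

    FakeOutput(int window) { this.window = window; }

    @Override public boolean canWrite() { return writable; }

    @Override public int write(byte[] b, int off, int len) {
        if (!writable) return 0;
        int n = Math.min(len, window);
        sink.write(b, off, n);
        writable = false; // pretend the socket is full until the next callback
        return n;
    }

    void becomeWritable() { writable = true; }
    byte[] written() { return sink.toByteArray(); }
}

final class PartialWriteSketch {
    /** Application-side state: tracks how much of the payload is still unwritten. */
    static final class Producer {
        private final byte[] payload;
        private int offset;

        Producer(byte[] payload) { this.payload = payload; }

        /** Would be invoked from a hypothetical "more data can be written" callback. */
        void onWritePossible(SketchOutput out) {
            while (offset < payload.length && out.canWrite()) {
                offset += out.write(payload, offset, payload.length - offset);
            }
        }

        boolean done() { return offset == payload.length; }
    }

    /** Simulates the container's callbacks until the payload is written; returns their count. */
    static int run(byte[] payload, FakeOutput out) {
        Producer p = new Producer(payload);
        int callbacks = 0;
        while (!p.done()) {
            out.becomeWritable();
            p.onWritePossible(out);
            callbacks++;
        }
        return callbacks;
    }

    public static void main(String[] args) {
        FakeOutput out = new FakeOutput(8192);
        System.out.println("callbacks: " + run(new byte[20000], out));
    }
}
```

The application always offers everything it has left and lets the return value say how much was taken, so no byte count needs to be guaranteed up front.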

- Rajiv
>
> b) go to something like the Java7 NIO2 where you call
> write(buffer,handler) and the handler is called back once the entire
> buffer is written. No chunks, no copying, direct buffers! Any
> luck getting the guys that designed NIO2 to come chat to us about
> this?
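
For reference, a rough sketch of the completion-handler style in option (b), using the real Java 7 CompletionHandler interface. Two assumptions to flag: it writes to a temporary file through AsynchronousFileChannel only so that the example is self-contained (a servlet container would be dealing with a socket), and writeFully is a hypothetical helper, not a JDK method. A single NIO2 write may complete after only part of the buffer is written, so the helper re-issues the write until the buffer is drained and only then signals completion.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

final class Nio2WriteSketch {

    /** Hypothetical helper: completes the future only once the whole buffer is written. */
    static CompletableFuture<Integer> writeFully(final AsynchronousFileChannel ch,
                                                 final ByteBuffer buf) {
        final CompletableFuture<Integer> done = new CompletableFuture<>();
        final int total = buf.remaining();
        // The attachment carries the file position of the write in flight.
        ch.write(buf, 0L, Long.valueOf(0L), new CompletionHandler<Integer, Long>() {
            @Override public void completed(Integer written, Long position) {
                long next = position + written;
                if (buf.hasRemaining()) {
                    ch.write(buf, next, Long.valueOf(next), this); // partial write: continue
                } else {
                    done.complete(total);
                }
            }
            @Override public void failed(Throwable t, Long position) {
                done.completeExceptionally(t);
            }
        });
        return done;
    }

    /** Writes 20000 bytes to a temp file and returns the resulting file size. */
    static long demo() {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("nio2-sketch", ".bin");
            try (AsynchronousFileChannel ch =
                     AsynchronousFileChannel.open(tmp, StandardOpenOption.WRITE)) {
                writeFully(ch, ByteBuffer.wrap(new byte[20000])).get();
            }
            return Files.size(tmp);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            if (tmp != null) {
                tmp.toFile().delete();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes on disk: " + demo());
    }
}
```

The caller hands over the buffer once and hears back once; tracking partial progress stays out of application code.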
>
>
>
>>> I think we need to see some non trivial examples of the API being used
>>> to really judge its usability. Examples that write objects as JSON
>>> or XML DOM documents etc.
>>
>> Let me spend some more time trying to write one and get back to you on this.
> that would be great!
>
>
> cheers