jsr340-experts@servlet-spec.java.net

[jsr340-experts] Re: Async IO and Upgrade proposal updated

From: Greg Wilkins <gregw_at_intalio.com>
Date: Mon, 26 Mar 2012 09:56:37 +1100

On 23 March 2012 08:27, Shing Wai Chan <shing.wai.chan_at_oracle.com> wrote:
> In the WriteListener we have modified the onWritePossible method as follows:
>
>     public void onWritePossible(int numOfBytes);
>
> Note that onWritePossible(int numOfBytes) may require buffering at the
> lower layers, closer to where the actual write happens. An alternative is
> to add write methods to ServletOutputStream that return an int with the
> number of bytes written, just like WritableByteChannel's write method.


I don't understand how an implementation is meant to determine the
number of bytes that can be written without blocking. The same goes
for the associated canWrite(int size) method. If these methods really
just refer to space within the output stream's buffer, then I have
some usability and efficiency concerns with this approach.

Consider the example of a producer that wants to send an Object as
JSON asynchronously. It gets a callback saying 2048 bytes may be
written without blocking. Does it try to convert the object or not?
It does not know whether 2048 bytes is sufficient. Let's say it knows
that its objects are never larger than 8192 bytes: does it wait for a
callback telling it that more than 8192 bytes may be written, or does
it have to convert the object to a byte array and write it a chunk at
a time? Is the maximum size accepted by canWrite, or passed by
onWritePossible, related to the value of Response.getBufferSize()?
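
To make the dilemma concrete, here is a minimal sketch, assuming the
onWritePossible(int) shape quoted above (declared locally, since it is
a proposal rather than a shipped API); the toJsonBytes() helper and the
8192-byte bound are hypothetical application details:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Queue;
    import javax.servlet.ServletOutputStream;

    // The listener shape proposed in this thread, declared locally.
    interface ProposedWriteListener {
        void onWritePossible(int numOfBytes) throws IOException;
        void onError(Throwable t);
    }

    class JsonProducer implements ProposedWriteListener {
        private static final int MAX_OBJECT_SIZE = 8192; // assumed bound
        private final ServletOutputStream out;
        private final Queue<Object> objects;

        JsonProducer(ServletOutputStream out, Queue<Object> objects) {
            this.out = out;
            this.objects = objects;
        }

        @Override
        public void onWritePossible(int numOfBytes) throws IOException {
            // Say numOfBytes is 2048: is that enough for the next object?
            // We cannot know without converting it first, so we either
            // wait for a window of at least MAX_OBJECT_SIZE...
            if (numOfBytes < MAX_OBJECT_SIZE)
                return; // ...and hope such a callback ever arrives...
            // ...or we must convert to bytes and chunk (see below).
            byte[] json = toJsonBytes(objects.poll());
            out.write(json);
        }

        @Override
        public void onError(Throwable t) {
            // abort the exchange
        }

        // Hypothetical converter, standing in for a real JSON encoder.
        private byte[] toJsonBytes(Object o) {
            return o.toString().getBytes(StandardCharsets.UTF_8);
        }
    }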

If the producer decides that it is best to write the content a chunk
at a time, then it will have to create a ByteArrayOutputStream,
convert the object to JSON and then to bytes to discover the true
size. It can then write 2048 bytes as a chunk and wait for another
callback to write another chunk. However, I think this approach is
really inefficient, as it a) forces an extra data copy in the
implementation; b) could result in a very stop/start/stop/start data
flow on the wire; c) bypasses the normal TCP/IP flow control
mechanisms. It is far better that once you have converted the object
into an 8192-byte array, you give the IO layer that byte array
directly, without a copy, and let the TCP windows, flow control etc.
do their stuff and write as much of that buffer as possible.
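
For illustration, a sketch of that chunk-at-a-time pattern; the
encodeAsJson() method is a hypothetical stand-in for a real encoder,
and 'writable' stands for the count delivered by the proposed
onWritePossible(int) callback:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import javax.servlet.ServletOutputStream;

    class ChunkedJsonWriter {
        private byte[] content; // the fully converted object
        private int offset;

        void convert(Object o) throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            encodeAsJson(baos, o);        // copy #1: object -> buffer
            content = baos.toByteArray(); // copy #2: just to learn the size
            offset = 0;
        }

        // Called from onWritePossible(writable): dribble out at most
        // 'writable' bytes, then wait for the next callback: the
        // stop/start wire behaviour described above.
        void writeChunk(ServletOutputStream out, int writable) throws IOException {
            int n = Math.min(writable, content.length - offset);
            out.write(content, offset, n);
            offset += n;
        }

        // Hypothetical encoder, standing in for a real JSON library.
        private void encodeAsJson(ByteArrayOutputStream baos, Object o) throws IOException {
            baos.write(o.toString().getBytes(StandardCharsets.UTF_8));
        }
    }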

Also, if we are buffering in the API, what are the async flush
semantics? If canWrite says 2048 bytes can be written and I write
1024, do I then need to flush to cause the write to actually happen?
If so, it will be difficult to write streaming content where you want
to write as much data as you can, but avoid sending needlessly small
packets.
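
To put the question in code form, a sketch assuming the proposed
canWrite(int) (represented by a placeholder below, since it is not a
shipped method) and an application 'buffer' of at least 1024 bytes:

    import java.io.IOException;
    import javax.servlet.ServletOutputStream;

    class PartialWrite {
        void send(ServletOutputStream out, byte[] buffer) throws IOException {
            if (canWrite(out, 2048)) {
                out.write(buffer, 0, 1024); // only half of what was allowed
                // Are these 1024 bytes now on the wire, or parked in the
                // stream's buffer until flush()? If a flush is required,
                // a streaming writer must choose between flushing
                // (needlessly small packets) and waiting (stalled data).
                out.flush(); // needed or not?
            }
        }

        // Placeholder for the proposed ServletOutputStream.canWrite(int);
        // its real semantics are exactly what is in question here.
        private boolean canWrite(ServletOutputStream out, int size) {
            return true;
        }
    }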

I think we need to see some non-trivial examples of the API being
used to really judge its usability: examples that write objects as
JSON or XML DOM documents, etc.

regards