jsr356-experts@websocket-spec.java.net

[jsr356-experts] Re: Streaming API: was For Review: v002 API and example code

From: Scott Ferguson <ferg_at_caucho.com>
Date: Fri, 29 Jun 2012 17:19:32 -0700

On 06/29/2012 04:03 PM, Danny Coward wrote:
> Hi Scott, all,
>
> Thanks for the feedback, pls see below:-
>>
>
> OK. So we can certainly add streaming to process messages.
>
> But, are you suggesting using blocking Java i/o streams, like the
> servlet API, to represent that? Something similar or equivalent to:-
>
> WebSocketListener -> public void onMessageAsInputStream(InputStream is)
> RemoteEndpoint -> public OutputStream startSendByOutputStream() ?

Yes. (Also with the equivalent Reader/Writer of course.)
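Concretely, something along these lines (just a sketch of the shape,
reusing the names from your mail; the Reader/Writer variants are
hypothetical, not a naming proposal):

    import java.io.*;

    // Sketch only: the InputStream/OutputStream names are from Danny's
    // mail; the Reader/Writer variants are hypothetical additions.
    interface WebSocketListener {
        void onMessageAsInputStream(InputStream is);  // one call per binary message
        void onMessageAsReader(Reader r);             // one call per text message
    }

    interface RemoteEndpoint {
        OutputStream startSendByOutputStream() throws IOException;  // close() ends the message
        Writer startSendByWriter() throws IOException;
    }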

It's necessary for the core use-case of serializing messages as JSON,
XML, protobuf, Hessian, etc., as well as for custom protocols like
STOMP over WebSocket or a rewrite of ZeroMQ over WebSocket. For
example, XMPP over WebSocket is already a draft proposal.

Streams are also better for high performance and for very large messages.
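For example, an XML message (XMPP-style) can be handed straight to a
standard parser without buffering the whole payload first (sketch;
assumes the onMessageAsInputStream callback above):

    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    class XmlMessageListener /* implements WebSocketListener */ {
        public void onMessageAsInputStream(InputStream is) {
            try {
                // The parser pulls bytes from the message stream as it
                // needs them; no whole-message byte[] is ever built.
                Document doc = DocumentBuilderFactory.newInstance()
                                                     .newDocumentBuilder()
                                                     .parse(is);
                handle(doc);
            } catch (Exception e) {
                // report a protocol/parse error, close the connection, etc.
            }
        }

        private void handle(Document doc) { /* application logic */ }
    }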

>
> And possibly exploring the type of additions that would allow
> non-blocking IO based on the traditional i/o streams as well, like
> those the servlet expert group has been looking at for Servlet 3.1?

Possibly. I'd personally prefer nailing down the core blocking API first
before discussing the more complicated non-blocking APIs. I'd like to
match the servlet model as much as possible, while avoiding
javax.servlet dependencies.

>
>
> Although at least one of the APIs allows this, most APIs seem to favor
> a type of asynchronous processing the same as or equivalent to:-
>
> WebSocketListener-> public void onMessage(byte[] fragment, boolean isLast)
> RemoteEndpoint-> public void send(byte[] fragment, boolean isLast)
>
> What are people's thoughts on standardizing this kind of chunking API?

Well, let me unpack this because there are several intertwined issues
that should be separated:

1) The WebSocket frame/fragment is not an application-visible concept
(excluding extensions for the moment). The application-visible concept
in WebSocket is the message. In fact, the early IETF drafts only had
messages and no frames.

The fragment is supposed to be like a TCP/IP frame: although frames
exist, applications can't use them or even really be aware of their
boundaries. It's up to WebSocket protocol implementations and proxies
to split/join fragments as needed (and specifically for mux, which is a
core WebSocket implementation extension, not a user extension).

2) Even for extensions, the IETF WebSocket Multiplexing group is finding
that frame-based extensions are a problem, because they interact in
difficult ways. There's a suggestion on one of their threads that
perhaps only the mux extension itself should be aware of frames at all,
and all other extensions should work on messages.

So "fragment" is wrong for the application, and probably even extensions.

You could have "buffer" as in write(buffer, offset, length, isLast) just
to make it absolutely clear that the sending buffer has nothing to do
with websocket frames, but...

3) Async sends are a problem because they imply queuing or at least
large buffering, and they raise questions about who owns the buffer, how
memory is allocated, whether there are extra copies just to handle the
async, etc. These aren't appropriate for the low-level API. (A
higher-level async messaging/queuing layer on top of the low-level API
might be fine, but not at the lowest level.)
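To make the buffer-ownership question concrete (the sendAsync signature
here is purely hypothetical, just to illustrate the problem):

    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical async chunk API, only to illustrate the ownership problem.
    interface AsyncRemote {
        void sendAsync(byte[] buf, int off, int len, boolean isLast);
    }

    class AsyncCopyExample {
        void copy(InputStream in, AsyncRemote remote) throws IOException {
            byte[] buf = new byte[8192];
            int len;
            while ((len = in.read(buf)) > 0) {
                remote.sendAsync(buf, 0, len, false);
                // The next iteration overwrites 'buf' while the previous
                // send may still be queued: either the caller allocates a
                // fresh buffer per chunk, or the implementation copies
                // defensively; extra allocation either way.
            }
            remote.sendAsync(buf, 0, 0, true);   // end of message
        }
    }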

4) the "isLast" kind of API (assuming blocking) is functionally
equivalent to a stream, but someone would need to write a stream wrapper
if they're serializing xml, json, etc. Which isn't really something that
the API should force on an application.
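In other words, every serializer would end up writing essentially this
wrapper first (sketch only; the send(buf, off, len, isLast) signature is
the hypothetical chunk-style send discussed above, not a real API):

    import java.io.IOException;
    import java.io.OutputStream;

    // The stream wrapper every serializer would need on top of a
    // chunk-style send(buf, off, len, isLast) API.
    interface ChunkSender {
        void send(byte[] buf, int off, int len, boolean isLast) throws IOException;
    }

    class ChunkingOutputStream extends OutputStream {
        private final ChunkSender remote;

        ChunkingOutputStream(ChunkSender remote) { this.remote = remote; }

        public void write(int b) throws IOException {
            write(new byte[] { (byte) b }, 0, 1);
        }

        public void write(byte[] buf, int off, int len) throws IOException {
            remote.send(buf, off, len, false);      // continuation chunk
        }

        public void close() throws IOException {
            remote.send(new byte[0], 0, 0, true);   // final chunk ends the message
        }
    }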

5) Async/isLast receives are truly messy if you're deserializing XML,
JSON, etc., because you either need to buffer the entire message (!)
before starting to deserialize, or build a complicated threaded
producer/consumer model just to recreate a stream. Again, this is
fairly brutal to require of an application.
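Just to get a stream back for the parser, an application would need
producer/consumer plumbing along these lines (sketch only; the
onMessage(fragment, isLast) callback is the hypothetical chunk API):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;

    // Producer/consumer glue needed to turn onMessage(fragment, isLast)
    // callbacks back into a stream for a parser.
    class ChunkToStreamAdapter {
        private final PipedOutputStream producer = new PipedOutputStream();
        private final PipedInputStream consumer;

        ChunkToStreamAdapter() throws IOException {
            consumer = new PipedInputStream(producer);
            // A second thread has to run the parser against 'consumer'
            // while the websocket thread keeps delivering chunks.
            new Thread(new Runnable() {
                public void run() { deserializeFrom(consumer); }
            }).start();
        }

        // Called on the websocket callback thread.
        void onMessage(byte[] fragment, boolean isLast) throws IOException {
            producer.write(fragment);
            if (isLast)
                producer.close();   // lets the parser see end-of-message
        }

        private void deserializeFrom(InputStream in) { /* XML/JSON parser here */ }
    }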

6) Interaction with the multiplexing layer. There's a fairly good chance
that the multiplexing extension will be approved. If so, the core
messaging API should continue to work with mux exactly as-is without any
application changes.

Mux itself will need to refragment/block/buffer as necessary, so the
application's send buffers won't necessarily have any relation to the
actual frames once mux is done working with them.

... So basically, the async/chunk style APIs are a problem.

Although the base layer shouldn't be an async/chunked API, I'm all for
an async/messaging layer written on top of the simple blocking streaming
layer, if it's a general solution useful for a large class of applications.
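For what it's worth, a minimal version of that layer could be as small
as this (sketch; RemoteEndpoint/startSendByOutputStream as sketched
earlier in this mail):

    import java.io.OutputStream;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Async messaging facade layered on the blocking stream API (sketch).
    class AsyncMessagingLayer {
        private final ExecutorService exec = Executors.newSingleThreadExecutor();
        private final RemoteEndpoint remote;   // the blocking endpoint underneath

        AsyncMessagingLayer(RemoteEndpoint remote) { this.remote = remote; }

        // The caller hands over ownership of 'message'; the blocking write
        // runs on the executor thread, one message at a time.
        Future<?> sendAsync(final byte[] message) {
            return exec.submit(new Runnable() {
                public void run() {
                    try {
                        OutputStream out = remote.startSendByOutputStream();
                        out.write(message);
                        out.close();   // close() completes the message
                    } catch (Exception e) {
                        // surface the failure to the application
                    }
                }
            });
        }
    }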

-- Scott