[jsr340-experts] Re: Proposal for WebSocket to be part of JSR 340

From: Greg Wilkins <gregw_at_intalio.com>
Date: Sun, 9 Oct 2011 09:26:59 +1100

On 7 October 2011 20:36, Remy Maucherat <rmaucher_at_redhat.com> wrote:
> On Fri, 2011-10-07 at 09:13 +1100, Greg Wilkins wrote:
>> This kind of thing has been extensively discussed in the working
>> group. The reason that frames are not exposed to the browser API and
>> are not intended to be exposed on the server side is precisely because
>> it is NOT the intention of websockets to allow semantic meaning to be
>> attached to any boundary less than a message. If an application
>> wants to handle "chunks" of data, then they need to send each chunk as
>> a separate WS message.
>
> Ewwww, ugly ;) I didn't remember reading these warnings in the spec
> document though, so are you sure this is really clear for all websocket
> protocol developers out there ?
>
> In HTTP land, although a transfer encoding like chunking cannot be used
> for internal content delimitation in theory, in practice it works and
> it's awfully convenient when you use something like the Servlet 3.1 IO.

Remy,

I actually don't think it is any different to HTTP. HTTP chunks are
pretty much equivalent to WebSocket frames, as each can fragment a
large message and both are entirely transparent to the application.
HTTP makes no delivery guarantee on any boundaries other than the
entire message, so any application semantics that are based on
boundaries within an HTTP message are hacks that just happen
to work with today's implementations (e.g. comet streaming).
[Note: HTML5 Server-Sent Events are an attempt to change this, but I'm
dubious whether that is really a proper use of HTTP.]

So having a streaming API for websockets would let similar
applications attempt to attach meaning to partially received
data, but it would still be a hack, and WS implementations are
perfectly entitled to buffer entire messages (maybe to disk) before
starting to stream data to the application.
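
To make that concrete, here is a rough Java sketch of an endpoint that
buffers frames and only hands complete messages to the application.
MessageAssembler, onFrame and deliveredMessages are names invented for
this illustration; they are not part of any Servlet or WebSocket API,
and a real implementation could just as well spool to disk.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: shows that frame boundaries never reach the application.
final class MessageAssembler {
    // Payload bytes of the message currently being received.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    // Complete messages, in arrival order, as the application would see them.
    private final List<byte[]> delivered = new ArrayList<>();

    // Called once per data frame; fin marks the final frame of a message.
    void onFrame(byte[] payload, boolean fin) {
        pending.write(payload, 0, payload.length);
        if (fin) {
            delivered.add(pending.toByteArray());
            pending.reset();
        }
    }

    List<byte[]> deliveredMessages() {
        return delivered;
    }
}
```

However the sender fragments a message, the application sees the same
single byte array, so nothing can be inferred from where the frames
were cut.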

So we cannot accept any arguments for streamed APIs that are based on
handling partially received messages - those are just incorrect
semantics.
The "applications must break up their data into small messages" rule only
applies if the application wants to put semantic meaning on small
parts of its data - or wants to avoid the arbitrary size limits
imposed by browsers.
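
For an application that does want meaningful chunks, the splitting has
to happen at the message level. A minimal sketch in Java, assuming the
application picks its own chunk size (MessageSplitter and split are
made-up names, and actually sending each array as its own WS message
is left to whatever API is in use):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: cuts application data into pieces that are each
// intended to be sent as a separate WebSocket message.
final class MessageSplitter {
    static List<byte[]> split(byte[] data, int maxMessageSize) {
        if (maxMessageSize <= 0) {
            throw new IllegalArgumentException("maxMessageSize must be positive");
        }
        List<byte[]> messages = new ArrayList<>();
        int off = 0;
        while (off < data.length) {
            // Length of this piece: the limit, or whatever is left.
            int len = Math.min(data.length - off, maxMessageSize);
            messages.add(Arrays.copyOfRange(data, off, off + len));
            off += len;
        }
        return messages;
    }
}
```

For example, split(new byte[10], 4) yields three messages of 4, 4 and 2
bytes, and each one has a boundary the receiver can rely on.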

The only argument I see for streamed APIs is that it is a good way to
avoid the buffer size issues inherent in a message based API.
This is a reasonably good argument... so long as we are reasonably
sure that we will actually see such large messages being
sent/received. Given that arbitrary limits exist in browsers, I'm
dubious that we will see many... but I do agree that 16MB is already
too large for servers.

So having a streamed API is OK. But given that arbitrary limits
already exist for both HTTP and WS, if we want a simpler API, then I
think we can solve the 99% case by having just a message based API
with an application configurable max message size.
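
Roughly what I have in mind, as a Java sketch only: BoundedMessageBuffer
and MessageTooBigException are hypothetical names, not a proposed API,
and I am assuming the container would answer an oversized message by
closing the connection with the protocol's "message too big" status
(1009).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical: signals that a message exceeded the configured limit.
final class MessageTooBigException extends RuntimeException {
    // Close status the container is assumed to send (WebSocket "message too big").
    static final int CLOSE_CODE = 1009;

    MessageTooBigException(int limit) {
        super("message exceeds configured limit of " + limit + " bytes");
    }
}

// Hypothetical message-based receiver with an application-configurable limit.
final class BoundedMessageBuffer {
    private final int maxMessageSize;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    BoundedMessageBuffer(int maxMessageSize) {
        if (maxMessageSize <= 0) {
            throw new IllegalArgumentException("maxMessageSize must be positive");
        }
        this.maxMessageSize = maxMessageSize;
    }

    // Returns the whole message on the final frame, or null while it is
    // still incomplete. Throws as soon as the limit would be exceeded, so
    // memory use stays bounded by maxMessageSize.
    byte[] onFrame(byte[] payload, boolean fin) {
        if ((long) pending.size() + payload.length > maxMessageSize) {
            pending.reset();
            throw new MessageTooBigException(maxMessageSize);
        }
        pending.write(payload, 0, payload.length);
        if (!fin) {
            return null;
        }
        byte[] message = pending.toByteArray();
        pending.reset();
        return message;
    }
}
```

The check runs per frame rather than per message, so an oversized
message is rejected before it has been buffered in full.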

cheers