[jsr356-experts] Re: RemoteEndpoint setAutoFlush() and flush()

From: Danny Coward <danny.coward_at_oracle.com>
Date: Fri, 14 Dec 2012 17:13:04 -0800

Hi Scott,

On 12/11/12 11:37 AM, Scott Ferguson wrote:
> On 12/11/12 10:25 AM, Danny Coward wrote:
>> Hi Scott,
>>
>> OK, I think I understand. So the idea is to allow implementations to
>> send messages in a batch in order to get a big performance gain for
>> applications that send a lot of messages in a short amount of time
>> and to give developers an explicit way to take advantage of
>> that, if the batching optimization is in the implementation.
>
> Exactly.
OK, at least I understand it, which is progress!

>>
>> And I think that with the flush() method, we would have allowed
>> containers that choose to do batching to do so under the existing
>> model, without the extra setBatching/setAutoflush() idea?
>
> Only if we always require a flush. We could do that. That's the
> equivalent of auto-flush=false always, and since it's how
> BufferedOutputStream works, it's an existing programming model.
>
> If a developer forgets a flush, the message might never get sent.
>
> I'm a bit wary of that definition, because some implementations won't
> bother with buffering, and lazy programmers will forget the flush, but
> it will work anyway, and the spec will eventually revert to auto-flush
> after the lazy programmers complain about compatibility.
Yes, I agree that requiring flush is not a good solution.
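The "always require a flush" model that Scott compares to BufferedOutputStream can be seen directly with the JDK class: bytes written into the buffer do not reach the underlying stream until flush() (or close()) is called, which is exactly the forgotten-flush failure mode discussed above. A minimal sketch; FlushDemo and bytesOnWire are illustrative names, and a ByteArrayOutputStream stands in for the network.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Shows the "always call flush()" model used by BufferedOutputStream:
// buffered bytes are invisible downstream until flush() is called.
class FlushDemo {
    // Returns how many bytes reached the underlying "wire" stream,
    // optionally calling flush() after the write.
    static int bytesOnWire(String message, boolean callFlush) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        // Explicit 8192-byte buffer, far larger than the message.
        BufferedOutputStream out = new BufferedOutputStream(wire, 8192);
        out.write(message.getBytes(StandardCharsets.UTF_8));
        if (callFlush) {
            out.flush();
        }
        // Deliberately not closing 'out': close() would flush implicitly
        // and hide the forgotten-flush case.
        return wire.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesOnWire("hello", false)); // 0: still in the buffer
        System.out.println(bytesOnWire("hello", true));  // 5: delivered
    }
}
```

An implementation that does no buffering would deliver the bytes either way, which is why Scott expects code that forgets the flush to "work anyway" on some containers.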
>>
>> I think that sort of approach already fits under the async model we
>> have: the async send operations allow implementations to make their
>> own choice about when to send the message after the async send has
>> been called. i.e.
>>
>> sendString/sendBytes - send the message now (no batching)
>> sendStringByFuture() - send the message when the container decides to
>> (possibly batching if it chooses to)
>
> That doesn't work, but the reason is a bit complicated (see below).
> (Secondarily, *ByFuture is a high-overhead API, which doesn't work
> well in high-performance code.)
>
> It doesn't work because the *ByFuture and *ByCompletion methods are
> single-item queues. You can't batch or queue more than one item with
> those APIs. If you look at java.nio.channels.AsynchronousChannel, it says
>
> " Some channel implementations may support concurrent reading and
> writing, but may not allow more than one read and one write operation
> to be outstanding at any given time."
>
> Since it's only a single-item queue, the implementation can't batch
> the items -- there's only one item.
Right, but our APIs are only expressed in terms of the data objects,
not the channels. Does that really lock you into using AsynchronousChannel
with its one-write-at-a-time rule?
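A minimal sketch of the single-outstanding-write rule Scott quotes from java.nio.channels.AsynchronousChannel: with only one send allowed in flight, there is never a second message available to batch with. SingleSlotSender, sendStringByFuture and completeCurrentWrite are hypothetical stand-ins, not draft JSR 356 API; the rejection reuses the JDK's WritePendingException, which AsynchronousSocketChannel throws when a write is already in progress.

```java
import java.nio.channels.WritePendingException;
import java.util.concurrent.CompletableFuture;

// Hypothetical single-slot async sender modelled on the AsynchronousChannel
// rule: at most one write may be outstanding at any given time.
class SingleSlotSender {
    private CompletableFuture<Void> pending; // the one in-flight send, if any

    // Starts an async send; rejects a second send while one is in flight,
    // so the implementation never holds more than one message to batch.
    synchronized CompletableFuture<Void> sendStringByFuture(String text) {
        if (pending != null && !pending.isDone()) {
            throw new WritePendingException();
        }
        // A real sender would hand 'text' to the I/O layer here.
        pending = new CompletableFuture<>();
        return pending;
    }

    // Stand-in for the I/O layer reporting that the current write finished.
    synchronized void completeCurrentWrite() {
        if (pending != null) {
            pending.complete(null);
        }
    }

    public static void main(String[] args) {
        SingleSlotSender sender = new SingleSlotSender();
        sender.sendStringByFuture("first");
        try {
            sender.sendStringByFuture("second");
        } catch (WritePendingException e) {
            System.out.println("second send rejected while first is pending");
        }
        sender.completeCurrentWrite();
        sender.sendStringByFuture("second"); // accepted once the slot is free
    }
}
```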

>
> And it's a single-item queue because multiple-item queues require more
> API methods, like in BlockingQueue, and a longer spec definition to
> describe the queue behavior, e.g. what happens when the queue is full
> or even what "full" means.
I'm probably being really stupid, but can't an implementation use a
BlockingQueue under our APIs, and decide for itself, based on knowledge
of its own implementation environment, when to send a batch of messages?
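Danny's alternative can be sketched as a container-side multi-item queue behind the async send calls, with the container choosing when to drain a batch. This is one possible reading under assumed names (QueueingSender, sendAsync, drainBatch and maxBatch are all hypothetical). The queue here is unbounded, which sidesteps rather than answers Scott's question about what happens when the queue is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical container-side sender: the app enqueues messages, and the
// container decides when to remove and write a batch.
class QueueingSender {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;

    QueueingSender(int maxBatch) {
        this.maxBatch = maxBatch;
    }

    // Called by the application: enqueue and return immediately.
    void sendAsync(String message) {
        queue.add(message);
    }

    // Called by the container when it judges the moment right (for example,
    // when the socket becomes writable); removes up to maxBatch messages in
    // FIFO order as a single batch.
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    public static void main(String[] args) {
        QueueingSender sender = new QueueingSender(3);
        for (int i = 0; i < 5; i++) {
            sender.sendAsync("m" + i);
        }
        System.out.println(sender.drainBatch()); // [m0, m1, m2]
        System.out.println(sender.drainBatch()); // [m3, m4]
    }
}
```

Note that the application still has no way to say "I'm about to send many messages"; the batching decision is entirely the container's, which is the gap Scott's explicit flag is meant to fill.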

It's a bit tricky imposing a different development model on top of what
we have, especially because I'll bet there will be some implementations
that will not support batching. I have some ideas on a subtype of
RemoteEndpoint which might separate out the batching model better than
the flags and the flush(), but let's see.

I'm flagging this in the spec for v10 because the spec has not resolved
this yet.

- Danny

>
> -- Scott
>
>>
>>
>> - Danny
>>
>> On 11/29/12 12:11 PM, Scott Ferguson wrote:
>>> On 11/29/12 11:34 AM, Danny Coward wrote:
>>>> My apologies, Scott, I must have missed your original request. I've
>>>> logged this as issue 63.
>>>
>>> Thanks.
>>>
>>>>
>>>> So auto-flush true would require the implementation never to keep
>>>> anything in a send buffer, and false would allow it?
>>>
>>> Not quite. It's more like auto-flush false means "I'm batching
>>> messages; don't bother sending if you don't have to." I don't think
>>> the wording should be "never", because of things like mux, or other
>>> server heuristics. It's more like "start the process of sending."
>>>
>>> setBatching(true) might be a better name, if that's clearer.
>>>
>>> When setBatching(false) [autoFlush=true] -- the default -- and an
>>> app calls sendString(), the message will be delivered (with possible
>>> buffering, delays, mux, optimizations, etc, depending on the
>>> implementation, but it will be delivered without further
>>> intervention from the app.)
>>>
>>> When setBatching(true) [autoFlush=false], and an app calls
>>> sendString(), the message might sit in the buffer forever until the
>>> application calls flush().
>>>
>>> sendPartialString would be unaffected by the flag; the WS
>>> implementation is free to do whatever it wants with partial messages.
>>>
>>> Basically, it's a hint: setBatching(true) [autoFlush=false] means
>>> "I'm batching a bunch of messages, so don't bother sending the data
>>> if you don't need to until I call flush."
>>>
>>> Does that make sense? I don't want to over-constrain implementations
>>> with either autoFlush option. Maybe "batching" is the better
>>> name to avoid confusion. (But even batching=true doesn't require
>>> buffering. Implementations can still send fragments early if they
>>> want or even ignore batching=true.)
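The proposed setBatching hint, taken at its strictest reading (held messages sit until flush()), could look like the following sketch. BatchingEndpoint and its in-memory "wire" list are hypothetical; as Scott notes, a real implementation may still send early or ignore batching=true entirely. What happens to held messages when batching is switched off is not settled in this thread; the sketch assumes they are flushed first so delivery order is preserved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical endpoint illustrating the proposed batching hint; an
// in-memory "wire" list stands in for the network connection.
class BatchingEndpoint {
    private final List<String> wire = new ArrayList<>();    // delivered messages
    private final List<String> pending = new ArrayList<>(); // held while batching
    private boolean batching = false; // default: autoFlush=true behaviour

    void setBatching(boolean batching) {
        // Assumption (not settled in the thread): leaving batching mode
        // flushes held messages first, so delivery order is preserved.
        if (this.batching && !batching) {
            flush();
        }
        this.batching = batching;
    }

    void sendString(String text) {
        if (batching) {
            pending.add(text); // may sit here until the app calls flush()
        } else {
            wire.add(text);    // delivered without further app intervention
        }
    }

    void flush() {
        wire.addAll(pending);
        pending.clear();
    }

    List<String> delivered() {
        return wire;
    }
}
```

With batching off, each sendString() lands on the wire at once; with batching on, nothing moves until flush(), which matches the "might sit in the buffer forever" warning above.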
>>>>
>>>> It seems like a reasonable request. Do you think the autoflush
>>>> property is a per-peer setting, a per-logical-endpoint setting, or a
>>>> per-container setting? I'm wondering if typically developers will want
>>>> to set this once per application rather than keep setting it per
>>>> RemoteEndpoint.
>>>
>>> I think it's on the RemoteEndpoint, like setAutoCommit for JDBC.
>>> It's easy to set in @WebSocketOpen, and the application might want
>>> to start and stop batching mode while processing.
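Scott's setAutoCommit analogy suggests the familiar JDBC pattern: remember the current mode, switch batching on for a burst, flush, then restore the previous mode in a finally block. The BatchableEndpoint interface, BurstSender helper and RecordingEndpoint stub are hypothetical, using the method names proposed in this thread; the stub only logs calls so the sequence can be checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical endpoint interface using the names proposed in this thread.
interface BatchableEndpoint {
    boolean isBatching();
    void setBatching(boolean batching);
    void sendString(String text);
    void flush();
}

final class BurstSender {
    // JDBC-style pattern (compare Connection.setAutoCommit): remember the
    // current mode, batch a burst, flush, then restore the previous mode.
    static void sendBurst(BatchableEndpoint ep, List<String> messages) {
        boolean previous = ep.isBatching();
        ep.setBatching(true);
        try {
            for (String m : messages) {
                ep.sendString(m);
            }
            ep.flush();
        } finally {
            ep.setBatching(previous);
        }
    }
}

// Recording stub that logs each call so the call order can be inspected.
class RecordingEndpoint implements BatchableEndpoint {
    final List<String> log = new ArrayList<>();
    private boolean batching;

    public boolean isBatching() { return batching; }
    public void setBatching(boolean b) { batching = b; log.add("batching=" + b); }
    public void sendString(String text) { log.add("send:" + text); }
    public void flush() { log.add("flush"); }
}
```

Because the mode lives on each endpoint, a handler can enable batching for one peer's burst without affecting other connections, and the default can still be set once in the @WebSocketOpen method.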
>>>
>>> -- Scott
>>>
>>>>
>>>> - Danny
>>>>
>>>> On 11/28/12 3:28 PM, Scott Ferguson wrote:
>>>>>
>>>>> I'd like a setAutoFlush() and flush() on RemoteEndpoint for
>>>>> high-performance messaging. Defaults to true, which is the current
>>>>> behavior.
>>>>>
>>>>> The performance difference is on the order of 5-7 times as many
>>>>> messages in some early micro-benchmarks. It's a big improvement
>>>>> and puts us near high-speed messaging systems like ZeroMQ.
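The 5-7x figure comes from Scott's own micro-benchmarks and is not reproduced here. What can be shown cheaply is where batching saves work: this sketch counts how many write calls reach the underlying stream (standing in for socket writes) when 1000 small messages are flushed one at a time versus once at the end. WriteCounter and CountingStream are illustrative names, and a write-call count is only a proxy for the per-message overhead a real transport pays.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Counts write calls reaching the underlying stream with per-message
// flushing versus a single flush at the end.
class WriteCounter {
    // OutputStream that discards data but counts write calls.
    static final class CountingStream extends OutputStream {
        int writes;
        @Override public void write(int b) { writes++; }
        @Override public void write(byte[] b, int off, int len) { writes++; }
    }

    static int underlyingWrites(int messages, boolean flushEach) throws IOException {
        CountingStream sink = new CountingStream();
        // Explicit buffer size so the result doesn't depend on JDK defaults.
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192);
        for (int i = 0; i < messages; i++) {
            // 7-byte messages: "m00000\n", "m00001\n", ...
            out.write(String.format("m%05d\n", i).getBytes(StandardCharsets.US_ASCII));
            if (flushEach) {
                out.flush(); // auto-flush style: one underlying write per message
            }
        }
        out.flush(); // batching style: everything goes out here
        return sink.writes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(underlyingWrites(1000, true));  // 1000
        System.out.println(underlyingWrites(1000, false)); // 1
    }
}
```

The 1000 messages total 7000 bytes, which fits in the 8192-byte buffer, so the batched run reaches the sink in a single write.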
>>>>
>>>>
>>>> --
>>>> <http://www.oracle.com> *Danny Coward *
>>>> Java EE
>>>> Oracle Corporation
>>>>
>>>
>>
>>
>> --
>> <http://www.oracle.com> *Danny Coward *
>> Java EE
>> Oracle Corporation
>>
>

-- 
<http://www.oracle.com> 	*Danny Coward *
Java EE
Oracle Corporation