users@grizzly.java.net

Re: Some questions about the ProtocolParser interface

From: Erik Svensson <erik.svensson_at_six.se>
Date: Fri, 08 Feb 2008 10:50:21 +0100

On 2/7/08 10:47 PM, "Scott Oaks" <Scott.Oaks_at_Sun.COM> wrote:

>
>
>> I'm a bit unclear about the contract implicit in the ProtocolParsers
>> interface. If we start from the start, with startBuffer() (:-)) it's almost
>> clear. It basically states that the ByteBuffer parameter is a byte source to
>> be parsed. It's also stated that startBuffer() 'should store the buffer and
>> its state so that subsquent calls to getNextMessage() will return distinct
>> messages'. I take this to mean that the contents of the source should be
>> appended or added to some ProtocolParser local storage (in my case appended
>> to a local ByteBuffer).
>
> There are different ways to do this based on the needs of your parser.
> In the simple case, you could just store a reference to the buffer, its
> initial position, limit, and so on so that you can keep track of the
> buffer as you go through the parsing.
>
> You can copy out the data, but that is probably inefficient. Also, it
> leaves the question of partial messages (below). So maybe we should make
> it clearer that by store the buffer, we really mean store a reference to
> the buffer.
>
> So a startBuffer() could do:
> bb.flip()
> savedBuffer = bb;
> origLimit = bb.limit();
> ... and perhaps copy out the data, or if the buffer has a backing
> store, just get that array
> for future use

Here's another thing I don't quite like and that's the ProtocolParser
flipping the bytebuffer for reading. IMHO the bytebuffer should be given to
to PP set for reading. The Filter is responsible for gathering bytes and
presenting them to the PP in a way that the PP can start reading.
In the same vein I feel that the PP should do a compact() before returning
the buffer (if it consumed anything, otherwise do a rewind()).
Might be nitpicking here but (once again) IMHO it's important to keep
responsibilities well definied. It only leads to confusion and chaos
especially for the maintainers coming in afterwards.
The ProtocolFilter flips the ByteBuffer before calling the PP and flips it
back afterwards.

>> However releaseBuffer() states that 'No more parsing
>> will be done... . Set up the buffer so that its position is the first byte
>> that was not part of a full message'. This indicates that the full content
>> of the source meantioned above should NOT be copied into local storage when
>> the startBuffer() method is invoked. releaseBuffer() also states
>> '[Returns] true if the parser has saved some state [in the source buffer]'
>> which contradicts the documentation from startBuffer().
>
> There are two cases here -- if there is no partial message left over in
> the buffer, then you can just clear out the buffer. If there is a
> partial message, then the position should be set to the beginning of the
> message (so that next time the parser is called, it will process that
> message), and the limit should be set to the original limit (so that
> additional data is read after that limit).

But why not just do a compact()?
Right now we're now halfway into a producer/consumer pattern. I'd prefer
that the PP does it's parsing, consuming the bytes and returning the
bytebuffer unflipped but compacted() (in the most common case).
If somebody else needs the raw bytes down the line then some other mechanism
should be implemented to furnish them with that.
 
>> There is also a
>> problem with this. Unless framework is ready to grow the source buffer
>> (which it is not according to some mails on the list) this might be a
>> problem. Some clients send large messages (I know of one protocol where the
>> messages can be arbitrarly large. In one case the message was 21 MB.). Since
>> it's bad to make assumptions about the frameworks behaviour (which might
>> change at the drop of a hat) the protocol parser has to keep state itself
>> (and store all incoming bytes). Ie, I think releaseBuffer() should work as
>> 'be done with this buffer'-message from the ParserProtcolFilter.
>
> Yes, it's true that there is no real buffer management that can grow the
> buffer yet, and it's true that's a problem. If you have a protocol that
> has 21MB messages, then perhaps the ParserProtocolFilter isn't the best
> solution for you -- or at least, it will probably require some work
> extending the ParserProtocolFilter class. The ORB team using grizzly has
> done that because some of their requirements don't necessarily fit this
> parsing framework.
>
> But for releaseBuffer to be completely done with the buffer, then the
> parser would have to copy data out of the bytebuffer into temporary
> storage somewhere, which is something we'd like to avoid -- data copying
> between buffers can get to be quite expensive. So leaving the partial
> data in the buffer and setting up that buffer to process additional
> messages is a way to avoid that copying.

While creating some sort of local storage might have a cost associated with
it, the act of actually copying more data into it can't really cost all that
much. System.arraycopy() is one of the most optimized routines in the vm.
But I do agree that gathering enough bytes for a complete message should not
really be the providence of the procotol parser.

> Another thing you might be able to do is compact the buffer in
> releaseBuffer -- if you find that there isn't enough room between the
> current position and the capacity for a message, compacting the buffer
> will make more room (at the expense of copying data).

I would say that compacting the buffer after reading a message should be
mandatory.
 
>> getNextMessage() is another puzzle. The first sentence 'Get the next
>> complete message from the buffer, which can then be processed by the next
>> filter in the protocol chain.' I understand as ' Get the next complete
>> message from the buffer, parse it to non-byte format and pass it on to be
>> processed by the next filter in the protocol chain'. However, the next
>> sentence makes no sense to me: ' Because not all filters will understand
>> protocol messages, this method should also set the position and limit of the
>> buffer at the start and end boundaries of the message'. Does this mean that
>> the ProtocolParser should NOT consume the bytes in the buffer? Isn't the
>> point of all this to turn a stream of bytes into a buffer? Further, if the
>> ProtocolParser has stored bytes locally then the buffer is full of data that
>> makes no sense without the locally stored data and might not even make sense
>> with that data as the message might not be complete.
>> In that vein, when you add filters to the protocol chain, can you assume
>> that they will invoked in the order added?
>
> The subsequent filters in the chain have two options: they can call
> context.getAttribute(ProtocolParser.MESSAGE) if they know a protocol
> parser is in use; then they can simply process the message. But some
> filters are dumb -- like the LogFilter. It doesn't know about messages
> or anything, so it will just get the buffer from the context and process
> the buffer. That's why you should also set the position/limit of the
> buffer to reflect the message boundaries -- so the LogFilter (or
> whatever) can deal with just the subset of data that you're dealing
> with.

To be frank here: I don't like adjusting position/limit to align with
message boundaries at all. In my mind that's changing the semantics of the
ByteBuffer class. I would rather wrap the byte buffer class in something
like a RawMessage class that contains the complete (raw) message as a byte
buffer gotten through the ByteBuffer.slice() operation and using fields in
that class as indicators for message start/end.

Once again rambling trying to work at the same time. :-)

cheers
Erik Svensson, SIX AB