Re: Some questions about the ProtocolParser interface

From: Scott Oaks <Scott.Oaks_at_Sun.COM>
Date: Thu, 07 Feb 2008 16:47:31 -0500

> I'm a bit unclear about the contract implicit in the ProtocolParsers
> interface. If we start from the start, with startBuffer() (:-)) it's almost
> clear. It basically states that the ByteBuffer parameter is a byte source to
> be parsed. It's also stated that startBuffer() 'should store the buffer and
> its state so that subsquent calls to getNextMessage() will return distinct
> messages'. I take this to mean that the contents of the source should be
> appended or added to some ProtocolParser local storage (in my case appended
> to a local ByteBuffer).

There are different ways to do this based on the needs of your parser.
In the simple case, you could just store a reference to the buffer, its
initial position, limit, and so on so that you can keep track of the
buffer as you go through the parsing.

You can copy out the data, but that is probably inefficient. Also, it
leaves the question of partial messages (below). So maybe we should make
it clearer that by store the buffer, we really mean store a reference to
the buffer.

So a startBuffer() could do:
    bb.flip()
    savedBuffer = bb;
    origLimit = bb.limit();
    ... and perhaps copy out the data, or if the buffer has a backing
store, just get that array
    for future use

> However releaseBuffer() states that 'No more parsing
> will be done... . Set up the buffer so that its position is the first byte
> that was not part of a full message'. This indicates that the full content
> of the source meantioned above should NOT be copied into local storage when
> the startBuffer() method is invoked. releaseBuffer() also states
> '[Returns] true if the parser has saved some state [in the source buffer]'
> which contradicts the documentation from startBuffer().

There are two cases here -- if there is no partial message left over in
the buffer, then you can just clear out the buffer. If there is a
partial message, then the position should be set to the beginning of the
message (so that next time the parser is called, it will process that
message), and the limit should be set to the original limit (so that
additional data is read after that limit).

> There is also a
> problem with this. Unless framework is ready to grow the source buffer
> (which it is not according to some mails on the list) this might be a
> problem. Some clients send large messages (I know of one protocol where the
> messages can be arbitrarly large. In one case the message was 21 MB.). Since
> it's bad to make assumptions about the frameworks behaviour (which might
> change at the drop of a hat) the protocol parser has to keep state itself
> (and store all incoming bytes). Ie, I think releaseBuffer() should work as
> 'be done with this buffer'-message from the ParserProtcolFilter.

Yes, it's true that there is no real buffer management that can grow the
buffer yet, and it's true that's a problem. If you have a protocol that
has 21MB messages, then perhaps the ParserProtocolFilter isn't the best
solution for you -- or at least, it will probably require some work
extending the ParserProtocolFilter class. The ORB team using grizzly has
done that because some of their requirements don't necessarily fit this
parsing framework.

But for releaseBuffer to be completely done with the buffer, then the
parser would have to copy data out of the bytebuffer into temporary
storage somewhere, which is something we'd like to avoid -- data copying
between buffers can get to be quite expensive. So leaving the partial
data in the buffer and setting up that buffer to process additional
messages is a way to avoid that copying.

Another thing you might be able to do is compact the buffer in
releaseBuffer -- if you find that there isn't enough room between the
current position and the capacity for a message, compacting the buffer
will make more room (at the expense of copying data).

> getNextMessage() is another puzzle. The first sentence 'Get the next
> complete message from the buffer, which can then be processed by the next
> filter in the protocol chain.' I understand as ' Get the next complete
> message from the buffer, parse it to non-byte format and pass it on to be
> processed by the next filter in the protocol chain'. However, the next
> sentence makes no sense to me: ' Because not all filters will understand
> protocol messages, this method should also set the position and limit of the
> buffer at the start and end boundaries of the message'. Does this mean that
> the ProtocolParser should NOT consume the bytes in the buffer? Isn't the
> point of all this to turn a stream of bytes into a buffer? Further, if the
> ProtocolParser has stored bytes locally then the buffer is full of data that
> makes no sense without the locally stored data and might not even make sense
> with that data as the message might not be complete.
> In that vein, when you add filters to the protocol chain, can you assume
> that they will invoked in the order added?

You can assume that filters are invoked in order.

The getNextMessage() returns a message object; what that object is is
completely up to you (which is what I meant by non-byte format, but
that's quite a bad choice of words on my part!). That message is passed
down the filter chain.

The subsequent filters in the chain have two options: they can call
context.getAttribute(ProtocolParser.MESSAGE) if they know a protocol
parser is in use; then they can simply process the message. But some
filters are dumb -- like the LogFilter. It doesn't know about messages
or anything, so it will just get the buffer from the context and process
the buffer. That's why you should also set the position/limit of the
buffer to reflect the message boundaries -- so the LogFilter (or
whatever) can deal with just the subset of data that you're dealing
with.

On the other hand, while that was working for me with the LogFilter when
I proposed these changes, it isn't anymore. I'll have to go debug that,
but that at least is the intent of why you might want to set up the
buffer bounds to reflect the message bounds.

> This is a bit rambling as I try to work while writing it but I would really
> like to see a bit of idea beyond the tutorial on how this is supposed to
> work.
>
> Also, the javadocs is bit old.In the comments to hasMoreBytesToParse() the
> method 'parseBytes()' is mentioned. parseBytes() cannot be found however, so
> I'm guessing getNextMessage() is what's meant.
> There's a bunch of others.

I'll work on updated both the javadoc and the tutorial code.

-Scott