dev@grizzly.java.net

Re: Protocol Parser proposal

From: Scott Oaks <Scott.Oaks_at_Sun.COM>
Date: Wed, 05 Dec 2007 13:47:45 -0500

> Here is my input on Scott's proposal:
>
> I agree with most of the proposal. The new parser interface seems
> very clean now, with the getNextMessage() and hasNextMessage() API.
> However, I do not see a big advantage of this design over the existing
> interface. For example, when I call hasNextMessage(), how do we tell
> if there is a next message?
> We really need to parse the byte buffer to know that. If we parse the
> byte buffer only to check for the next message, then we might as well
> create the message itself (say, a Message data type) rather than first
> check for a message, then parse, and then create the message. Perhaps
> we may not even need the hasNextMessage() API in the parser interface.

Not necessarily. I can envision a protocol where you can tell if there's
a full message without parsing -- e.g. if there are more bytes available
than a single message could possibly account for. But yes, in many
cases, hasNextMessage() will need to parse the message to tell, and then
that message needs to be made available for getNextMessage() to return
(if it wants to).

Another reason we need both is that getNextMessage() can return null if
it doesn't want to behave in a message-oriented way (that is, if the
downstream filters will use only the bytebuffer to get data).
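
For illustration, here's a rough sketch of what I mean (the 4-byte
length prefix and the class and field names are all made up, not part
of the proposal):

    import java.nio.ByteBuffer;

    // Sketch: a parser for a hypothetical protocol whose messages carry
    // a 4-byte length prefix. hasNextMessage() never does a full parse.
    public class LengthPrefixedParser {
        private ByteBuffer buffer;        // set when parsing starts
        private boolean messageOriented;  // false if downstream filters want raw bytes

        public boolean hasNextMessage() {
            // Peek at the length header; no need to build a message object.
            if (buffer.remaining() < 4) {
                return false;
            }
            int messageLength = buffer.getInt(buffer.position());
            return buffer.remaining() >= 4 + messageLength;
        }

        public Object getNextMessage() {
            // Returning null is legal when the filter chain is byte-oriented.
            if (!messageOriented) {
                return null;
            }
            int messageLength = buffer.getInt(buffer.position());
            buffer.position(buffer.position() + 4);
            byte[] body = new byte[messageLength];
            buffer.get(body);
            return body;  // stand-in for a real Message type
        }
    }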

> Regarding removing the getNextStartPosition() and other byte buffer
> position related api in the parser, there is possibility that the
> filter might want to set the position of the bytebuffer before
> returning control to the Grizzly selector handler for next cycle of
> read.

In fact, before grizzly reads again, it's required that the parser set
the position of the bytebuffer so that grizzly knows where to start
putting the data. If all the data has been consumed, grizzly can put
data at the beginning of the buffer. If not, it has to put data after
the last byte that was consumed. So the parser must set those
boundaries.
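
To make that concrete, here is one way the parser could leave the buffer
positioned (a sketch only; the index bookkeeping is my own, not part of
the proposal):

    import java.nio.ByteBuffer;

    // Sketch: leave the bytebuffer ready for grizzly's next read.
    // 'dataEnd' marks the end of the data read so far; 'consumedUpTo'
    // marks the last byte the parser consumed.
    static void prepareForNextRead(ByteBuffer buffer, int consumedUpTo, int dataEnd) {
        if (consumedUpTo == dataEnd) {
            // Everything was consumed: the next read can start at position 0.
            buffer.clear();
        } else {
            // Partial data remains: the next read must append after it.
            buffer.limit(buffer.capacity());
            buffer.position(dataEnd);
        }
    }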

But under what circumstance would the filter set the buffer positions?
When would a filter know details about the protocol being parsed, and
hence know what to set the buffer parameters to?

> There are a couple of scenarios where we want to grow the
> underlying byte buffer or compact the underlying byte buffer. (I see
> that Charlie already pointed this out.) Please remember that the byte
> buffer we handle here is from Grizzly. Grizzly allocates this byte
> buffer before arriving at the filter and parser. That also makes me
> think we do not want to have startBuffer() and releaseBuffer() APIs
> in the ProtocolParser interface. We may not want to store a reference
> to the byte buffer in the parser. One might ask why? I would say,
> what's the need? (I can always get a reference to the underlying byte
> buffer from the current worker thread.)

Yes, but it's easier to do it once at the beginning of a set of
messages. And the releaseBuffer() is needed in any case to let the
parser know that it needs to set the buffer to reflect all the data that
has been read (possibly resetting the buffer entirely).
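
The calling pattern I have in mind is roughly the following (a sketch
only -- I'm assuming startBuffer() takes the bytebuffer, and
handleMessage() stands in for whatever passes the message downstream):

    // Sketch: how the read filter could drive the proposed parser API.
    void parseAvailableMessages(ProtocolParser parser, ByteBuffer buffer) {
        parser.startBuffer(buffer);        // parser stores the reference once
        while (parser.hasNextMessage()) {
            Object message = parser.getNextMessage();
            if (message != null) {
                handleMessage(message);    // pass to downstream filters
            }
        }
        parser.releaseBuffer();            // parser repositions the buffer for the next read
    }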

> Regarding setting attributes in the Context class for a parsed
> message - I don't think that is a good idea, but it can be
> left to the user to decide if they want to. The context object that
> comes from the Controller gets changed for every cycle of selection
> (in TCPSelectorHandler). So we might want to remove the attribute
> after its use.

What is an alternative idea that lets the message be passed to
downstream filters?
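
To make the attribute idea concrete, the flow I picture is something
like this (a stand-in sketch: a plain map plays the role of the
per-cycle Context attributes, and the key name is made up):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: pass a parsed message from the parser filter to a
    // downstream filter through per-cycle attributes.
    public class AttributePassingSketch {
        static final String MESSAGE_KEY = "ProtocolParser.message";  // hypothetical key

        public static void main(String[] args) {
            Map<String, Object> contextAttributes = new HashMap<String, Object>();

            // Parser filter: stash the parsed message.
            Object parsedMessage = "a parsed message";  // placeholder for a real Message
            contextAttributes.put(MESSAGE_KEY, parsedMessage);

            // Downstream filter: pick it up and clear it, since the Context
            // is recycled on every selection cycle.
            Object message = contextAttributes.remove(MESSAGE_KEY);
            System.out.println("downstream filter got: " + message);
        }
    }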

> What's missing is an API (or APIs) to clearly set the position of the
> underlying bytebuffer for the following cases:
>
> As far as my implementation for Corba is concerned, I felt the
> getNextStartPosition() helped a lot in quickly arriving at where to
> start parsing from. It's very tricky to know the position in the byte
> buffer while parsing. The current position for parsing is totally
> different from the position where we need to read from when more data
> is expected or a message is partial. So I think we are good with the
> current getNextStartPosition(), etc.
> Here are the basic scenarios where the byte buffer position needs
> computation:
> 1) First-time parsing where you get zero bytes in the byte buffer
> (an unsuccessful read attempt from the Selector).
> 2) Large data in the message: parser.getNextMessage() returns null
> since the data is too partial to create even one message during
> parsing.
> 3) Mixed full and partial messages in the given byte buffer.
> 4) Cases where the byte buffer needs to be reallocated and the
> existing bytes copied prior to the next read (in the read filter).
> The way I do it today is to reallocate and copy the partial part of
> the bytebuffer. We could also compact the bytebuffer, if that helps
> in saving space in the given bytebuffer. Copying is ugly, but there
> may not be a choice here. Say we parsed 2 messages and the 3rd
> message is partial and hence needs more data to be read, but there is
> no space. Then we could compact the bytebuffer so that
> nextStartPosition() is set to zero and the leftover bytes are moved
> (compacted) from their current position to zero. This helps
> sometimes, but not always: when the size needed is more than the
> remaining space left in the bytebuffer, compaction isn't enough.

Ah, I think this answers one of my questions above. So basically, there
seem to be two ways to handle the case where grizzly is about to
read more data into the buffer:

1) The releaseBuffer() call is made to let the protocol parser know that
the buffer needs to reflect what's been consumed

2) The ParserProtocolFilter calls getNextStartPosition() so that it can
set the buffer to reflect what's been consumed (and also needs
getNextEndPosition() to set the limit)

I guess I'm not particularly wedded to doing it either way if my
understanding of the 2nd point is correct.
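
And to check my understanding of the compaction case, here's roughly
what I picture happening before the next read when a partial message is
left over (a sketch only; the method and its arguments are made up):

    import java.nio.ByteBuffer;

    // Sketch: assumes position is at the start of the partial message and
    // limit is at the end of the data read so far.
    static ByteBuffer makeRoomForPartialMessage(ByteBuffer buffer, int bytesStillNeeded) {
        if (bytesStillNeeded <= buffer.capacity() - buffer.remaining()) {
            // Compacting is enough: the partial bytes move to position 0 and
            // the buffer is left ready for the next read to append after them.
            buffer.compact();
            return buffer;
        }
        // Even compacting won't make enough room: reallocate a larger buffer
        // and copy the partial bytes over (ugly, but there's no choice).
        ByteBuffer larger = ByteBuffer.allocate(buffer.remaining() + bytesStillNeeded);
        larger.put(buffer);
        return larger;
    }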

-Scott

>
> In a nutshell, I will end up refactoring the existing Corba
> protocol filter and parser implementation based on what we arrive at
> :-)
>
> -Harsha