Scott Oaks wrote:
> On Mon, 2007-12-03 at 12:40, charlie hunt wrote:
>
>> Couple comments:
>>
>> 1.) My initial reaction is that I like the idea of hiding the logic of
>> manipulating the ByteBuffer's position, limit, etc within the
>> ProtocolParser such as those currently done by setNextStartPosition(),
>> getNextStartPosition(), setNextEndPosition(), getNextEndPosition(). It
>> looks like you're also suggesting to break apart parseBytes() into a
>> hasNextMessage() and nextMessage() along with setting the ByteBuffer in
>> the ProtocolParser rather than passing it to
>> ProtocolParser.parseBytes(), right? I do like the idea of setting the
>> ByteBuffer in the ProtocolParser better than passing it to the
>> ProtocolParser.parseBytes().
>>
>
> Yes, the suggestion all along has been to split parseBytes into hasNext
> and getNext (like Enumeration). Like you, I think that's easier the
> grizzly layer (or elsewhere) dealing with the getters/setters on the
> buffer.
>
>
>> 2.) Does startBuffer(ByteBuffer bb) suggesting to do any more than set
>> ProtocolParser.byteBuffer (I'm expect that an implementation of
>> ProtocolParser will be implemented to have a class member variable
>> called byteBuffer)? Then, does the first invocation of hasNextMessage()
>> or getNextMessage(), whichever one does the parsing assume that the
>> ProtocolParser ByteBuffer's position is set to the beginning of a
>> message and that the ProtocolParser ByteBuffer's limit is set to the end
>> of bytes to be parsed? (I'm making this assumption)
>>
>
> It could work that way (and that's certainly easier). startBuffer could
> do any other setup necessary, or for that matter it could pre-parse the
> entire buffer if it wanted to. There's one state thing it may need to
> take care of that I'll explain below.
>
>
>> 3.) Do you for see any issues with some edge scenarios such as:
>> a. ByteBuffer is full, i.e. ByteBuffer.limit == ByteBuffer.capacity, but
>> ProtocolParser.isExpectingMoreData() returns true. In other words, you
>> have reached the end of the ByteBuffer, but you need more data so that
>> nextMessage() can return a message. Somewhere there will need to be some
>> logic that will take the partial message bytes in the ByteBuffer and put
>> them into a new ByteBuffer, or there will be parsing logic that figures
>> out to span multiple ByteBuffers and be able to return a message.
>> b. Any special processing required for a situation where the ByteBuffer
>> is full, not yet been parsed, yet does not have enough a single full
>> message in it, (an edge case where message is too big to fit in a single
>> ByteBuffer.
>> * These latter two edge cases are why the getSizeNeeded() is on the
>> current API since some protocols will have a field that will say how
>> many bytes are in the message, (obviously some protocols do not have
>> such a field).
>>
>
> It's the second thing here that I see as a problem. If the buffer limit
> == capacity, then the buffer can just be cleared. Similarly, if there
> isn't enough room in the buffer between position and capacity for a full
> message, the buffer can be compacted. I was thinking that the compaction
> might happen in the readfilter code or elsewhere, but actually, I see
> that's not the case. If it were, then startBuffer would have to keep
> some state of any pre-parsed partial message, because its boundaries
> might change. But really, I think buffer compaction would be required by
> releaseBuffer instead -- it's supposed to set up the buffer so that more
> data can be read.
>
> But if the message doesn't fit altogether, then there is the issue of
> what to do. I think that would require some special handling; it's not
> clear to me that the best thing in all cases is to just allocate a
> bigger buffer. I was thinking of writing an ftp file or something with a
> really big payload -- in that case, you probably want to keep some state
> somewhere and keep a small buffer and write out the file contents (or
> whatever) as it comes in. In other cases, you may need to increase the
> buffer size.
>
> I think that the parser itself could create the bigger buffer if
> necessary and substitute it for the current thread (WorkerThread) buffer
> -- Harsha has code like that in the orb port already. But other ways of
> handling this might be better?
>
> We can put back the getSizeNeeded too -- nothing used that in the
> original code, nor in my proposal, which is why I took it out. Right
> now, the buffer size is determined by DefaultInitialPipeline's constant
> value. So would we want the default initial pipeline to somehow know if
> a protocol filter is installed and set the value on that? Or have the
> ParserProtocolFilter resize all the buffers? Neither seems really ideal
> to me, but I'm not sure how else you would do it.
>
> -Scott
>
>
>> I would be interested in hearing Harsha's comments on what work would be
>> necessary to migrate his current implementation to your proposed
>> interface. (Not that he has to do the work immediately, I'd just like
>> to hear his reaction)
>>
>>
Here is my input on Scott's proposal:
I agree with most of the proposal. The new parser interface seems like
very clean now, with getNextMessage() and hasNextMessage() api.
However, I do not see a big advantage of this design to the existing
interface. For example, when I call hasNextMessage(), how do we tell,
if there is a next message?
We really need to parse the byte buffer to know that. If we parse byte
buffer only to check for the next message, then we can as well create a
(say Message data type) message itself than first check for a message
then parse and then create a message. Perhaps, we may not even need
hasNextMessage() api in the parser interface.
Regarding removing the getNextStartPosition() and other byte buffer
position related api in the parser, there is possibility that the
filter might want to set the position of the bytebuffer before
returning control to the Grizzly selector handler for next cycle of
read. There are couple of scenarios where we wanted to grow the
underlying bytebuffer or compact the underlying byte buffer. (I see that
Charlie already pointed out this). Please remember that, the byte buffer
we handle here is from Grizzly. Grizzly allocates this byte buffer
before arriving at the filter and parser. That also makes me think, we
do not want to have startBuffer () and releaseBuffer() api in the
ProtocolParser interface.We may not want to store a reference to the
byte buffer in the parser. one might ask why? I would say, what's the
need? ( I can always get a reference to the underlying byte buffer from
the current worker thread.)
Regarding setting attributes in the Context class, for a parsed message-
I think that does not seem like a good idea. But can be left to the
user to decide if they want to. The context object that comes from the
Controller gets changes for every cycle of selection. (in
TCPSelectorHandler.) So, we might want to remove the set attribute after
its use.
What's missing is to have api (s), to clearly set the position of the
underlying bytebuffer for the following cases:
As far as my implementation for Corba is concerned, I felt the
getNextStartPosition() helped a lot in quickly arriving at where to
start parsing from? It's very tricky to know the position in the byte
buffer while parsing. The current position for parsing is totally
different from the position where we need to read from, in case, if more
data is expected and in case of partial message. So, I think, we are
good with the current getNextStartPosition().. etc.
Here are the basic scenarios where byte buffer position needs computation:
1) First time parsing and you get zero bytes in
the byte buffer. (unsuccessful read attempt from Selector)
2) Large data in the message.
parser.getNextMessage() gives
null since the data is partial to create at least one message during
parsing.
3) Mixed full and partial messages in the given
byte buffer.
4) Cases where the byte buffer needs to
reallocated and copy existing bytes prior to the next read (in read filter)
The way I do today is, to reallocate and
copy partial part of the bytebuffer. We could also compact the
bytebuffer, if that helps in saving the space in the given bytebuffer.
Copying is ugly but there may not be a choice here. Say, we parsed 2
messages and 3 message is partial and hence needs more data to read but
there is no space. Then we could compact the bytebuffer so that the
nextStartPosition() is set to zero and the 'l' length of bytes will be
moved (compacted) from their current position to zero. This will help
some times. Not always.. when the size needed is more than the remaining
space left in the bytebuffer.
In a nut shell, I will end up in refactoring the existing Corba
protocol filter and parser implementation based on what we arrive at :-)
-Harsha