Re: Protocol Parser proposal

From: Scott Oaks <Scott.Oaks_at_Sun.COM>
Date: Tue, 04 Dec 2007 10:38:02 -0500

On Mon, 2007-12-03 at 12:40, charlie hunt wrote:
> Couple comments:
>
> 1.) My initial reaction is that I like the idea of hiding the logic of
> manipulating the ByteBuffer's position, limit, etc within the
> ProtocolParser such as those currently done by setNextStartPosition(),
> getNextStartPosition(), setNextEndPosition(), getNextEndPosition(). It
> looks like you're also suggesting to break apart parseBytes() into a
> hasNextMessage() and nextMessage() along with setting the ByteBuffer in
> the ProtocolParser rather than passing it to
> ProtocolParser.parseBytes(), right? I do like the idea of setting the
> ByteBuffer in the ProtocolParser better than passing it to the
> ProtocolParser.parseBytes().

Yes, the suggestion all along has been to split parseBytes into hasNext
and getNext (like Enumeration). Like you, I think that's easier the
grizzly layer (or elsewhere) dealing with the getters/setters on the
buffer.

> 2.) Does startBuffer(ByteBuffer bb) suggesting to do any more than set
> ProtocolParser.byteBuffer (I'm expect that an implementation of
> ProtocolParser will be implemented to have a class member variable
> called byteBuffer)? Then, does the first invocation of hasNextMessage()
> or getNextMessage(), whichever one does the parsing assume that the
> ProtocolParser ByteBuffer's position is set to the beginning of a
> message and that the ProtocolParser ByteBuffer's limit is set to the end
> of bytes to be parsed? (I'm making this assumption)

It could work that way (and that's certainly easier). startBuffer could
do any other setup necessary, or for that matter it could pre-parse the
entire buffer if it wanted to. There's one state thing it may need to
take care of that I'll explain below.

> 3.) Do you for see any issues with some edge scenarios such as:
> a. ByteBuffer is full, i.e. ByteBuffer.limit == ByteBuffer.capacity, but
> ProtocolParser.isExpectingMoreData() returns true. In other words, you
> have reached the end of the ByteBuffer, but you need more data so that
> nextMessage() can return a message. Somewhere there will need to be some
> logic that will take the partial message bytes in the ByteBuffer and put
> them into a new ByteBuffer, or there will be parsing logic that figures
> out to span multiple ByteBuffers and be able to return a message.
> b. Any special processing required for a situation where the ByteBuffer
> is full, not yet been parsed, yet does not have enough a single full
> message in it, (an edge case where message is too big to fit in a single
> ByteBuffer.
> * These latter two edge cases are why the getSizeNeeded() is on the
> current API since some protocols will have a field that will say how
> many bytes are in the message, (obviously some protocols do not have
> such a field).

It's the second thing here that I see as a problem. If the buffer limit
== capacity, then the buffer can just be cleared. Similarly, if there
isn't enough room in the buffer between position and capacity for a full
message, the buffer can be compacted. I was thinking that the compaction
might happen in the readfilter code or elsewhere, but actually, I see
that's not the case. If it were, then startBuffer would have to keep
some state of any pre-parsed partial message, because its boundaries
might change. But really, I think buffer compaction would be required by
releaseBuffer instead -- it's supposed to set up the buffer so that more
data can be read.

But if the message doesn't fit altogether, then there is the issue of
what to do. I think that would require some special handling; it's not
clear to me that the best thing in all cases is to just allocate a
bigger buffer. I was thinking of writing an ftp file or something with a
really big payload -- in that case, you probably want to keep some state
somewhere and keep a small buffer and write out the file contents (or
whatever) as it comes in. In other cases, you may need to increase the
buffer size.

I think that the parser itself could create the bigger buffer if
necessary and substitute it for the current thread (WorkerThread) buffer
-- Harsha has code like that in the orb port already. But other ways of
handling this might be better?

We can put back the getSizeNeeded too -- nothing used that in the
original code, nor in my proposal, which is why I took it out. Right
now, the buffer size is determined by DefaultInitialPipeline's constant
value. So would we want the default initial pipeline to somehow know if
a protocol filter is installed and set the value on that? Or have the
ParserProtocolFilter resize all the buffers? Neither seems really ideal
to me, but I'm not sure how else you would do it.

-Scott

>
> I would be interested in hearing Harsha's comments on what work would be
> necessary to migrate his current implementation to your proposed
> interface. (Not that he has to do the work immediately, I'd just like
> to hear his reaction)
>
> hths,
>
> charlie ...
>
> Scott Oaks wrote:
> > I've looked at what Harsha is doing for the ORB and thought about
> > various ways this might be done. In the end, I though it was better not
> > to force protocol parsers to use a message data type (because then other
> > filters in the pipeline would have to know about that), but still to
> > allow them to use one if they want (which can simplify downstream
> > filters).
> >
> > So basically, the idea is this: the protocol parser can return an
> > arbitrary type from the hasNextMessage() method. However, that method
> > should still set up the buffer limit/position at the message boundaries.
> > Then a downstream filter has two options: it can use
> > context.getAttribute(ProtocolParser.MESSAGE) to see if there is a
> > defined message (and if so, it can use that message), or it can just use
> > the correctly established buffer to get the data. That means that things
> > like LogFilters can work without knowing anything about what's upstream:
> > they can just process the raw data. But it also means that smart filters
> > can avoid parsing the message twice. And finally, it means that an
> > implementor of a particular protocol doesn't have to override
> > ParserProtocolFilter (Harsha is doing that for the ORB, and given the
> > complexities of CORBA, that probably makes sense in that case, though I
> > believe that even the ORB might be able to use the standard
> > ParserProtocolFilter).
> >
> > The only downside is that parsers have to do two things: make a message
> > and set the buffer limits. Actually, there's no requirement that they
> > make a message: they can safely return null from the hasNextMessage()
> > method, and everything downstream can just operate on the buffer. So
> > that helps the simple case too.
> >
> > I've attached the modified ParserProtocolFilter and ProtocolPaser
> > interface for comments.
> >
> > -Scott
> >
> > ------------------------------------------------------------------------
> >
> > /*
> > * The contents of this file are subject to the terms
> > * of the Common Development and Distribution License
> > * (the License). You may not use this file except in
> > * compliance with the License.
> > *
> > * You can obtain a copy of the license at
> > * https://glassfish.dev.java.net/public/CDDLv1.0.html or
> > * glassfish/bootstrap/legal/CDDLv1.0.txt.
> > * See the License for the specific language governing
> > * permissions and limitations under the License.
> > *
> > * When distributing Covered Code, include this CDDL
> > * Header Notice in each file and include the License file
> > * at glassfish/bootstrap/legal/CDDLv1.0.txt.
> > * If applicable, add the following below the CDDL Header,
> > * with the fields enclosed by brackets [] replaced by
> > * you own identifying information:
> > * "Portions Copyrighted [year] [name of copyright owner]"
> > *
> > * Copyright 2007 Sun Microsystems, Inc. All rights reserved.
> > */
> >
> > package com.sun.grizzly.filter;
> >
> > import com.sun.grizzly.Context;
> > import com.sun.grizzly.ProtocolFilter;
> > import com.sun.grizzly.ProtocolParser;
> > import com.sun.grizzly.util.WorkerThread;
> > import java.io.IOException;
> > import java.nio.ByteBuffer;
> > import java.nio.channels.SelectionKey;
> >
> > import static com.sun.grizzly.Controller.Protocol.TCP;
> > import static com.sun.grizzly.Controller.Protocol.UDP;
> > import static com.sun.grizzly.Controller.Protocol;
> > import com.sun.grizzly.util.ThreadAttachment;
> >
> > /**
> > * Simple ProtocolFilter implementation which read the available bytes
> > * and delegate the decision of reading more bytes or not to a ProtocolParser.
> > * The ProtocolParser will decide if more bytes are required before continuing
> > * the invokation of the ProtocolChain.
> > *
> > *
> > *
> > * @author Jeanfrancois Arcand
> > */
> > public abstract class ParserProtocolFilter extends ReadFilter{
> >
> > public ParserProtocolFilter(){
> > }
> >
> > /**
> > * Read available bytes and delegate the processing of them to the next
> > * ProtocolFilter in the ProtocolChain.
> > * @return <tt>true</tt> if the next ProtocolFilter on the ProtocolChain
> > * need to be invoked.
> > */
> > @Override
> > public boolean execute(Context ctx) throws IOException {
> > ProtocolParser parser = (ProtocolParser)
> > ctx.getAttribute(ProtocolParser.PARSER);
> > if (parser == null) {
> > parser = newProtocolParser();
> > ctx.setAttribute(ProtocolParser.PARSER, parser);
> > }
> > if (!parser.hasMoreBytesToParse() || parser.isExpectingMoreData()) {
> > boolean continueExecution = super.execute(ctx);
> > WorkerThread workerThread = (WorkerThread)Thread.currentThread();
> > ByteBuffer byteBuffer = workerThread.getByteBuffer();
> > parser.startBuffer(byteBuffer);
> > if (!continueExecution){
> > return continueExecution;
> > }
> > }
> > if (!parser.hasNextMessage()) {
> > return false;
> > }
> >
> > return invokeProtocolParser(ctx, parser);
> > }
> >
> > /**
> > * Invoke the <code>ProtocolParser</code>. If more bytes are required,
> > * register the <code>SelectionKey</code> back to its associated
> > * <code>SelectorHandler</code>
> > * @param ctx the Context object.
> > * @return <tt>true</tt> if no more bytes are needed.
> > */
> > private boolean invokeProtocolParser(Context ctx, ProtocolParser parser) {
> >
> > if (parser == null){
> > throw new IllegalStateException("ProcotolParser cannot be null");
> > }
> >
> > boolean continueExecution = true;
> >
> > Object o = parser.getNextMessage();
> > ctx.setAttribute(ProtocolParser.MESSAGE, o);
> > return true;
> > }
> >
> > public boolean postExecute(Context context) throws IOException {
> > ProtocolParser parser = (ProtocolParser)
> > context.getAttribute(ProtocolParser.PARSER);
> > if (parser != null && parser.hasMoreBytesToParse()) {
> > // Need to say that we read successfully since bytes are left
> > context.setAttribute(ProtocolFilter.SUCCESSFUL_READ, Boolean.TRUE);
> > return true;
> > }
> >
> > if (parser.isExpectingMoreData()) {
> > parser.releaseBuffer();
> > WorkerThread workerThread = (WorkerThread)Thread.currentThread();
> > SelectionKey key = context.getSelectionKey();
> > // Detach the current Thread data.
> > ThreadAttachment threadAttachment = workerThread.detach(true);
> >
> > // Attach it to the SelectionKey so the it can be resumed latter.
> > key.attach(threadAttachment);
> >
> > // Register to get more bytes.
> > context.getSelectorHandler().register(key,SelectionKey.OP_READ);
> > return false;
> > }
> >
> > parser.releaseBuffer();
> > return super.postExecute(context);
> > }
> >
> > /**
> > * Return a new or cached ProtocolParser instance.
> > */
> > public abstract ProtocolParser newProtocolParser();
> >
> > }
> >
> > ------------------------------------------------------------------------
> >
> > /*
> > * The contents of this file are subject to the terms
> > * of the Common Development and Distribution License
> > * (the License). You may not use this file except in
> > * compliance with the License.
> > *
> > * You can obtain a copy of the license at
> > * https://glassfish.dev.java.net/public/CDDLv1.0.html or
> > * glassfish/bootstrap/legal/CDDLv1.0.txt.
> > * See the License for the specific language governing
> > * permissions and limitations under the License.
> > *
> > * When distributing Covered Code, include this CDDL
> > * Header Notice in each file and include the License file
> > * at glassfish/bootstrap/legal/CDDLv1.0.txt.
> > * If applicable, add the following below the CDDL Header,
> > * with the fields enclosed by brackets [] replaced by
> > * you own identifying information:
> > * "Portions Copyrighted [year] [name of copyright owner]"
> > *
> > * Copyright 2007 Sun Microsystems, Inc. All rights reserved.
> > */
> >
> > package com.sun.grizzly;
> >
> > import java.nio.ByteBuffer;
> >
> > /**
> > * An interface that knows how to parse bytes into a protocol data unit.
> > *
> > * @author Charlie Hunt
> > */
> > public interface ProtocolParser<T> {
> >
> > /**
> > * Is this ProtocolParser expecting more data ?
> > *
> > * This method is typically called after a call to <code>parseBytes()</code>
> > * to determine if the <code>ByteBuffer</code> which has been parsed
> > * contains a partial message
> > *
> > * @return - <code>true</code> if more bytes are needed to construct a
> > * message; <code>false</code>, if no
> > * additional bytes remain to be parsed into a <code>T</code>.
> > * Note that if no partial message exists, this method should
> > * return false.
> > */
> > public boolean isExpectingMoreData();
> >
> > /**
> > * Are there more bytes to be parsed in the <code>ByteBuffer</code> given
> > * to this ProtocolParser's <code>setBuffer</code> ?
> > *
> > * This method is typically called after a call to <code>parseBytes()</code>
> > * to determine if the <code>ByteBuffer</code> has more bytes which need to
> > * parsed into a message.
> > *
> > * @return <code>true</code> if there are more bytes to be parsed.
> > * Otherwise <code>false</code>.
> > */
> > public boolean hasMoreBytesToParse();
> >
> >
> > /**
> > * Get the next complete message from the buffer, which can then be
> > * processed by the next filter in the protocol chain. Because not all
> > * filters will understand protocol messages, this method should also
> > * set the position and limit of the buffer at the start and end
> > * boundaries of the message. Filters in the protocol chain can
> > * retrieve this message via context.getAttribute(MESSAGE)
> > *
> > * @return The next message in the buffer. If there isn't such a message,
> > * return <code>null.</code>
> > *
> > */
> > public T getNextMessage();
> >
> > /**
> > * Indicates whether the buffer has a complete message that can be
> > * returned from <code>getNextMessage</code>. Smart implementations of
> > * this will set up all the information so that an actual call to
> > * <code>getNextMessage</code> doesn't need to re-parse the data.
> > */
> > public boolean hasNextMessage();
> >
> > /**
> > * Set the buffer to be parsed. This method should store the buffer and
> > * its state so that subsequent calls to <code>getNextMessage</code>
> > * will return distinct messages, and the buffer can be restored after
> > * parsing when the <code>releaseBuffer</code> method is called.
> > */
> > public void startBuffer(ByteBuffer bb);
> >
> > /**
> > * No more parsing will be done on the buffer passed to
> > * <code>startBuffer.</code>
> > * Set up the buffer so that its position is the first byte that was
> > * not part of a full message, and its limit is the original limit of
> > * the buffer.
> > *
> > * @return -- true if the parser has saved some state (e.g. information
> > * data in the buffer that hasn't been returned in a full message);
> > * otherwise false. If this method returns true, the framework will
> > * make sure that the same parser is used to process the buffer after
> > * more data has been read.
> > */
> > public boolean releaseBuffer();
> >
> > /**
> > * Used to associate a particular parser with a connection
> > */
> > public static String PARSER = "ProtocolParser";
> >
> > /**
> > * Holds the message returned from <code>getNextMessage</code> for
> > * use by downstream filters
> > */
> > public static String MESSAGE = "ProtocolMessage";
> > }
> >
> >
> > ------------------------------------------------------------------------
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe_at_grizzly.dev.java.net
> > For additional commands, e-mail: dev-help_at_grizzly.dev.java.net
> >
>
> --
>
> Charlie Hunt
> Java Performance Engineer
>
> <http://java.sun.com/docs/performance/>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe_at_grizzly.dev.java.net
> For additional commands, e-mail: dev-help_at_grizzly.dev.java.net
>