dev@fi.java.net

Re: FI parser buffer

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Mon, 12 Sep 2005 13:26:38 +0200

Brian Pontarelli wrote:
>
>>
>> But since you are using a stateful protocol you can get further
>> advantage from FI if you share the vocabulary over multiple messages to
>> reduce message size and increase serializing and parsing performance.
>>
>> Note that creating a new parser/serializer per message is expensive;
>> we have found it is best to share a parser/serializer per thread.
>>
>> I am making certain guesses as to your system since I do not know what
>> requirements you have. If you could explain a bit more about your
>> system (sending a private email is OK if you do not want to discuss
>> publicly) I may be able to help you get further optimization.
>
>
> This will all be open and available shortly, so I don't mind discussing
> details here. The re-use of the serializers and handlers seems like a
> great idea and since I have transactional context during a session I
> could add those Objects to that class.

Yes. There are two forms of reuse:

1) The parsers/serializers; and
2) The vocabularies

By default, when you reuse a parser and serializer, the vocabularies
are internal to them and are reset each time a parse or serialization
is performed. To take full advantage of 2) it is necessary to pass a
vocabulary class to the parser and serializer. I will try to send out
some details this week.
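
As a rough preview, the reuse in 1) amounts to keeping one
parser/serializer pair per thread, e.g. with a ThreadLocal. The FI
class names below should match the current codebase, but treat the
sketch as illustrative rather than definitive:

import com.sun.xml.fastinfoset.sax.SAXDocumentParser;
import com.sun.xml.fastinfoset.sax.SAXDocumentSerializer;

// One parser/serializer pair per thread: the instances are not
// thread-safe, but reusing them avoids the per-message creation cost.
public class FIPool {
    private static final ThreadLocal<SAXDocumentParser> parser =
        new ThreadLocal<SAXDocumentParser>() {
            protected SAXDocumentParser initialValue() {
                return new SAXDocumentParser();
            }
        };

    private static final ThreadLocal<SAXDocumentSerializer> serializer =
        new ThreadLocal<SAXDocumentSerializer>() {
            protected SAXDocumentSerializer initialValue() {
                return new SAXDocumentSerializer();
            }
        };

    public static SAXDocumentParser getParser() {
        return parser.get();
    }

    public static SAXDocumentSerializer getSerializer() {
        return serializer.get();
    }
}

Attaching an external vocabulary for 2) is the part I will detail
separately.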

2) is only an advantage when messages are smallish and the vocabulary
(tags etc.) makes up a reasonable proportion of the message, e.g. > 10%
of the overall message size.

Do you have an estimate for the average size of your messages?


> I envision it should be really
> straightforward since my protocol uses a request/response paradigm
> such that the client won't write a second request to the server until
> the server has completely returned the response to the first request.
>
> Looks like this:
>
> client             server
> | ----------------->|
> | <-----------------|
> | ----------------->|
> | <-----------------|
> | ----------------->|
> | <-----------------|
>

OK.


> Eventually I'd like to add in talk back from the server, but I'm
> actually thinking of using an additional connection for that, so this
> paradigm would still be used, just with the request originating from the
> server.
>

Using something like BEEP may help you here, although I do not know if
the current Java-based implementations use non-blocking features like
NIO.

BEEP can multiplex over one socket or use multiple connections.


>> Private methods will be inlined, if they get called enough, because
>> they are not virtual. Thus the Decoder.read method can be inlined.
>> Note that a lot of the java.io.InputStream methods are implemented
>> as synchronized, adding a further cost for just reading one byte.
>
>
> That's good to know. Do you know of any resources for more information
> about the HotSpot compilation process?

See here:

http://java.sun.com/products/hotspot/
http://java.sun.com/docs/hotspot/PerformanceFAQ.html
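
For a quick feel of the cost, a crude micro-benchmark along these
lines compares a per-byte loop over the synchronized
ByteArrayInputStream.read with a private method over the same array
(no warm-up or averaging, so treat the numbers as rough):

import java.io.ByteArrayInputStream;

public class ReadBench {
    private final byte[] buf = new byte[8 * 1024 * 1024];
    private int pos;

    // Private, non-virtual, non-synchronized: HotSpot can inline it.
    private int read() {
        return (pos < buf.length) ? buf[pos++] & 0xFF : -1;
    }

    public static void main(String[] args) {
        ReadBench b = new ReadBench();
        ByteArrayInputStream in = new ByteArrayInputStream(b.buf);

        long t0 = System.nanoTime();
        while (in.read() != -1) { } // synchronized call per byte
        long t1 = System.nanoTime();
        while (b.read() != -1) { }  // inlinable call per byte
        long t2 = System.nanoTime();

        System.out.println("ByteArrayInputStream: "
                + (t1 - t0) / 1000000 + " ms");
        System.out.println("private method:       "
                + (t2 - t1) / 1000000 + " ms");
    }
}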



> In terms of InputStream, I've
> overridden nearly everything in there for that reason. I can get away
> with a single volatile and a blocking queue instead. Only if the
> volatile is null or empty (ByteBuffer) do I access the blocking queue.
> Reduces the overhead quite a bit. Bulk reads are actually the best (as
> you've mentioned) since they only access the volatile ByteBuffer once.
>

OK. Your impl looks quite efficient.
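
For the archive, here is a minimal sketch of that design as I
understand it (the names are mine, and end-of-stream and timeout
handling are omitted):

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Filled by the NIO selector thread via offer(); drained by the
// parser thread through the InputStream methods.
public class ByteBufferInputStream extends InputStream {
    private final BlockingQueue<ByteBuffer> queue =
        new LinkedBlockingQueue<ByteBuffer>();
    private volatile ByteBuffer current;

    // Called by the selector thread after reading from the Channel.
    public void offer(ByteBuffer buffer) {
        queue.add(buffer);
    }

    public int read() throws IOException {
        return buffer().get() & 0xFF;
    }

    public int read(byte[] b, int off, int len) throws IOException {
        ByteBuffer buf = buffer();
        // Partial read: return what is buffered rather than blocking
        // until 'len' bytes arrive.
        int n = Math.min(len, buf.remaining());
        buf.get(b, off, n);
        return n;
    }

    // The common path is a single volatile read; the blocking queue
    // is touched only when the current buffer is exhausted.
    private ByteBuffer buffer() throws IOException {
        ByteBuffer buf = current;
        if (buf == null || !buf.hasRemaining()) {
            try {
                buf = queue.take(); // blocks until more data arrives
                current = buf;
            } catch (InterruptedException e) {
                throw new IOException("interrupted waiting for data");
            }
        }
        return buf;
    }
}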


>> Profiling showed that depending on InputStream.read was an issue. I
>> got quite a performance boost when I merged buffering and parsing
>> into the Decoder. It is quite a common technique for improving
>> performance.
>>
>> The FI parser will tend to read a couple of individual bytes for
>> structure and then read a sequence of bytes for content (tags or text
>> content or attribute values). There is at least one read call per
>> element, attribute, tag, text content, and attribute value. That can
>> add up to a lot.
>
>
> Good to know. I'll make sure I account for that in the future when I'm
> handling IO that can ever block or synchronize. Do you know if the same
> holds true for other non-synchronized, non-blocking methods? I would
> imagine not as much of a performance drain except the obvious need to
> construct and tear down the stack frame for the method invocation.
>

Calling non-synchronized methods will always be faster. Still, such
methods cannot be inlined, because different implementations of
InputStream could be passed to the parser.

As I understand it, there will be a feature in a future release of the
HotSpot compiler that does speculative locking to reduce the overhead
of 'synchronized' when there is no lock contention.
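
Coming back to merging buffering and parsing: what that means in
practice is roughly the following (a stripped-down sketch, not the
actual Decoder code):

import java.io.IOException;
import java.io.InputStream;

// The decoder owns its buffer and refills it with bulk reads, so the
// per-byte hot path is a private, non-virtual method that HotSpot can
// inline, with no synchronization.
public class BufferedDecoder {
    private final InputStream in;
    private final byte[] buf = new byte[4096];
    private int pos;
    private int limit;

    public BufferedDecoder(InputStream in) {
        this.in = in;
    }

    private int read() throws IOException {
        if (pos == limit) {
            // One bulk read replaces many per-byte InputStream calls.
            limit = in.read(buf, 0, buf.length);
            pos = 0;
            if (limit == -1) {
                throw new IOException("unexpected end of stream");
            }
        }
        return buf[pos++] & 0xFF;
    }
}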


>>> I read in using NIO and then buffer that data into my own InputStream
>>> and allow reads from that in blocks or byte by byte. I'm not
>>> convinced that lots of method invocations to read byte by byte really
>>> slow it down that much.
>>
>>
>>
>> Try measuring the performance :-) using ByteArrayInputStream.read and
>> a private method to read.
>
>
> I guess my thought was that if all I'm doing is reading from a volatile
> variable in a method, if I call that method lots of times, how much
> worse is that than calling a single method that just reads a chunk from
> the volatile variable. I've gotten away from most of the InputStream
> semantics for synchronization and such, so I guess I'll have to profile
> this and see what the volatile variable case looks like.
>

But when a read is performed, synchronization is still required around
this volatile variable. If I understand correctly you are reducing the
scope of the synchronized block? For your scenario that is very
important: it is essentially a producer/consumer arrangement, and
reducing the time the lock is held reduces the wait time.
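
In other words the lock should be held only for the handoff itself,
never while parsing. A generic sketch (not your code):

import java.nio.ByteBuffer;
import java.util.LinkedList;

public class Handoff {
    private final Object lock = new Object();
    private final LinkedList<ByteBuffer> queue =
        new LinkedList<ByteBuffer>();

    // Producer (selector thread): the lock is held only long enough
    // to enqueue and signal.
    public void produce(ByteBuffer data) {
        synchronized (lock) {
            queue.addLast(data);
            lock.notify();
        }
    }

    // Consumer (parser thread): the lock is held only for the
    // dequeue; parsing the returned buffer happens outside the lock,
    // so the producer can refill in parallel.
    public ByteBuffer consume() throws InterruptedException {
        synchronized (lock) {
            while (queue.isEmpty()) {
                lock.wait();
            }
            return queue.removeFirst();
        }
    }
}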


>>> My only performance concern is my use of volatile variable for the
>>> current buffer I read from NIO.
>>
>>
>>
>> But i presume you are also using FI for performance reasons?
>
>
> Oh yeah! Base 64 encoding sucks real bad. 500 times decrease in
> performance for encoding/decoding a 14k image.

That is an astonishing number! I suspect the b64 impl is not as
efficient as it could be.


> The protocol originally
> used JAXB and XML, but I did some tests and since the server had to
> encode the image and the client had to decode, the performance was
> awful. So, I switched to FI so I don't ever have to encode anything! I
> love it!
>

Cool.


>> Are you using the non-blocking features of NIO? When using such
>> streaming functionality one thing we need to watch out for is the read
>> being blocked while waiting for data that is not part of the current
>> document.
>>
>> For example there could be the edge case where the Decoder.read will
>> be called for the termination of the document and this results in a
>> complete read of the buffer, which could block. Note that the
>> Decoder.read does not check for the case where zero bytes are
>> returned. The semantics of InputStream.read say that at least one
>> byte must be returned.
>>
>> In general it seems that it should be possible to retain a reasonable
>> buffer size for efficient parsing assuming that the InputStream.read
>> returns partial data. i.e. it should not be necessary to modify the
>> buffer size.
>
>
> Yeah, I'm a HUGE advocate of non-blocking I/O. I don't mind other APIs
> that block since the NIO selector I've implemented won't ever block. I
> push the bytes that were read from the Channel into my custom NIO
> InputStream. Other threads that are working on that stream can be
> blocked waiting for more data, but as long as they can handle reads
> as small as 1 byte and also handle the end-of-stream read (i.e. -1),
> they
> should be fine. I can send you snippets of code to look over and see if
> you can find any performance sinks. Let me know.
>

Right, your implementation should scale very well in multi-threaded
environments (especially on Sun's Niagara chip :-) ). The conventional
approach of having one thread per request does not scale.


>
>> Errors in the encoding. How do you recover from an error in the
>> stream? It is tricky or impossible, thus further requests or responses
>> that have been written may be lost. This is especially important when
>> proxies are involved.
>>
>> For a private protocol I reckon it is OK, although I would tend to
>> avoid that approach myself, since a good transport protocol offers
>> other advantages as well.
>>
>> For a public protocol it is not the common practice, although an
>> exception to this is Jabber, which AFAIK creates an open stream for
>> a continuous XML document. I do not know if the design decision
>> behind Jabber was in response to the lack of keep-alive support in
>> HTTP servers or for other reasons.
>
>
> Great question because this was a huge problem for me. I assume that if
> FI encounters a bad encoding, it will throw an exception.

Yes it will. Although it has not been fully tested for robustness, I
am reasonably confident that the parser is quite robust because of the
nature of its implementation.


> I think in
> most cases this is true. In this case, I catch the exception and tear
> down the entire conversation with the client. I assume that the client
> will understand it when the server says, "dude, you just sent some bad
> stuff my way and all I can do is cut you off without any response."
>

Since you also do not 'pipeline' (i.e. have the client send multiple
requests before processing the response to the first request), this
should be easier to manage.
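
On the server side that can be as simple as the following (schematic;
parseRequest and writeResponse are hypothetical stand-ins for the FI
parse call and your response logic):

import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.SocketChannel;

public class RequestHandler {

    void handle(SocketChannel channel, InputStream in) {
        try {
            Object request = parseRequest(in); // throws on bad encoding
            writeResponse(channel, request);
        } catch (Exception e) {
            // Tear down the whole conversation; the client observes
            // end-of-stream on its next read or write.
            try {
                channel.close();
            } catch (IOException ignored) {
            }
        }
    }

    private Object parseRequest(InputStream in) throws Exception {
        throw new UnsupportedOperationException("FI parse goes here");
    }

    private void writeResponse(SocketChannel channel, Object request) {
        // Application-specific response logic goes here.
    }
}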


> However, even if it is not the case and FI for whatever reason can't
> determine that what it has already is bad, there are a few cases.
>
> 1. The client sends more bytes across. In this case I keep feeding FI
> until it either throws an exception, or blocks again waiting for more
> bytes.
> 2. The client doesn't send anything more. Since my NIO selector is truly
> non-blocking, in this case, I set a timeout that says, I've read in some
> stuff, the client hasn't sent any more in X seconds and the parser
> couldn't figure out how to handle what the client sent, therefore I
> figure that the message is corrupt and tear down the conversation with
> the client.
>
> So, no matter what the case, I eventually stop talking to the client and
> the client will see this as an end of stream when they go to read or an
> end of stream when they go to write some more. Both cases the client can
> handle however they want.
>

OK.


> The only case I'm not certain that this will cover is the case of
> proxies. I think that the proxy will still see the end of stream and
> handle it fine. This all really comes down to the request/response
> paradigm I'm using. If the protocol was truly bi-directional, then yeah,
> all the other messages would be lost. Luckily, I can guarantee that no
> more messages have been sent yet since the server hasn't finished
> processing the current one and hasn't yet sent a response.
>

Traditionally proxies, well HTTP proxies, tend not to understand the
semantics of the content being transported. If you had a specific
proxy for your protocol then I think it would work.

However, in the long run you may not be sure how your protocol will
evolve, e.g. HTTP allows for pipelining when using keep-alive. What
happens if the message pattern changes, e.g. to an asynchronous
pattern?

Also, proxies tend to work with 'meta-data' (e.g. where does this
message go?), which is independent of the semantics of the content. To
some extent this is changing with the concept of XML routers, since it
is possible for a generic proxy to look into the message, for example
operating on the results of XPath expressions to make routing
decisions.
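
For example, with the JAXP XPath API a router could make a decision
like this (a toy illustration):

import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathRouter {
    public static void main(String[] args) throws Exception {
        String message = "<order><customer region='EU'/></order>";
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Route on one fragment of the message without understanding
        // the application semantics of the whole document.
        String region = xpath.evaluate("/order/customer/@region",
                new InputSource(new StringReader(message)));
        System.out.println("route to " + region + " cluster");
    }
}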


> Again, let me know if you see holes in this. Also, if you would like to
> participate on the RFC for the protocol specification (nothing through a
> standards body, we are just going to publish it to our website and open
> up an email address or mailing list for comments), let me know and I'll
> email you when it comes out (hopefully in the next week or two).
>

Interesting. Send a link to this list when it is ready and I will have
a read.

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109