dev@fi.java.net

Re: Encoding algorithms

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Fri, 11 Feb 2005 18:09:14 +0100

Santiago Pericas-Geertsen wrote:
>> I have been thinking about changing the Encoder buffering implementation.
>>
>> One solution is to use two buffers: one for structure; and one for
>> content. An int array would specify the ranges of each to write. The
>> writing of data from each always occurs in the pattern of: structure,
>> data, structure, data...
>>
>> Another solution is to use one buffer with traversal functionality.
>> When encoding content you assume that the length of content is in the
>> range such that it will take the maximum number of bytes to encode
>> the length. When the content is encoded you skip back to the
>> appropriate location and encode the length. If there is fragmentation
>> you register this in an int array for skipping.
>>
>> Currently I am double buffering the encoding of UTF-8 strings, which
>> is a small performance hit.
>
>
> Perhaps it's easier to extend BufferedOutputStream with functionality
> akin to BufferedInputStream where you can set a mark and then reset back
> to it, except in this case you'd need the ability to "inject" bytes
> during the reset process. All bytes must be buffered between the mark()
> and the reset(). Something like,
>
> bos.mark();
> // Call encoder passing bos
> byte[] length = encode(bos.getByteLengthToMark());
> // Inject length and flush buffer to underlying stream
> bos.reset(length);
>
> Not sure if this satisfies all your requirements, but seems pretty
> straightforward to do. Also, the same stream could be used to do
> traditional buffering outside of the mark()/reset() case.
>

That concept should mostly work, except that there needs to be a skip if
any fragmentation occurs between the encoded length and the encoded
content, since it is good to avoid lots of flushes for small content.
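
Roughly, the skip could look like the sketch below. This is only an
illustration under assumptions, not the Encoder code: the class name
SkipBuffer, the 4-byte MAX_LENGTH_BYTES slot and the flushTo() method are
all made up for the example. mark() reserves the worst-case number of
length bytes, reset() writes the real length into the slot, and any unused
tail of the slot is recorded as a range to skip at flush time.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical single buffer with a reserved length slot and skip ranges. */
class SkipBuffer {
    // Assumption: no encoded length needs more than 4 bytes.
    static final int MAX_LENGTH_BYTES = 4;

    private byte[] buf = new byte[64];
    private int pos;                    // next free index in buf
    private int mark = -1;              // start of the reserved slot, -1 if none
    private int[] skips = new int[8];   // pairs: from (inclusive), to (exclusive)
    private int skipCount;

    void write(int b) {
        ensure(1);
        buf[pos++] = (byte) b;
    }

    void write(byte[] b) {
        ensure(b.length);
        System.arraycopy(b, 0, buf, pos, b.length);
        pos += b.length;
    }

    /** Reserve room for the largest possible encoded length. */
    void mark() {
        ensure(MAX_LENGTH_BYTES);
        mark = pos;
        pos += MAX_LENGTH_BYTES;
    }

    /** Number of content bytes written since mark(). */
    int getByteLengthToMark() {
        return pos - (mark + MAX_LENGTH_BYTES);
    }

    /** Write the real length into the slot and register the unused gap. */
    void reset(byte[] length) {
        if (mark < 0 || length.length > MAX_LENGTH_BYTES) {
            throw new IllegalStateException("no mark set or length too long");
        }
        System.arraycopy(length, 0, buf, mark, length.length);
        if (length.length < MAX_LENGTH_BYTES) {
            if (skipCount + 2 > skips.length) {
                skips = Arrays.copyOf(skips, skips.length * 2);
            }
            skips[skipCount++] = mark + length.length;
            skips[skipCount++] = mark + MAX_LENGTH_BYTES;
        }
        mark = -1;
    }

    /** One pass over the buffer, jumping over each registered gap. */
    void flushTo(OutputStream out) throws IOException {
        if (mark >= 0) {
            throw new IllegalStateException("length not yet written");
        }
        int from = 0;
        for (int i = 0; i < skipCount; i += 2) {
            out.write(buf, from, skips[i] - from);
            from = skips[i + 1];
        }
        out.write(buf, from, pos - from);
        pos = 0;
        skipCount = 0;
    }

    private void ensure(int n) {
        if (pos + n > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + n));
        }
    }
}
```

For example, writing 0x01, calling mark(), writing "abc", calling reset()
with the single length byte 0x03 and then writing 0x02 leaves one skip
range of three bytes, and flushTo() emits 01 03 'a' 'b' 'c' 02. Small
content never forces a flush on its own; the gaps are only paid for as
extra write() calls when the buffer is finally flushed.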

I already have some mark/reset functionality in the Encoder (it acts
like a BufferedOutputStream) because this is needed when encoding
literal EIIs with namespaces with SAX (the namespace events come before
the element event and only when the latter occurs is it known if
attributes are present).
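
To illustrate the ordering problem only (this is not the Encoder code, and
the class name DeferredNamespaceHandler and its log field are invented for
the example), a SAX handler can hold the namespace events until
startElement() arrives, at which point it is known whether attributes are
present:

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that defers namespace events to startElement(). */
class DeferredNamespaceHandler extends DefaultHandler {
    // Prefix/URI pairs seen since the last startElement().
    private final List<String[]> pendingNamespaces = new ArrayList<>();

    // Stands in for the encoded output so the ordering is visible.
    final StringBuilder log = new StringBuilder();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // SAX reports this before the element it applies to, so just remember it.
        pendingNamespaces.add(new String[] {prefix, uri});
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Only here are both facts known: namespaces and attribute presence.
        log.append("element ").append(qName)
           .append(" attributes=").append(atts.getLength() > 0)
           .append(" namespaces=").append(pendingNamespaces.size())
           .append('\n');
        pendingNamespaces.clear();
    }
}
```

Holding the events in a list is the simplest way to show the idea; the
mark/reset in the Encoder serves the same purpose on the byte buffer
instead.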

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109