dev@fi.java.net

Re: Encoding algorithms

From: Santiago Pericas-Geertsen <Santiago.Pericasgeertsen_at_Sun.COM>
Date: Fri, 11 Feb 2005 11:44:34 -0500

On Feb 11, 2005, at 11:01 AM, Paul Sandoz wrote:

>>> For encoding, the serializer has a choice between using bytes or an
>>> OutputStream. Which one to use is not easy to determine, since it
>>> depends on whether the byte array has enough room to encode the
>>> given object. In this respect it may only make sense to have:
>>>
>>> void encodeToOutputStream(Object data, OutputStream s) throws
>>> IOException;
>> Will this work OK for those cases in which the serialized object
>> must be length prefixed?
>
> Yes, since the concrete implementation of the OutputStream can be
> specific to the serializer.

  Right, so you can use something to wrap the user's output stream.
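
  A minimal sketch of that wrapping idea, assuming a fixed 4-byte
big-endian length prefix. The names LengthPrefixedWriter, Encoder and
writeLengthPrefixed are hypothetical and not part of the FI code base;
only encodeToOutputStream comes from the proposal above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class LengthPrefixedWriter {
    /** Hypothetical stand-in for the proposed encodeToOutputStream method. */
    interface Encoder {
        void encodeToOutputStream(Object data, OutputStream s) throws IOException;
    }

    /**
     * Encodes data into a scratch stream first, then writes a 4-byte
     * big-endian length followed by the encoded bytes to the user's stream.
     * The 4-byte prefix is an assumption made for this sketch.
     */
    static void writeLengthPrefixed(Encoder encoder, Object data, OutputStream user)
            throws IOException {
        ByteArrayOutputStream scratch = new ByteArrayOutputStream();
        encoder.encodeToOutputStream(data, scratch); // encoder never sees the user's stream

        DataOutputStream out = new DataOutputStream(user);
        out.writeInt(scratch.size()); // length prefix
        scratch.writeTo(out);         // encoded content
        out.flush();
    }
}
```

  The encoder only ever sees an OutputStream, so it does not need to know
whether its output ends up length prefixed.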

>> Am I missing something?
>
> Not really.
>
> One solution is to extend ByteArrayOutputStream such that:
>
> - it is possible to get access to the underlying buffer; and
>
> - explicit resizes can be performed.
>
> A nested impl of the serializer can be used that has direct access to
> the serializer buffer.
>
>
> I have been thinking about changing the Encoder buffering
> implementation.
>
> One solution is to use two buffers: one for structure; and one for
> content. An int array would specify the ranges of each to write. The
> writing of data from each always occurs in the pattern of: structure,
> data, structure, data...
>
> Another solution is to use one buffer with traversal functionality.
> When encoding content you assume that the length of the content is in
> the range that takes the maximum number of bytes to encode the
> length. When the content is encoded you skip back to the
> appropriate location and encode the length. If there is fragmentation
> you register this in an int array for skipping.
>
> Currently I am double buffering the encoding of UTF-8 strings, which
> is a small performance hit.
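
  For reference, a rough sketch of the single-buffer idea quoted above,
combined with the ByteArrayOutputStream extension. It is simplified: the
length always occupies a fixed four bytes, so there is no fragmentation to
record in an int array. PatchableBuffer and its methods are hypothetical
names; buf and count are the protected fields of ByteArrayOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical subclass exposing the protected buf/count fields. */
final class PatchableBuffer extends ByteArrayOutputStream {

    /** Direct access to the underlying array; only the first size() bytes are valid. */
    byte[] rawBuffer() {
        return buf;
    }

    /** Explicit resize: grows the underlying array to at least minCapacity bytes. */
    void ensureCapacity(int minCapacity) {
        if (minCapacity > buf.length) {
            buf = Arrays.copyOf(buf, minCapacity);
        }
    }

    /** Reserves four bytes for a length that is not known yet; returns their offset. */
    int reserveLength() {
        int at = count;
        write(0); write(0); write(0); write(0);
        return at;
    }

    /** Skips back to the reserved slot and stores the number of bytes written after it. */
    void patchLength(int at) {
        int len = count - (at + 4);
        buf[at]     = (byte) (len >>> 24);
        buf[at + 1] = (byte) (len >>> 16);
        buf[at + 2] = (byte) (len >>> 8);
        buf[at + 3] = (byte) len;
    }
}
```

  A variable-width length encoding would need the extra bookkeeping Paul
describes, since unused reserved bytes would have to be skipped on output.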

  Perhaps it's easier to extend BufferedOutputStream with functionality
akin to BufferedInputStream where you can set a mark and then reset
back to it, except in this case you'd need the ability to "inject"
bytes during the reset process. All bytes must be buffered between the
mark() and the reset(). Something like,

        bos.mark();
        // Call encoder passing bos
        byte[] length = encode(bos.getByteLengthToMark());
        // Inject length and flush buffer to underlying stream
        bos.reset(length);

  Not sure if this satisfies all your requirements, but it seems pretty
straightforward to do. Also, the same stream could be used for
traditional buffering outside of the mark()/reset() case.
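
  A minimal sketch of such a stream, under a few assumptions: it extends
FilterOutputStream rather than BufferedOutputStream to keep the example
short, it only buffers between mark() and reset(), and the class name
InjectingOutputStream is hypothetical. The method names mark(),
getByteLengthToMark() and reset(byte[]) follow the snippet above.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stream that holds back bytes written between mark() and reset(). */
final class InjectingOutputStream extends FilterOutputStream {
    private ByteArrayOutputStream pending; // non-null while a mark is active

    InjectingOutputStream(OutputStream out) {
        super(out);
    }

    /** Starts buffering; nothing reaches the underlying stream until reset(). */
    void mark() {
        if (pending != null) throw new IllegalStateException("mark already set");
        pending = new ByteArrayOutputStream();
    }

    /** Number of bytes written since mark(). */
    int getByteLengthToMark() {
        if (pending == null) throw new IllegalStateException("no mark set");
        return pending.size();
    }

    /** Writes the injected bytes, then the buffered bytes, to the underlying stream. */
    void reset(byte[] injected) throws IOException {
        if (pending == null) throw new IllegalStateException("no mark set");
        out.write(injected);
        pending.writeTo(out);
        pending = null;
        out.flush();
    }

    @Override
    public void write(int b) throws IOException {
        if (pending != null) pending.write(b); else out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (pending != null) pending.write(b, off, len); else out.write(b, off, len);
    }
}
```

  Outside a mark()/reset() pair the sketch passes writes straight through,
so wrapping it around a BufferedOutputStream would give the traditional
buffering as well.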

-- Santiago