Re: Encoding algorithms

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Fri, 11 Feb 2005 17:01:25 +0100

Santiago Pericas-Geertsen wrote:
>
> On Feb 11, 2005, at 6:12 AM, Paul Sandoz wrote:
>
>> Santiago Pericas-Geertsen wrote:
>>
>>>> I was envisaging that encoding algorithms could be registered with
>>>> multiple parser instances (and types of parser), as can external
>>>> and initial vocabularies.
>>>
>>> Ah, I see. Personally, I think that would be a bad idea since it
>>> will require synchronization and will effectively create a "chicane"
>>> in your system.
>>>
>>
>> OK. Pondering some more.... perhaps the parser/serializer should
>> maintain a registration of factories for each algorithm and do:
>>
>> algorithmFactory.getInstance().decodeFrom...
>> algorithmFactory.getInstance().encodeTo...
>
>
> What's wrong with simply registering encoders on the parser/serializer
> instance? In other words, why do we need the extra indirection via
> factories?
>

OK, so then the 'entity' responsible for obtaining an encoding algorithm
instance is responsible for ensuring that a suitable instance is passed
to the parser/serializer such that concurrency issues do not arise for
multiple parsers/serializers running concurrently.
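
To make that concrete, here is a rough sketch of per-instance registration.
All names (EncodingAlgorithm, IntArrayAlgorithm, SketchSerializer, the URI)
are hypothetical and not taken from any existing code; only the
encodeToOutputStream signature comes from this thread. The algorithm keeps
scratch state, which is why the caller hands each serializer its own instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; the method mirrors the signature quoted in this thread.
interface EncodingAlgorithm {
    void encodeToOutputStream(Object data, OutputStream s) throws IOException;
}

// Hypothetical algorithm that keeps per-instance scratch state,
// so one instance must not be shared between concurrent serializers.
class IntArrayAlgorithm implements EncodingAlgorithm {
    private final byte[] scratch = new byte[4];

    public void encodeToOutputStream(Object data, OutputStream s) throws IOException {
        for (int v : (int[]) data) {
            // Big-endian encoding of each int.
            scratch[0] = (byte) (v >>> 24);
            scratch[1] = (byte) (v >>> 16);
            scratch[2] = (byte) (v >>> 8);
            scratch[3] = (byte) v;
            s.write(scratch, 0, 4);
        }
    }
}

// Hypothetical serializer: the registry is a plain, unsynchronized map
// because it belongs to exactly one serializer instance.
class SketchSerializer {
    private final Map<String, EncodingAlgorithm> algorithms = new HashMap<>();

    void registerEncodingAlgorithm(String uri, EncodingAlgorithm algorithm) {
        algorithms.put(uri, algorithm);
    }

    byte[] encode(String uri, Object data) {
        EncodingAlgorithm algorithm = algorithms.get(uri);
        if (algorithm == null) {
            throw new IllegalArgumentException("No algorithm registered for " + uri);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            algorithm.encodeToOutputStream(data, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The entity that creates the serializers would call
registerEncodingAlgorithm with a fresh IntArrayAlgorithm for each one.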

>> For encoding, the serializer can choose between a byte array and an
>> OutputStream. That choice is not easy to make, since for a given
>> object it depends on whether there is enough room in the byte array
>> to encode the object. In this respect it may only make sense to
>> have:
>>
>> void encodeToOutputStream(Object data, OutputStream s) throws
>> IOException;
>
>
> Will this work OK for those cases in which the serialized object must
> be length prefixed?

Yes, since the concrete implementation of the OutputStream can be
specific to the serializer.

> It seems that in many (most?) cases the serializer
> will have to use a ByteArrayOutputStream to do the serialization
> correctly.

Yes, sort of, see below.

> Am I missing something?

Not really.

One solution is to extend ByteArrayOutputStream such that:

- it is possible to get access to the underlying buffer; and

- explicit resizes can be performed.

A nested impl of the serializer can be used that has direct access to
the serializer buffer.
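
A minimal sketch of such an extension, relying on the protected buf and
count fields that java.io.ByteArrayOutputStream exposes to subclasses. The
class name and the two method names are my own, chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical subclass name; buf and count are inherited protected fields.
class AccessibleByteArrayOutputStream extends ByteArrayOutputStream {

    AccessibleByteArrayOutputStream(int size) {
        super(size);
    }

    // Returns the live backing array, not a copy.
    // Only the first size() bytes hold written data.
    synchronized byte[] getBuffer() {
        return buf;
    }

    // Explicit resize: grows the backing array so that at least
    // minCapacity bytes fit. It never shrinks the array.
    synchronized void ensureCapacity(int minCapacity) {
        if (minCapacity > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(minCapacity, buf.length * 2));
        }
    }
}
```

With this the serializer can read the encoded bytes and their count
(size()) without the copy that toByteArray() makes.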

I have been thinking about changing the Encoder buffering implementation.

One solution is to use two buffers: one for structure; and one for
content. An int array would specify the ranges of each to write. The
writing of data from each always occurs in the pattern of: structure,
data, structure, data...
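
Roughly what I have in mind, as a sketch only. The class name, the method
names and the 4-byte big-endian length prefix are assumptions made for
illustration; the real length encoding would differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of the two-buffer scheme.
class TwoBufferEncoder {
    private final ByteArrayOutputStream structure = new ByteArrayOutputStream();
    private final ByteArrayOutputStream content = new ByteArrayOutputStream();
    // Run lengths in output order: structure, content, structure, content...
    private int[] runs = new int[16];
    private int runCount = 0;
    private int structureMark = 0; // structure bytes already covered by runs
    private int contentMark = 0;   // content bytes already covered by runs

    void writeStructure(int b) {
        structure.write(b);
    }

    // May be called several times before endContent().
    void writeContent(byte[] b, int off, int len) {
        content.write(b, off, len);
    }

    // The content length is known only now, so the length prefix is
    // appended to the structure buffer after the content was encoded.
    void endContent() {
        int length = content.size() - contentMark;
        structure.write(length >>> 24);
        structure.write(length >>> 16);
        structure.write(length >>> 8);
        structure.write(length);
        addRun(structure.size() - structureMark);
        addRun(length);
        structureMark = structure.size();
        contentMark = content.size();
    }

    private void addRun(int length) {
        if (runCount == runs.length) {
            runs = Arrays.copyOf(runs, runs.length * 2);
        }
        runs[runCount++] = length;
    }

    // Interleaves the two buffers according to the recorded runs.
    byte[] toByteArray() {
        byte[] s = structure.toByteArray();
        byte[] c = content.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length + c.length);
        int sPos = 0;
        int cPos = 0;
        for (int i = 0; i < runCount; i += 2) {
            out.write(s, sPos, runs[i]);
            sPos += runs[i];
            out.write(c, cPos, runs[i + 1]);
            cPos += runs[i + 1];
        }
        out.write(s, sPos, s.length - sPos); // trailing structure, if any
        return out.toByteArray();
    }
}
```

The copies made by toByteArray() are only there to keep the sketch short;
a real implementation would write the ranges straight to the output.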

Another solution is to use one buffer with traversal functionality. When
encoding content you assume that the length of the content falls in the
range that takes the maximum number of bytes to encode the length. Once
the content is encoded you skip back to the appropriate location and
encode the length. If there is fragmentation you register this in an
int array for skipping.
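
A sketch of the single-buffer idea, under assumptions I am making up for
the example: lengths below 128 take one byte, anything larger takes four
bytes with the high bit of the first byte set, and content sections are
not nested (so gaps are recorded in ascending order). The class and method
names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of reserve, encode, skip back and patch.
class BackPatchingEncoder {
    private static final int MAX_LENGTH_BYTES = 4;
    private byte[] buf = new byte[64];
    private int pos = 0;
    // Pairs of (offset, length) describing reserved bytes left unused.
    private int[] gaps = new int[16];
    private int gapCount = 0;

    void write(int b) {
        ensure(1);
        buf[pos++] = (byte) b;
    }

    void write(byte[] b, int off, int len) {
        ensure(len);
        System.arraycopy(b, off, buf, pos, len);
        pos += len;
    }

    // Reserves the worst-case length field and returns its offset.
    int beginContent() {
        ensure(MAX_LENGTH_BYTES);
        int mark = pos;
        pos += MAX_LENGTH_BYTES;
        return mark;
    }

    // Skips back to the reserved field and encodes the actual length.
    void endContent(int mark) {
        int length = pos - mark - MAX_LENGTH_BYTES;
        if (length < 0x80) {
            buf[mark] = (byte) length;
            addGap(mark + 1, MAX_LENGTH_BYTES - 1); // three bytes to skip
        } else {
            buf[mark] = (byte) (0x80 | (length >>> 24));
            buf[mark + 1] = (byte) (length >>> 16);
            buf[mark + 2] = (byte) (length >>> 8);
            buf[mark + 3] = (byte) length;
        }
    }

    private void addGap(int offset, int length) {
        if (gapCount + 2 > gaps.length) {
            gaps = Arrays.copyOf(gaps, gaps.length * 2);
        }
        gaps[gapCount++] = offset;
        gaps[gapCount++] = length;
    }

    private void ensure(int n) {
        if (pos + n > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + n));
        }
    }

    // Copies the buffer out while skipping the recorded gaps.
    byte[] toByteArray() {
        ByteArrayOutputStream out = new ByteArrayOutputStream(pos);
        int from = 0;
        for (int i = 0; i < gapCount; i += 2) {
            out.write(buf, from, gaps[i] - from);
            from = gaps[i] + gaps[i + 1];
        }
        out.write(buf, from, pos - from);
        return out.toByteArray();
    }
}
```

The trade-off compared with the two-buffer approach is that the content is
encoded in place, at the cost of fragmented writes when the length turns
out to be short.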

Currently I am double buffering the encoding of UTF-8 strings, which is
a small performance hit.

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109