dev@fi.java.net

Support for encoding algorithms

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Mon, 24 Jan 2005 12:16:48 +0100

Hi,

We need to decide on the architecture for the support of encoding
algorithms (which I think is the last major piece to put together
before we take a look back and evaluate the overall architecture
for the support of vocabularies and algorithms).


A little overview first:

There are two forms of encoding algorithms:

1) The built-in algorithms that provide for general horizontal support
for the encoding of arrays of octets, integers and floating point data.

2) Application defined algorithms that provide for domain specific
support e.g. the optimal binary encoding of 2D or 3D data. Application
defined algorithms are specified using a URI. A set of URIs will be
added to the encoding algorithm vocabulary table.

An encoding algorithm may be used for a chunk of character information
items (text content) or the [normalized value] property of an attribute
information item (attribute value).

Each algorithm will have a small integer assigned to it. The integers
for the built-in algorithms are fixed, and the integers for the
application defined algorithms are assigned according to the URIs in the
encoding algorithm vocabulary table. This integer is used to 'tag' the
octets produced from an encoding algorithm so that the parser can select
an appropriate decoder to process the octets based on the tag.
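
As a rough sketch of the tagging described above, the vocabulary table
for application defined algorithms could look something like the
following. The class name, the method names and the starting value of 32
are assumptions made for illustration only; nothing here is an agreed
API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps application defined algorithm URIs to integer tags.
final class EncodingAlgorithmTable {
    // Assumed for illustration: tags below this value are reserved for the
    // built-in algorithms, whose integers are fixed.
    static final int FIRST_APPLICATION_TAG = 32;

    private final List<String> uris = new ArrayList<String>();
    private final Map<String, Integer> tagsByUri = new HashMap<String, Integer>();

    /** Adds a URI if absent and returns its tag; tags follow insertion order. */
    int add(String uri) {
        Integer existing = tagsByUri.get(uri);
        if (existing != null) {
            return existing;
        }
        int tag = FIRST_APPLICATION_TAG + uris.size();
        uris.add(uri);
        tagsByUri.put(uri, tag);
        return tag;
    }

    /** Returns the URI for an application defined tag, or null if unknown. */
    String uriFor(int tag) {
        int index = tag - FIRST_APPLICATION_TAG;
        return (index >= 0 && index < uris.size()) ? uris.get(index) : null;
    }
}
```

A serializer would call add() to obtain the tag to write, and a parser
would call uriFor() on the tag it reads to find the matching decoder.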



There are three aspects we need to think about:

1) The basic design of the encoders/decoders.

I think the best approach is to design the built-in algorithms in the
same manner as the non-built in algorithms specified using a URI.

An open question is whether it should be left up to the application to
apply the appropriate encoding algorithm, or whether this should be
handled by the parser/serializer given a set of registered algorithms,
i.e. the parser/serializer is only responsible for decoding/encoding the
octets and associating them with the URI or tag of the URI.

Potentially both could be allowed according to a suitable
parser/serializer property or feature.

It may be advantageous to also ensure that instances of the algorithms
implement the java.lang.CharSequence [1] interface, thus allowing easier
integration for string-based processing if required (however, there will
be a performance impact). This may help in the case where an application
does not support the data returned by the encoding algorithm and only
supports characters.
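
To make the design in 1) a bit more concrete, here is a minimal sketch
in which a built-in style algorithm (an array of integers) is written
against the same interface an application defined algorithm would
implement, together with a CharSequence view over the decoded data. All
type and method names are hypothetical, and the big-endian 32-bit layout
is an assumption of the sketch rather than a statement about the
encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical interface shared by built-in and application defined algorithms.
interface EncodingAlgorithm {
    Object decode(byte[] octets, int offset, int length);
    byte[] encode(Object data);
    String toText(Object data); // character form for string-based processing
}

// A built-in style algorithm implemented like any other algorithm.
// Assumption: each int is stored as 4 octets, big-endian.
final class IntArrayAlgorithm implements EncodingAlgorithm {
    public Object decode(byte[] octets, int offset, int length) {
        if (length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        int[] values = new int[length / 4];
        ByteBuffer.wrap(octets, offset, length).asIntBuffer().get(values);
        return values;
    }

    public byte[] encode(Object data) {
        int[] values = (int[]) data;
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 4);
        buffer.asIntBuffer().put(values);
        return buffer.array();
    }

    public String toText(Object data) {
        int[] values = (int[]) data;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(values[i]);
        }
        return sb.toString();
    }
}

// CharSequence view over decoded data. The text is only built on first
// use, which is where the performance cost is paid.
final class AlgorithmData implements CharSequence {
    private final EncodingAlgorithm algorithm;
    private final Object data;
    private String text;

    AlgorithmData(EncodingAlgorithm algorithm, Object data) {
        this.algorithm = algorithm;
        this.data = data;
    }

    Object getData() { return data; }

    private String text() {
        if (text == null) text = algorithm.toText(data);
        return text;
    }

    public int length() { return text().length(); }
    public char charAt(int index) { return text().charAt(index); }
    public CharSequence subSequence(int start, int end) { return text().subSequence(start, end); }
    public String toString() { return text(); }
}
```

An application that understands the algorithm can call getData() and
never pay for the conversion; one that only supports characters can
treat the same object as a CharSequence.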


2) Minimize buffer overhead.

We need to decide on how best to pass and produce bytes for the encoder
and decoder implementation to minimize or avoid buffer copying. It
remains to be seen whether this is only possible with tight integration
of the algorithms to the parser/serializer.

Some buffering may be unavoidable for encoding since the length of the
octets (produced from the algorithm) to encode needs to be encoded
before the octets.
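
The buffering issue on the encoding side can be illustrated with a small
sketch. The OctetProducer callback and the LengthPrefixedWriter class
are hypothetical, and the fixed 4-octet length is a simplification
assumed for the example, not the actual wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical callback through which an algorithm emits its octets.
interface OctetProducer {
    void writeTo(OutputStream out) throws IOException;
}

final class LengthPrefixedWriter {
    // Writes <length><octets>. Assumption: the length is a fixed 4-octet
    // integer, which keeps the sketch short.
    static void write(OctetProducer producer, OutputStream out) throws IOException {
        ByteArrayOutputStream scratch = new ByteArrayOutputStream();
        producer.writeTo(scratch);       // the length is unknown until this returns
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(scratch.size());   // length goes out first
        scratch.writeTo(data);           // then the buffered octets are copied
        data.flush();
    }
}
```

The scratch buffer is the extra copy in question: the octets are
written once into the buffer and then again into the real output.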


3) How data is reported to the application using the XML API.

The SAX API will require extending. Two new interfaces will be required
to support text content and attribute values. The SAX interface is
relatively easy to extend by way of the XMLReader.setFeature [2] and
XMLReader.setProperty [3] methods.
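
As a sketch of how such an extension might be registered through
setProperty, consider the following. The handler interface, the property
URI and the SketchReader class are all invented for illustration; only
the text content case is shown, and XMLFilterImpl is used merely as a
convenient stand-in for a real parser.

```java
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical extension handler for text content reported as algorithm data.
interface EncodingAlgorithmContentHandler {
    void algorithmData(String uri, int tag, Object data) throws SAXException;
}

// Stand-in for a parser; a real one would call the handler while decoding.
class SketchReader extends XMLFilterImpl {
    // Hypothetical property name, invented for this sketch.
    static final String HANDLER_PROPERTY =
        "http://example.org/sax/properties/encoding-algorithm-content-handler";

    private EncodingAlgorithmContentHandler algorithmHandler;

    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (HANDLER_PROPERTY.equals(name)) {
            if (value != null && !(value instanceof EncodingAlgorithmContentHandler)) {
                throw new SAXNotSupportedException(name);
            }
            algorithmHandler = (EncodingAlgorithmContentHandler) value;
        } else {
            super.setProperty(name, value); // unknown properties are rejected as usual
        }
    }

    public Object getProperty(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (HANDLER_PROPERTY.equals(name)) {
            return algorithmHandler;
        }
        return super.getProperty(name);
    }
}
```

An application that does not set the property would see no difference
from a plain SAX parser, which is what makes this route relatively easy.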

The StAX API will also require extending. Since this is a newer API I am
not as familiar with the possible extension mechanisms. We probably need
to extend the stream reader [4] and writer interfaces [5] for the cursor
API and add new events for the event API [6].

Of course we will have to choose appropriate/suitable package names for
these extensions :-)

Paul.

[1] http://java.sun.com/j2se/1.4.2/docs/api/java/lang/CharSequence.html
[2] http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/XMLReader.html#setFeature(java.lang.String,%20boolean)
[3] http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/XMLReader.html#setProperty(java.lang.String,%20java.lang.Object)
[4] http://java.sun.com/webservices/docs/1.5/api/javax/xml/stream/XMLStreamReader.html
[5] http://java.sun.com/webservices/docs/1.5/api/javax/xml/stream/XMLStreamWriter.html
[6] http://java.sun.com/webservices/docs/1.5/api/javax/xml/stream/events/package-summary.html

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109