Re: TODO for FastInfoset

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Fri, 14 Jan 2005 17:04:17 +0100

Further TODO....

- When the binary encoding is stable (a couple of changes to the spec,
due to ISO BCs, still need to be implemented) we should obtain a set of
test XML documents, convert them to fast infoset documents and commit
both sets to CVS. Then we can run tests:

   - over the checked-in XML documents, to ensure that converting them
     yields the same output as the previously converted fast infoset
     documents.

   - over the checked-in fast infoset documents, to check that the same
     infoset is produced as when processing the XML documents.

- We should also create some rather 'contrived' XML documents to
exercise the index and length prefixing of the implementation, to ensure
the encoding is implemented correctly for less common encoding cases.
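
As a rough illustration of the 'contrived' documents mentioned above, a
small generator along these lines could produce XML with a chosen number
of distinct element names and a chosen text length. The class name,
method and parameters are hypothetical, and the sizes that actually cross
an index or length-prefix boundary would have to be taken from the spec.

```java
// Hypothetical helper for illustration; not part of the Fast Infoset code base.
final class ContrivedXml {

    /**
     * Builds a document with distinctNames differently named child
     * elements, each holding textLength characters of text.
     */
    static String generate(int distinctNames, int textLength) {
        // Text content of the requested length, cycling through a..z.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < textLength; i++) {
            text.append((char) ('a' + (i % 26)));
        }

        StringBuilder xml = new StringBuilder("<root>");
        for (int i = 0; i < distinctNames; i++) {
            // Every element gets a new name, so each one adds a table entry.
            String name = "e" + i;
            xml.append('<').append(name).append('>')
               .append(text)
               .append("</").append(name).append('>');
        }
        return xml.append("</root>").toString();
    }

    public static void main(String[] args) {
        System.out.println(generate(3, 5));
    }
}
```

Raising distinctNames should push the name indexes into their longer
encodings, and raising textLength should do the same for the length
prefixes of the character data.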

Paul.

Paul Sandoz wrote:
> Hi,
>
> Below is a TODO list of what needs to be done to progress the Fast
> Infoset implementation. I think these should probably be added as issues
> so that we can easily discuss and expand on each.
>
> Paul.
>
>
> Generic
> -------
>
> - Complete the support for the unimplemented information items
>    - Unparsed Entity information item
>    - Notation information item
>    - Document Type Declaration information item
> This should be fairly straightforward given most of the core encoding
> structure is in place.
>
>
> - Review vocabulary implementation
> It would be useful to review the current design for the support of
> vocabularies and see if we need to make improvements or changes.
> Currently I am toying with the idea of supporting vocabularies directly
> in a parser and having an external vocabulary instance per parser. This
> means only one array is required, with an index that states at what
> position the external vocabulary ends. This will boost the performance
> of de-referencing indexes, since this will be a local operation with no
> method calls.
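
A minimal sketch of the single-array idea described above, assuming
zero-based array positions and a table of plain strings. The class and
member names are hypothetical. In a real parser the lookup would be a
plain array access inlined in the decoding loop; the accessor methods
here only keep the sketch readable.

```java
import java.util.Arrays;

// Hypothetical per-parser table: external vocabulary entries come first,
// followed by the entries added while a document is being parsed.
final class VocabularyTable {
    private String[] entries;
    private int size;
    // Position at which the external vocabulary ends.
    private final int externalEnd;

    VocabularyTable(String[] externalVocabulary) {
        entries = Arrays.copyOf(externalVocabulary,
                Math.max(16, externalVocabulary.length * 2));
        size = externalVocabulary.length;
        externalEnd = size;
    }

    /** Adds a document-local entry and returns its position. */
    int add(String value) {
        if (size == entries.length) {
            entries = Arrays.copyOf(entries, size * 2);
        }
        entries[size] = value;
        return size++;
    }

    /** De-references an index; external and local entries share one array. */
    String get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("No entry at index " + index);
        }
        return entries[index];
    }

    boolean isExternal(int index) {
        return index >= 0 && index < externalEnd;
    }

    /** Drops document-local entries so the parser can be reused. */
    void reset() {
        Arrays.fill(entries, externalEnd, size, null);
        size = externalEnd;
    }
}
```

Resetting between documents only moves the size back to the boundary, so
the external vocabulary does not need to be copied again for each parse.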
>
>
> - Initial vocabularies
> An initial vocabulary will occur at the head of the fast infoset
> document. Essentially a bunch of strings and indexes need to be decoded
> and added to tables before the first element information item is decoded.
>
>
> - Built-in restricted alphabets
>    - "numeric" restricted alphabet
>    - "date and time" restricted alphabet
>
>
> - Restricted alphabets
> Restricted alphabets, whether built-in or defined in a vocabulary
> (initial or external), basically consist of a set of characters that are
> sequentially indexed with an integer value. The integer values are
> encoded instead of using a character encoding scheme (UTF-8, UTF-16) or
> an encoding algorithm. Thus, whether the restricted alphabet is built-in
> or defined in a vocabulary, the algorithm is the same.
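
To make the description above concrete, here is a small sketch of the
character-to-index mapping. The class name and the alphabet used in the
usage note are illustrative assumptions, not the built-in alphabets, and
the packing of the indexes into bits is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an alphabet is an ordered set of characters, and a
// string is represented by the positions of its characters in that set.
final class RestrictedAlphabet {
    private final char[] characters;
    private final Map<Character, Integer> indexOf = new HashMap<>();

    RestrictedAlphabet(String alphabet) {
        characters = alphabet.toCharArray();
        for (int i = 0; i < characters.length; i++) {
            indexOf.put(characters[i], i);
        }
    }

    /** Maps each character of value to its sequential index. */
    int[] toIndexes(String value) {
        int[] indexes = new int[value.length()];
        for (int i = 0; i < indexes.length; i++) {
            Integer index = indexOf.get(value.charAt(i));
            if (index == null) {
                throw new IllegalArgumentException(
                        "Character not in alphabet: " + value.charAt(i));
            }
            indexes[i] = index;
        }
        return indexes;
    }

    /** Reverses toIndexes. */
    String fromIndexes(int[] indexes) {
        StringBuilder value = new StringBuilder(indexes.length);
        for (int index : indexes) {
            value.append(characters[index]);
        }
        return value.toString();
    }
}
```

With the illustrative alphabet "0123456789-.", the string "-4.2" maps to
the indexes 10, 4, 11, 2. The same class serves a built-in alphabet and
one read from a vocabulary, since only the constructor argument differs.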
>
>
> - Built-in encoding algorithms
>    - "hexadecimal" encoding algorithm
>    - "base64" encoding algorithm
>    - "short" encoding algorithm
>    - "int" encoding algorithm
>    - "long" encoding algorithm
>    - "boolean" encoding algorithm
>    - "float" encoding algorithm
>    - "double" encoding algorithm
>    - "uuid" encoding algorithm
>    - "cdata" encoding algorithm
>
>
> - Encoding algorithms
> Encoding algorithms specify a binary encoding to be used instead of
> the corresponding string representation. Such algorithms can be used for
> size and/or processing efficiency. A number of built-in encoding
> algorithms have been defined. It is possible to specify further
> algorithms by adding URIs to the vocabulary (initial or external). An
> encoding algorithm may be used for text content or an attribute value
> and is identified in the encoding as a small integer (1 to 256; there is
> a maximum of 256 encoding algorithms allowed per fast infoset document).
> A pluggable registry of encoding algorithms needs to be defined so that
> it is possible to add them for use by the FI serializer/parser. An open
> question is how the parser/serializer API, e.g. SAX or StAX, can return
> such binary information through the API. Such binary information could
> be converted to a string by the algorithm but this would increase
> processing. For the built-in algorithms specific extensions could be
> defined. For the additional algorithms a generic method could be used
> returning an instance of the data as an object and the URI of the
> algorithm.
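
One possible shape for the pluggable registry and the "generic method"
mentioned above, sketched under assumptions: the interface, the class
names, the URI in the usage note and the four-octet big-endian layout
are all hypothetical and are not taken from the spec or from an existing
API.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract: an algorithm converts between an object and octets.
interface EncodingAlgorithm {
    byte[] encode(Object data);
    Object decode(byte[] octets);
}

// Hypothetical registry keyed by the URI that identifies the algorithm.
final class EncodingAlgorithmRegistry {
    private final Map<String, EncodingAlgorithm> byUri = new HashMap<>();

    void register(String uri, EncodingAlgorithm algorithm) {
        byUri.put(uri, algorithm);
    }

    EncodingAlgorithm lookup(String uri) {
        EncodingAlgorithm algorithm = byUri.get(uri);
        if (algorithm == null) {
            throw new IllegalArgumentException("No algorithm registered for " + uri);
        }
        return algorithm;
    }
}

// Example plug-in: int[] stored as four octets per value (assumed layout).
final class IntArrayAlgorithm implements EncodingAlgorithm {
    @Override
    public byte[] encode(Object data) {
        int[] values = (int[]) data;
        // ByteBuffer writes big-endian by default.
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 4);
        for (int value : values) {
            buffer.putInt(value);
        }
        return buffer.array();
    }

    @Override
    public Object decode(byte[] octets) {
        if (octets.length % 4 != 0) {
            throw new IllegalArgumentException("Length must be a multiple of 4");
        }
        ByteBuffer buffer = ByteBuffer.wrap(octets);
        int[] values = new int[octets.length / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getInt();
        }
        return values;
    }
}
```

A parser could then hand the application the decoded object together
with the URI, for example the result of
registry.lookup("urn:example:int-array").decode(octets), and leave any
conversion to a string to the application.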
>
>
> SAX specific
> ------------
>
> - Support the interning of identifying strings using the SAX
> http://xml.org/sax/features/string-interning feature.
>
> - Proper SAX error reporting to the application.
>
> - Performance measurements and optimizations of SAX serializer
>
>
> StAX specific
> -------------
>
> - Support the interning of identifying strings
>
> - Performance measurements and optimizations of StAX parser and serializer
>
>
> JAXB
> ----
>
> Investigate how FI can be plugged into JAXB. Given JAXB's schema
> knowledge it might be possible to:
>
> - speed up the process of serialization, since JAXB may be able to retain
> the local name and namespace association for faster lookup of indexed
> qualified names, and also because strings will be interned, thus only
> requiring reference equality
>
> - external vocabularies may be used directly thus only integer values
> need be written for elements and attributes. This has the potential to
> speed up serialization even more since no lookup is required for indexing.
>
> - support for the built-in encoding algorithms and restricted alphabets
> that map to corresponding XSD data types. This will require that we
> design corresponding extensions to the appropriate XML API such that
> binary data can be passed or received.
>
>

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109