Re: FI for microjava

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Tue, 16 Aug 2005 13:09:20 +0200

Thomas Skjølberg wrote:
>>> As for the classes/interfaces that do not exist in CLDC (Map,
>>> List, etc.), obviously one needs to change the FI code or start
>>> deleting more FIME classes. I guess some of the J2SE interfaces may
>>> be implemented using CLDC Vector/Hashtable/Enumeration but I don't
>>> know how smart that would be. So what is 'the plan'? What is the
>>> performance gain of using J2SE Map/List etc. over normal
>>> Hashtable/Enumeration/Vector in the J2SE FI?
>>
>>
>> I'm slightly confused by your question because FIME doesn't depend
>> on the Java Collections Framework. FIME has been developed from FISE,
>> and what I've done for that is remove all usages of the Java
>> Collections Framework and replace them with classic Vector and
>> Hashtable. Therefore, I have no plan regarding your point and the
>> question about Java SE is irrelevant to FIME.
>
>
> Ok, then I guess changing the J2SE code to make it J2ME compatible is
> out of the question (*)
>

We ruled this out initially because SE has different optimization
requirements to ME.

We thought it would be better to fork, optimize separately and then
revisit for any sharing.

Currently the FISE implements a base layer for three different types of
parser and serializer (SAX, StAX and DOM).

>> Regarding FIME StAX, all the supported features are the same as those
>> of FISE, so please refer to FISE StAX.
>
>
> (*) so that improvements in FISE will have to be manually propagated to
> FIME. I for one would like to see support for StAX Location, however I
> am unsure how compression complicates things.
>

Improvements to FISE do not necessarily translate to improvements in
FIME and vice versa because they are very different environments.
Having said that, it would be good for FI-specific processing to share
a common API (see below for the discussion on encoding of integers).

The StAX Location interface is very specific to characters i.e. it
naturally assumes octets of an XML document have been decoded to
characters. Character offset, column number and line number do not make
any sense when processing a binary encoding to produce an XML infoset.

>> This workability is a basis for FIME's future, and we have to
>> research how to enhance FIME.
>
>
> Is there a forum for that?

Yes, this one :-)

> I'd like to work on integers instead of
> strings.

We have FI specific SAX APIs to do that but nothing yet for StAX :-( sorry.

The SAX API is easily extensible so the approach was to include a new
content handler for primitive types, see below:

https://fi.dev.java.net/source/browse/fi/FastInfoset/src/org/jvnet/fastinfoset/sax/PrimitiveTypeContentHandler.java?rev=1.4&view=auto&content-type=text/vnd.viewcvs-markup

Fast Infoset is used for encoding binary X3D documents. X3D makes use
of the encoding algorithms (both primitive and application defined). It
has been working very well for SAX.

We need a suitable API for StAX that works well for ME and SE. What
would you like to see? How would you envisage client code working for
serializing and parsing such data using StAX? Let's design something
together!

Could you send some of the MPEG stuff? That way I might better
understand the requirements and how best to use FI.

> Also, I need to skip sub trees like a mad man.
>

Interesting, we have not implemented anything like that. I suspect that
FI can skip sub-trees quite efficiently because stuff is length prefixed
i.e. not necessary to decode octets that can be skipped.
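
To illustrate why length prefixes make skipping cheap, here is a minimal sketch. It assumes a made-up framing (a 4-octet big-endian length followed by that many octets), which is NOT the actual Fast Infoset encoding, and every class and method name in it is hypothetical. The point is only that the reader consumes the length and jumps ahead without decoding the skipped octets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class LengthPrefixedSkipSketch {

    // Hypothetical framing (NOT the Fast Infoset wire format):
    // a 4-octet big-endian length followed by that many octets.
    static byte[] encode(String... chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String chunk : chunks) {
            byte[] octets = chunk.getBytes(StandardCharsets.UTF_8);
            int n = octets.length;
            out.write(n >>> 24);
            out.write(n >>> 16);
            out.write(n >>> 8);
            out.write(n);
            out.write(octets, 0, n);
        }
        return out.toByteArray();
    }

    // Skips one chunk: only the length prefix is read, the payload is never decoded.
    static void skipChunk(DataInputStream in) throws IOException {
        int remaining = in.readInt();
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException("chunk is shorter than its length prefix");
            }
            remaining -= skipped;
        }
    }

    // Reads one chunk and decodes its payload as UTF-8 text.
    static String readChunk(DataInputStream in) throws IOException {
        byte[] octets = new byte[in.readInt()];
        in.readFully(octets);
        return new String(octets, StandardCharsets.UTF_8);
    }

    // Skips the first chunksToSkip chunks, then decodes and returns the next one.
    static String readAfterSkipping(byte[] data, int chunksToSkip) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            for (int i = 0; i < chunksToSkip; i++) {
                skipChunk(in);
            }
            return readChunk(in);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode("subtree one", "subtree two", "wanted");
        System.out.println(readAfterSkipping(data, 2)); // prints "wanted"
    }
}
```

How much of a real FI document can be skipped this way depends on which items are actually length prefixed, so treat this as an illustration of the idea only.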

> I'm a little surprised by all the 'xxxArray' classes in the util
> package. Why are there so many? When I first read about FI I thought
> that it would be all about String arrays.
>

Strings for most things, but some strings have different
representations. For example, when using the SAX API, strings for content
have a different representation from strings for attribute values: the
former are represented by char[] and the latter by String objects. For
optimization it is important to take advantage of this.
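
The two representations can be seen directly in the standard SAX callbacks. The sketch below uses the plain JDK XML parser rather than FI, and the handler class name is made up for illustration: content arrives as a slice of a char[] buffer, while attribute values are handed over as String objects.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxStringRepresentationsSketch extends DefaultHandler {

    final StringBuilder content = new StringBuilder();
    final StringBuilder attributes = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            // Attribute values are delivered as String objects.
            String value = atts.getValue(i);
            attributes.append(atts.getQName(i)).append('=').append(value).append(';');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Content is delivered as a slice of a char[] buffer owned by the parser.
        content.append(ch, start, length);
    }

    // Parses the given XML text and returns the handler holding what was collected.
    static SaxStringRepresentationsSketch parse(String xml) {
        try {
            SaxStringRepresentationsSketch handler = new SaxStringRepresentationsSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SaxStringRepresentationsSketch h = parse("<a id=\"7\">hello</a>");
        System.out.println(h.attributes); // prints "id=7;"
        System.out.println(h.content);    // prints "hello"
    }
}
```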

The only area where strings are not used is for the indexing of the
tuple of the three strings { prefix, local name, namespace name }. There
is a special qualified name array for this.
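
As a rough picture of what indexing such a tuple means, here is a minimal sketch of a growable table of qualified names. It is not the actual QualifiedNameArray from the FI code base; the class, field and method names are hypothetical, and the index assigned on insertion is simply the position in the array.

```java
class QualifiedNameTableSketch {

    // The tuple of three strings that identifies a qualified name.
    static final class QName {
        final String prefix;
        final String localName;
        final String namespaceName;

        QName(String prefix, String localName, String namespaceName) {
            this.prefix = prefix;
            this.localName = localName;
            this.namespaceName = namespaceName;
        }
    }

    private QName[] entries = new QName[2];
    private int size;

    // Adds a name and returns the index under which it can later be referenced.
    int add(QName name) {
        if (size == entries.length) {
            QName[] grown = new QName[entries.length * 2];
            System.arraycopy(entries, 0, grown, 0, size);
            entries = grown;
        }
        entries[size] = name;
        return size++;
    }

    // Looks up a previously added name by its index.
    QName get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("no entry at index " + index);
        }
        return entries[index];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        QualifiedNameTableSketch table = new QualifiedNameTableSketch();
        int index = table.add(new QName("x", "item", "http://example.org/ns"));
        QName name = table.get(index);
        System.out.println(index + " -> " + name.prefix + ":" + name.localName); // prints "0 -> x:item"
    }
}
```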

Fast Infoset specifies a number of different tables for different
classifications of strings and qualified names. If you look at the code
for the ParserVocabulary class you can see the fields for all the tables:

     public final CharArrayArray restrictedAlphabet =
             new CharArrayArray(ValueArray.DEFAULT_CAPACITY, 256);
     public final StringArray encodingAlgorithm =
             new StringArray(ValueArray.DEFAULT_CAPACITY, 256);

     public final StringArray namespaceName;
     public final PrefixArray prefix;
     public final StringArray localName;
     public final StringArray otherNCName;
     public final StringArray otherURI;
     public final StringArray attributeValue;
     public final CharArrayArray otherString;

     public final ContiguousCharArrayArray characterContentChunk;

     public final QualifiedNameArray elementName;
     public final QualifiedNameArray attributeName;
Most are associated with String. There is some special optimization for
strings associated with a prefix that may not be necessary for FIME.

For FIME, at least for the serializer, it may not be necessary to encode
all information items. For the parser it may not be necessary to report
all information items. Mostly I am referring to stuff related to
unreported entities and notations, but it could also apply to comments
and processing instructions. If that is done then you can remove all the
tables prefixed by 'other'.

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109

dev@fi.java.net
