Re: FI for microjava

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Fri, 23 Sep 2005 11:27:31 +0200

Hi Thomas,

Thomas Skjølberg wrote:
> Hi,
>
> this is actually a really delayed response, but I've been busy.

I fully empathize!

> First of
> all, I got FIME working fine (great!), my problems were due to
> different implementations of the same interfaces => _use_ the Eclipse
> refactoring guys (or add a comment to the readme?).
>

Great. I have not had any time to work on FIME. If you have some
proposed improvements perhaps you could work with Ias, if he has any
time :-)

> Second, I've been composing a document which tries to address 'my'
> needs in terms of (binary) XML and some imagined API. I've sent the
> same document to the MPEG workgroup on MPEG-21 Digital Item Streaming.
> Whatever comments - I need input, so feel free to tell me how
> incompetent I am;) - there are a lot of loose threads and I need to
> find a working solution (or at least API).
>

I would be happy to review the document if you send it to me. I will be
objective and not be biased towards Fast Infoset.

> I think that data navigation should be a natural part of any binary
> format, and that in the XML case it should be exposed through an API,
> have you considered it (the database 'view' of XML)?

Do you mean random access into the document, or selected access to
certain parts?

A DOM representation provides random access to the infoset, but first
the DOM representation has to be instantiated.
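To illustrate that trade-off, here is a minimal sketch using the
standard JAXP DOM API. The class name, method names and the
<book>/<page> document are made up for the example. The whole tree is
built first, and only then can the pages be read in any order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical example class; not part of FI or FIME.
final class DomRandomAccess {

    // Builds the complete in-memory tree. This is the up-front cost of DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    // Once the tree exists, any <page> element can be reached in any order.
    static String pageText(Document doc, int index) {
        NodeList pages = doc.getElementsByTagName("page");
        return pages.item(index).getTextContent();
    }

    public static void main(String[] args) {
        Document doc = parse("<book><page>one</page><page>two</page><page>three</page></book>");
        System.out.println(pageText(doc, 2)); // last page first
        System.out.println(pageText(doc, 0)); // then back to the first page
    }
}
```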

XML databases do seem to work quite well, and I am sure there are loads
of proprietary formats for representing XML infosets such that general
querying of the data is efficient (proprietary with good reason, since
interoperability is not a high requirement for these formats).

What we wanted to avoid with Fast Infoset was a 'memory representation'
of an XML infoset. We concentrated on a format to stream XML infosets.

We designed Fast Infoset so that it is possible to provide 'jump points'
into the encoding, i.e. selected access as chosen by the serializer.
This technique is not standardized, but the capability is there. For
example, if an XML document represented multiple pages, the serializer
could provide a 'jump point' to each page. Certain constraints have to
be met in terms of the encoding and the XML infoset for this to work,
and the current APIs would need to be extended. In fact the concept of
'jump points' applies equally to XML documents, although how you send
the jump points with the XML document is not quite so easy.
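The idea can be sketched independently of any real encoding. The code
below is NOT the Fast Infoset format and uses no FI API; the class
JumpPointSketch and its length-prefixed layout are assumptions made up
for illustration. The serializer records the byte offset of each page
as it writes, and a reader can then decode one page without touching
the others. It only works because each page here is self-contained,
which is the kind of constraint mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of serializer-chosen jump points.
final class JumpPointSketch {
    final byte[] body;      // pages, each stored as: 4-byte length + UTF-8 bytes
    final int[] jumpPoints; // byte offset of each page within body

    private JumpPointSketch(byte[] body, int[] jumpPoints) {
        this.body = body;
        this.jumpPoints = jumpPoints;
    }

    // Writes the pages sequentially and remembers where each one starts.
    static JumpPointSketch serialize(List<String> pages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int[] offsets = new int[pages.size()];
        try {
            for (int i = 0; i < pages.size(); i++) {
                offsets[i] = out.size(); // bytes written so far
                byte[] page = pages.get(i).getBytes(StandardCharsets.UTF_8);
                out.writeInt(page.length);
                out.write(page);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return new JumpPointSketch(bytes.toByteArray(), offsets);
    }

    // Jumps straight to one page; earlier pages are never decoded.
    String readPage(int index) {
        ByteBuffer buf = ByteBuffer.wrap(body);
        buf.position(jumpPoints[index]);
        byte[] page = new byte[buf.getInt()];
        buf.get(page);
        return new String(page, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        JumpPointSketch doc = serialize(
                Arrays.asList("<page>one</page>", "<page>two</page>", "<page>three</page>"));
        System.out.println(doc.readPage(2));
    }
}
```

How the offset table travels with the document is left open here, as it
is in the text above.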

An alternative approach is to support simple and composable linear XPath
expressions operating on a stream of SAX events. This can be extremely
efficient. But it depends on what type of processing you need to do on
the XML infoset.
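As a rough sketch of what a linear path over SAX events could look
like, the handler below matches an absolute child-only path such as
/book/page against the stack of open elements and collects the text of
each match. The class LinearPathFilter is hypothetical, it handles only
plain element names (no predicates, wildcards or namespaces), and it
does not show how such expressions would be composed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: matches a path like "/book/page" while streaming.
final class LinearPathFilter extends DefaultHandler {
    private final List<String> steps;                     // "/book/page" -> [book, page]
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements
    private final List<String> matches = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a matching element

    LinearPathFilter(String path) {
        this.steps = Arrays.asList(path.substring(1).split("/"));
    }

    // True when the open-element stack is exactly the requested path.
    private boolean pathMatches() {
        if (open.size() != steps.size()) return false;
        Iterator<String> it = open.iterator(); // outermost element first
        for (String step : steps) {
            if (!step.equals(it.next())) return false;
        }
        return true;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        if (text == null && pathMatches()) text = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null && pathMatches()) {
            matches.add(text.toString());
            text = null;
        }
        open.removeLast();
    }

    // Runs the filter over a document; memory use is bounded by depth and matches.
    static List<String> select(String path, String xml) {
        LinearPathFilter filter = new LinearPathFilter(path);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), filter);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return filter.matches;
    }

    public static void main(String[] args) {
        System.out.println(select("/book/page",
                "<book><title>t</title><page>one</page><page>two</page></book>"));
    }
}
```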

> It is not only
> that 'some namespace cannot be understood', it is that 'the sets of
> understood namespaces may differ in parts of my application (or
> potentially in some 3rd party software)'. It is not reinventing the
> wheel, but it will be useful.
>

Not sure I understand. Can you give an example?

> I have not investigated BiM, the MPEG-7 XML compression, but I saw
> before writing this mail that some of the features exist, but I'll have
> to check it out more. If BiM definitely is schema-based, there is no
> way I can use it just out of the box, because I need to compress XML
> without prior knowledge of schemas.

BiM 1.0 is definitely schema-based. I think BiM 1.x or 2.0 (I cannot
recall which) has the ability to encode an XML infoset without a schema,
but as I understand it, it is not very efficient. I think this feature
is mainly there to better encode instances of xsd:any.

> But:
> http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm#E11E20 .
> Also, without knowing, I imagine that schema-based solutions either use
> much memory and/or encode slowly.
>

Actually schema-based solutions tend to be the fastest in terms of
encoding and decoding. The reason is that special encoders/decoders can
be generated from the schema. The encodings also tend to be more
compact than XML infoset-based solutions (especially for small
infosets), although using certain techniques of Fast Infoset it is
possible to close the gap.

A companion standard to FI, called X.694 (or Fast Schema as I like to
call it), provides functionality that is similar to BiM. See:

http://asn1.elibel.tm.fr/xml/#schema-mapping

The downside of such schema-based encodings is that they are not
self-describing or self-structuring, which is the reason why BiM defines
forwards and backwards support in terms of a higher layer of the encoding.

If you could send me useful pointers to MPEG-21 (and your document) I
may be able to help you on your requirements based on the MPEG-21 use-cases.

> Anyways, having BiM in your online Parser performance / Compactness
> charts would be sweet.
>

That would be tricky since I do not have a BiM implementation available
to me.

Note that Fast Infoset can be applied effectively to X3D (it already
has been), SVG and Geography Markup Language documents when encoding
algorithms are specified for the efficient encoding of co-ordinate
information represented as integers or real numbers. In these cases I
think Fast Infoset will perform well against a schema-based encoding
that uses the same encoding algorithms.
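To give a feel for why this matters, here is a small sketch comparing
a list of co-ordinates written as text with the same values written as
4-byte IEEE 754 floats. It does not use the Fast Infoset API, and the
class CoordinateEncodingSketch is made up for the example. The binary
form has a fixed size per value and decodes without any number
parsing, whereas the text form grows with the number of digits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not an FI encoding algorithm implementation.
final class CoordinateEncodingSketch {

    // Text form, as co-ordinates typically appear in XML character content.
    static byte[] asText(float[] coords) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < coords.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(coords[i]);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Binary form: each value as a 4-byte big-endian IEEE 754 float.
    static byte[] asBinary(float[] coords) {
        ByteBuffer buf = ByteBuffer.allocate(coords.length * Float.BYTES);
        for (float c : coords) buf.putFloat(c);
        return buf.array();
    }

    // Decoding the binary form needs no text-to-number conversion.
    static float[] fromBinary(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        float[] coords = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < coords.length; i++) coords[i] = buf.getFloat();
        return coords;
    }

    public static void main(String[] args) {
        float[] coords = {1.5f, -2.25f, 1024.125f};
        System.out.println("text bytes:   " + asText(coords).length);
        System.out.println("binary bytes: " + asBinary(coords).length);
        System.out.println(Arrays.toString(fromBinary(asBinary(coords))));
    }
}
```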

Paul.

> Have a nice weekend, cheers
>
> Thomas
>
> On Tue, 16 Aug 2005 14:26:15 +0200, Changshin Lee <iasandcb_at_gmail.com>
> wrote:
>
> ....

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109
