dev@fi.java.net

Re: TODO for FastInfoset

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Thu, 13 Jan 2005 23:24:06 +0100

Joe Wang wrote:
> Paul Sandoz wrote:
>
>> Paul Sandoz wrote:
>>
>>
>>> Generic
>>> -------
>>>
>>>
>>
>>
>> Two further generic items are:
>>
>> - Duplicate attributes
>>
>> To comply fully with well-formed XML it is necessary to ensure that
>> there are no duplicate attributes among the [attributes] property of
>> an EII.
>>
>> For efficient checking, an integer array the size of the attribute name
>> table is required, plus an integer that represents the current parser
>> state. Each time an EII with AIIs occurs the integer is incremented. For
>> each AII the index that corresponds to the AII is used to obtain the
>> value in the integer array. If this value is equal to the integer then
>> the attribute has occurred before. If this value is not equal to the
>> integer then the attribute has not occurred before and the value in the
>> integer array is set to the integer.
>>
>>
>>
> I noticed the attribute list array. Any reason why we are not using a map
> for holding attributes?
>

Because it is slow and will contribute to garbage collection.

Fast Infoset indexes attributes and we should take advantage of this
feature for performance.

An attribute should be checked to see if it is a duplicate before it is
added to the list of attributes.

IIRC some fast XML parsers do not bother to check for duplicate attributes.
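
A minimal sketch of the stamp-array check described in the quoted text, in case it helps. The class and method names are hypothetical (nothing here is existing FastInfoset code), and the attribute name table is assumed to have a fixed size for brevity; a real table grows as names are indexed, so the array would have to grow with it. The reset branch guards against the counter wrapping around on very long documents.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the FastInfoset code base. */
final class DuplicateAttributeChecker {
    // One slot per entry in the attribute name table (fixed size assumed here).
    private final int[] lastSeen;
    // Incremented once for each element that carries attributes.
    private int generation = 0;

    DuplicateAttributeChecker(int attributeNameTableSize) {
        lastSeen = new int[attributeNameTableSize];
    }

    /** Call once for each EII that has AIIs, before checking its attributes. */
    void startElementWithAttributes() {
        if (generation == Integer.MAX_VALUE) {
            // Avoid wrap-around: clear all stamps and restart the counter.
            Arrays.fill(lastSeen, 0);
            generation = 0;
        }
        generation++;
    }

    /** Returns true if this indexed name was already seen on the current element. */
    boolean isDuplicate(int attributeNameIndex) {
        if (lastSeen[attributeNameIndex] == generation) {
            return true;
        }
        // First occurrence on this element: stamp the slot with the current generation.
        lastSeen[attributeNameIndex] = generation;
        return false;
    }
}
```

The check is one array read and at most one array write per attribute, and it allocates nothing per element, which is the point of preferring it to a map.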


>> For this solution it needs to be ensured that integer wrap around does
>> not occur, thus the integer array needs to be reset when the integer
>> reaches the maximum value. This probably needs to be checked for each
>> EII with AIIs rather than at the beginning of a parse. It appears at
>> first inconceivable that a document could have 2^32 - 1 elements with
>> attributes, however some documents in the airline industry are reputedly
>> huge, and there is the case of XMPP (IIRC) where an XML
>> document is transmitted for the lifetime of a network connection.
>>
>>
>> - In-scope checking of indexed qualified names
>>
>> To comply fully with well-formed XML it may be necessary to ensure that
>> the namespace of an indexed qualified name is in scope.
>>
>> It is possible that once a qualified name is indexed that it could be
>> referred to again when the qualified name is out of scope. For a
>> conformant FI serializer this cannot occur. Thus I am in two minds
>> whether this is completely necessary. Malicious FI serializers could
>> potentially cause strange things to happen to a parser for the
>> production of an EII or an AII whose namespace is not in scope.
>> However, for SAX such EIIs and AIIs can still be returned faithfully
>> (the start and end prefix events will be missing).
>>
>> This has made me ponder a bit on the nature of XML namespaces and the
>> concept of scope and whether scope is really necessary for such formats
>> as FI.... however that is a different story and it is good to integrate
>> as closely to the XML 1.x model as possible.
>>
>> To check whether a qualified name is in scope we could have an integer
>> array of size one plus the size of the namespace name table. When a
>> namespace goes into scope the index for the namespace name is used to
>> increment the entry at index + 1 in the integer array. When a namespace
>> goes out of scope the index for the namespace name is used to decrement
>> the entry at index + 1 in the integer array. A namespace is in scope
>> while its entry is greater than zero.
>>
>>
>>
> In BEA's RI, they used a stack. When StartElement is
> encountered, depth is incremented by 1 and the namespace, if any, is pushed
> onto the stack with the depth. When EndElement is encountered, depth is
> decremented by 1. Before depth--, the stack is peeked to see if
> the top namespace is at the same depth (i.e. if the namespace is in
> scope); if it is, it's popped. Also, the values are stored in a map,
> eliminating duplicate data for multiple references.
> Is this an idea we could borrow? What would you think?
>

For in-scope namespaces there is already support in the StAX impl. There
are some improvements that can be made to reduce the creation of new
objects that I need to do (need to copy the way the SAX impl does things).

In general, as for the duplicate attributes, we should take
advantage of the indexing feature so that it is possible to do very
efficient checks with little overhead in terms of time.
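
For the in-scope check, the counter array from the quoted text could look roughly like the sketch below. The names are hypothetical, and the meaning of slot 0 is my assumption: it is taken to stand for "no namespace" (index -1) and is treated as always in scope. The table size is again assumed fixed for brevity.

```java
/** Hypothetical helper for illustration; not part of the FastInfoset code base. */
final class NamespaceScopeCounter {
    // Slot (index + 1) counts the in-scope declarations of the namespace name
    // with that index. Slot 0 is assumed to mean "no namespace" (index -1).
    private final int[] inScopeCount;

    NamespaceScopeCounter(int namespaceNameTableSize) {
        inScopeCount = new int[namespaceNameTableSize + 1];
        inScopeCount[0] = 1; // assumption: "no namespace" is always in scope
    }

    /** Call when a declaration of the indexed namespace name goes into scope. */
    void enterScope(int namespaceNameIndex) {
        inScopeCount[namespaceNameIndex + 1]++;
    }

    /** Call when a declaration of the indexed namespace name goes out of scope. */
    void leaveScope(int namespaceNameIndex) {
        inScopeCount[namespaceNameIndex + 1]--;
    }

    /** A namespace name is in scope while at least one declaration is active. */
    boolean isInScope(int namespaceNameIndex) {
        return inScopeCount[namespaceNameIndex + 1] > 0;
    }
}
```

A count rather than a flag is used because the same namespace name can be declared again on a nested element; the outer declaration must stay in scope after the inner one ends.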

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109