Re: TODO for FastInfoset

From: Joe Wang <joe.wang_at_sun.com>
Date: Thu, 13 Jan 2005 14:37:13 -0800

Paul Sandoz wrote:

> Joe Wang wrote:
>
>> Paul Sandoz wrote:
>>
>>> Paul Sandoz wrote:
>>>
>>>
>>>> Generic
>>>> -------
>>>>
>>>>
>>>
>>>
>>>
>>> Two further generic items are:
>>>
>>> - Duplicate attributes
>>>
>>> To comply fully with well-formed XML it is necessary to ensure that
>>> there are no duplicate attributes among the [attributes] property of
>>> an EII.
>>>
>>> For efficient checking, an integer array the size of the attribute name
>>> table is required, plus an integer that represents the current parser
>>> state. Each time an EII with AIIs occurs the integer is incremented.
>>> For each AII the index that corresponds to the AII is used to obtain
>>> the value in the integer array. If this value is equal to the integer
>>> then the attribute has occurred before in the current EII. If this
>>> value is not equal to the integer then the attribute has not occurred
>>> before and the value in the integer array is set to the integer.
>>>
>>>
>>>
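The duplicate-attribute check described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the class
and method names are hypothetical (they are not taken from the FI code
base), and the attribute name table is assumed to have a fixed size,
whereas a real parser would have to grow the array as names are added.

```java
/** Hypothetical sketch of the "stamp array" duplicate attribute check. */
final class DuplicateAttributeChecker {
    // One slot per entry in the attribute name table (assumed fixed size).
    private final int[] lastSeen;
    // The "current parser state": bumped once per EII that has AIIs.
    private int currentElement = 0;

    DuplicateAttributeChecker(int attributeNameTableSize) {
        lastSeen = new int[attributeNameTableSize];
    }

    /** Call once for each element that carries attributes. */
    void startElementWithAttributes() {
        currentElement++;
    }

    /**
     * Returns true if the attribute with this name-table index has already
     * been seen on the current element; otherwise records it and returns
     * false.
     */
    boolean isDuplicate(int attributeNameIndex) {
        if (lastSeen[attributeNameIndex] == currentElement) {
            return true;
        }
        lastSeen[attributeNameIndex] = currentElement;
        return false;
    }
}
```

No per-element clearing and no object allocation is needed; the check is
one array read and one comparison per attribute. Wrap-around of the
counter is deliberately ignored in this sketch.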
>> I noticed the attribute list array. Any reason why we are not using
>> a map for holding Attributes?
>>
>
> Because it is slow and will contribute to garbage collection.
>
> Fast Infoset indexes attributes and we should take advantage of this
> feature for performance.

I see. The event API exposes Attributes/Namespaces as an Iterator.
That's why I used a map/list. I'll replace them with the attribute array.

-- Joe

>
> An attribute should be checked to see if it is a duplicate before it
> is added to the list of attributes.
>
> IIRC some fast XML parsers do not bother to check for duplicate
> attributes.
>
>
>>> For this solution it needs to be ensured that integer wrap around does
>>> not occur, thus the integer array needs to be reset when the integer
>>> reaches the maximum value. This probably needs to be checked for each
>>> EII with AIIs rather than at the beginning of a parse. It appears at
>>> first inconceivable that a document could have 2^32 - 1 elements with
>>> attributes, however some documents, reputedly in the airline industry,
>>> are meant to be huge, and there is the case of XMPP (IIRC) where an
>>> XML document is transmitted for the lifetime of a network connection.
>>>
>>>
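A minimal sketch of the wrap-around guard described above, checked on each
EII with AIIs. The names are hypothetical, and the constructor takes a
starting counter value only so that the reset path can be exercised
without parsing billions of elements. Note that a Java int used this way
reaches its maximum at Integer.MAX_VALUE (2^31 - 1).

```java
import java.util.Arrays;

/** Hypothetical sketch: duplicate check with a reset before wrap-around. */
final class WrapSafeAttributeChecker {
    private final int[] lastSeen;
    private int currentElement;

    // initialCounter exists purely for demonstration and testing.
    WrapSafeAttributeChecker(int attributeNameTableSize, int initialCounter) {
        lastSeen = new int[attributeNameTableSize];
        currentElement = initialCounter;
    }

    /** Call once for each element that carries attributes. */
    void startElementWithAttributes() {
        if (currentElement == Integer.MAX_VALUE) {
            // Counter is about to wrap: clear all stamps and start over,
            // so that no stale stamp can ever equal a future counter value.
            Arrays.fill(lastSeen, 0);
            currentElement = 0;
        }
        currentElement++;
    }

    /** Returns true if this attribute index was already seen on this element. */
    boolean isDuplicate(int attributeNameIndex) {
        if (lastSeen[attributeNameIndex] == currentElement) {
            return true;
        }
        lastSeen[attributeNameIndex] = currentElement;
        return false;
    }
}
```

The reset costs one pass over the array, but it happens at most once per
2^31 - 1 elements with attributes, so the amortized cost is negligible.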
>>> - In-scope checking of indexed qualified names
>>>
>>> To comply fully with well-formed XML it may be necessary to ensure that
>>> the namespace of an indexed qualified name is in scope.
>>>
>>> It is possible that once a qualified name is indexed it could be
>>> referred to again when the qualified name is out of scope. For a
>>> conformant FI serializer this cannot occur. Thus I am in two minds
>>> whether this is completely necessary. Malicious FI serializers could
>>> potentially cause strange things to happen to a parser for the
>>> production of an EII or an AII whose namespace is not in scope.
>>> However, for SAX such EIIs and AIIs can still be returned faithfully
>>> (the start and end prefix events will be missing).
>>>
>>> This has made me ponder a bit on the nature of XML namespaces and the
>>> concept of scope and whether scope is really necessary for such formats
>>> as FI.... however that is a different story and it is good to integrate
>>> as closely to the XML 1.x model as possible.
>>>
>>> To check whether a qualified name is in scope we could have an integer
>>> array of size one plus the size of the namespace name table. When a
>>> namespace goes into scope the index for the namespace name is used to
>>> increment the entry at index + 1 in the integer array. When a
>>> namespace goes out of scope the index for the namespace name is used
>>> to decrement the entry at index + 1 in the integer array. A namespace
>>> is then in scope exactly when its entry is greater than zero.
>>>
>>>
>>>
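The counting scheme described above might look roughly like this. The
names are hypothetical, and the thread does not say what the extra slot
at position 0 is for, so this sketch simply leaves it unused; treat that
as an assumption rather than the intended design.

```java
/** Hypothetical sketch of in-scope counting per indexed namespace name. */
final class NamespaceScopeCounter {
    // Size is one plus the namespace name table size; slot 0 is left
    // unused here (its purpose is not stated in the thread).
    private final int[] inScopeCount;

    NamespaceScopeCounter(int namespaceNameTableSize) {
        inScopeCount = new int[namespaceNameTableSize + 1];
    }

    /** A declaration of this namespace has come into scope. */
    void enterScope(int namespaceNameIndex) {
        inScopeCount[namespaceNameIndex + 1]++;
    }

    /** A declaration of this namespace has gone out of scope. */
    void leaveScope(int namespaceNameIndex) {
        inScopeCount[namespaceNameIndex + 1]--;
    }

    /** True while at least one declaration of the namespace is in scope. */
    boolean isInScope(int namespaceNameIndex) {
        return inScopeCount[namespaceNameIndex + 1] > 0;
    }
}
```

A counter rather than a boolean is used because the same namespace can be
declared again on a nested element; it only leaves scope when every such
declaration has ended.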
>> BEA's RI uses a stack for this. When StartElement is encountered,
>> depth is incremented by 1 and each namespace declared on the element,
>> if any, is pushed onto the stack together with the depth. When
>> EndElement is encountered, depth is decremented by 1. Before depth--,
>> the stack is peeked to see whether the top namespace was pushed at the
>> same depth (i.e. whether it belongs to the element being closed); if
>> so, it is popped. Also, the values are stored in a map, eliminating
>> duplicate data for multiple references.
>> Is this an idea we could borrow? What do you think?
>>
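For comparison, the stack-with-depth idea described above could be
sketched as below. This is written from the description in this message
only, not from BEA's code; all names are hypothetical, namespaces are
plain URI strings, and the map used to share duplicate values is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of namespace scoping with a depth-tagged stack. */
final class NamespaceDepthStack {
    private static final class Entry {
        final String uri;
        final int depth;
        Entry(String uri, int depth) {
            this.uri = uri;
            this.depth = depth;
        }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();
    private int depth = 0;

    /** StartElement: bump depth, push each namespace declared here. */
    void startElement(String... declaredNamespaces) {
        depth++;
        for (String uri : declaredNamespaces) {
            stack.push(new Entry(uri, depth));
        }
    }

    /** EndElement: pop everything pushed at this depth, then decrement. */
    void endElement() {
        while (!stack.isEmpty() && stack.peek().depth == depth) {
            stack.pop();
        }
        depth--;
    }

    /** Linear scan of the stack; fine for a sketch, not for a hot path. */
    boolean isInScope(String uri) {
        for (Entry e : stack) {
            if (e.uri.equals(uri)) {
                return true;
            }
        }
        return false;
    }
}
```

Unlike an index-based counter array, this allocates an object per
declaration and needs a scan (or an extra map) to answer an in-scope
query.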
>
> For in-scope namespaces there is already support in the StAX impl.
> There are some improvements to reduce the creation of new objects
> that I still need to make (I need to copy the way the SAX impl does
> things).
>
> In general, as with the duplicate attributes, we should take
> advantage of the indexing feature so that it is possible to do very
> efficient checks with little overhead in terms of time.
>
> Paul.
>
