dev@fi.java.net

Re: Support required with resulting dependency on Xerces <was> Re: XML Names and FI parsing

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Tue, 01 Mar 2005 13:11:50 +0100

Santiago Pericas-Geertsen wrote:
> On Feb 28, 2005, at 5:17 AM, Paul Sandoz wrote:
>
>
>>Hi,
>>
>>After further thought we should implement this so as to be credible
>>with the XML community.
>
>
> I'm not sure. I'd rather have this as an option (turned off by
> default). A lot of people working with alternate serializations feel
> that these checks are unnecessary.
>

Right, that makes sense for areas that do not require these checks,
either because round-tripping is not important or because additional
constraints make it difficult to perform such checks efficiently, i.e.
to save time by using more space (e.g. using a 2^16 table).


> Fast infoset is a serialization of the Infoset which can only
> contain legal characters. In that sense, I don't believe those checks
> belong here, but would be OK to have them as an option.
>

But the problem is that XML infosets parsed from fast infoset documents
may then serialize to non-well-formed XML documents.

Seems to me this is an implementation choice for the parser and if we
want to be interoperable with XML we should implement it.

I think that the performance cost will not be that great: combining the
check with UTF-8 decoding is quite fast for Basic Latin (an array lookup
and an inequality check), and for higher character ranges the check will
defer to the Xerces XMLChar class (XML 1.0). The check will also only be
performed once on the literal string.
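
To make the idea concrete, here is a minimal sketch of such a check. The
class and method names are hypothetical and not part of the FI parser. The
slow path uses the range test from the XML 1.0 Char production as a
stand-in for the Xerces XMLChar class, so that the example has no
dependencies. It also checks an already decoded string, whereas a real
parser would presumably fold the lookup into the UTF-8 decoding loop.

```java
// Sketch only: names are hypothetical, not the FI parser's API.
final class XmlCharCheck {
    // One flag per Basic Latin code point (U+0000..U+007F).
    private static final boolean[] BASIC_LATIN_OK = new boolean[128];

    static {
        // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | ...
        BASIC_LATIN_OK[0x09] = true;
        BASIC_LATIN_OK[0x0A] = true;
        BASIC_LATIN_OK[0x0D] = true;
        for (int c = 0x20; c < 0x80; c++) {
            BASIC_LATIN_OK[c] = true;
        }
    }

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isXmlChar(int codePoint) {
        if (codePoint >= 0 && codePoint < 0x80) {
            // Fast path: an inequality check and an array lookup.
            return BASIC_LATIN_OK[codePoint];
        }
        // Slow path: stand-in for deferring to a class such as Xerces XMLChar.
        return (codePoint >= 0x80 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Checks every code point of a decoded literal string. */
    static boolean isLegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

An unpaired surrogate is returned by codePointAt as its own value, which
falls outside the legal ranges and is therefore rejected.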

I am proposing not to support XML 1.1 for now as it is not supported by
SOAP 1.x.


In terms of performance I am more concerned about efficiently checking
for duplicate namespace attributes and attributes. The latter is a pain
because it is necessary to compare the namespace names and local names
as two different prefixes can be bound to the same namespace.
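
As an illustration of the attribute case, one possible approach is to key
each attribute on its namespace name and local name rather than on its
prefix. This is only a sketch under that assumption: the class name and
the hash-set strategy are hypothetical, and whether it is fast enough for
the parser is exactly the open question.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: names are hypothetical, not the FI parser's API.
final class DuplicateAttributeCheck {
    // Expanded names seen so far on the current element.
    private final Set<String> seen = new HashSet<>();

    /**
     * Records an attribute by namespace name and local name.
     * Returns false if the same expanded name was already recorded.
     */
    boolean add(String namespaceName, String localName) {
        // "{namespace}local" form. A local name cannot contain '}',
        // so the last '}' in the key always ends the namespace name.
        String key = "{" + namespaceName + "}" + localName;
        return seen.add(key);
    }

    /** Resets the state before the next element's attributes. */
    void clear() {
        seen.clear();
    }
}
```

With prefixes p1 and p2 both bound to "urn:x", the attributes p1:id and
p2:id both map to the key "{urn:x}id", so the second add returns false
even though the prefixes differ.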

Paul.

-- 
| ? + ? = To question
----------------\
   Paul Sandoz
        x38109
+33-4-76188109