Hi Jimmy,
Jimmy Zhang wrote:
> Paul, Do you know what XPath implementation JDK 1.5 bundles...
The default implementation is based on the interpretive XPath processor from Xalan.
> We benchmarked it and found that the performance
> is not very good (http://www.ximpleware.com/benchmark_xpath.html). Assuming
> DOM offers good random access, the results suggest that something is
> wrong...
> Is this a known issue?
Yes. There are known performance issues with the XPath impl of JAXP
bundled with JDK 1.5.
Santiago has been working on measuring and improving the performance.
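
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of timing the bundled JAXP XPath over a DOM. The sample document, the expression and the iteration count are made up for illustration and are not those used in the benchmark above; the expression is compiled once so that only evaluation is timed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JaxpXPathTiming {
    // Hypothetical sample document, not the benchmark's input.
    static final String XML =
        "<orders><order id='1'><item>a</item></order>"
      + "<order id='2'><item>b</item><item>c</item></order></orders>";

    // Parse the XML text into a DOM tree using the default JAXP parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate a compiled expression against the DOM and count the selected nodes.
    static int countMatches(Document doc, XPathExpression expr) throws Exception {
        NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile("/orders/order/item");

        int iterations = 10000; // arbitrary; increase for steadier numbers
        int matches = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            matches = countMatches(doc, expr);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(matches + " matches, " + iterations
            + " evaluations in " + elapsedMs + " ms");
    }
}
```

A loop like this includes JIT warm-up in the measurement, so treat the number as a rough indication only.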
Paul.
> Best regards,
> Jimmy Zhang
>
> ----- Original Message ----- From: "Paul Sandoz" <Paul.Sandoz_at_Sun.COM>
> To: <users_at_fi.dev.java.net>
> Sent: Monday, January 23, 2006 5:12 AM
> Subject: Re: VTD speed
>
>
>> Mark Swanson wrote:
>>>> a lot of objects; VTD-XML doesn't. So VTD-XML
>>>> should hold the edge both in memory and performance...
>>>> The best way is to try it...
>>>
>>>
>>> Tried it. VTD wins by a wide margin in memory and performance
>>> (throughput as well as memory pressure placed on the GC by object
>>> creation) over XmlBeans. I used the VTD Java API and the XmlBeans API
>>> to loop through parse, get some element attribute value as String 20
>>> times in loops of 5000.
>>>
>>
>> (just been browsing the VTD parsing code, not a String in sight!)
>>
>> VTD is a very efficient parser that performs no instantiation of
>> String objects when parsing. This is fantastic for the scenarios where
>> you only need access to a certain part of the document e.g. routing
>> decisions using XPath come to mind.
>>
>> (IIRC Xerces avoids instantiation of String objects for tags that have
>> previously occurred by using a symbol table of interned Strings.)
>>
>> As far as I can tell from the code, it looks like the Java VTD parser
>> is not performing any namespace validation when the document is parsed
>> e.g. no checking if a prefix of an element or attribute is in-scope.
>> As a consequence duplicate attributes are not fully checked at parsing
>> (the local name and namespace URI need to be checked in addition to
>> checking for attributes with the same qualified name).
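
To illustrate the duplicate attribute point, a minimal sketch using the JAXP DOM parser rather than VTD. The document is hypothetical: a:id and b:id have different qualified names, but both prefixes are bound to the same namespace URI, so they share a local name and namespace URI. I would expect the namespace-aware parse to reject it and the non-namespace-aware parse to accept it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DuplicateAttributeCheck {
    // Hypothetical input: distinct qualified names, same (namespace URI, local name).
    static final String XML =
        "<e xmlns:a='urn:x' xmlns:b='urn:x' a:id='1' b:id='2'/>";

    // Returns true if the parser accepts the document, false if it reports an error.
    static boolean accepts(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware);
        try {
            dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-unaware parser accepts: " + accepts(XML, false));
        System.out.println("namespace-aware parser accepts:   " + accepts(XML, true));
    }
}
```

A check on qualified names alone cannot catch this case, which is why the namespace URI and local name have to be compared as well.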
>>
>> VTD parsing avoids a lot of the work performed by other APIs and
>> implementations, some related to instantiation of objects and some
>> related to checking of the XML. For the latter, it may not be known
>> whether a non-well-formed document is being parsed, and in some cases
>> it will never be known because the non-well-formed parts of the
>> document will never be navigated.
>>
>> If you need to access a significant portion of the document, e.g. for
>> processing SOAP header blocks and data binding the payload of a SOAP
>> message, then from the code at least I think it likely that, for a
>> UTF-8 encoded SOAP message, UTF-8 decoding will be performed twice on
>> nearly all relevant characters (once when parsing to determine offsets,
>> and again when iterating through the document). In addition, string
>> equality is performed on a per-character basis, whereas if a string is
>> interned a binding tool can check for equality using '==' (especially
>> useful for namespace URIs).
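
A minimal sketch of the interning point in plain Java, not tied to any parser or binding tool. The class and the decode helper are hypothetical; the helper stands in for a parser that builds a fresh String from decoded characters.

```java
public class InternedNames {
    // SOAP 1.1 envelope namespace; as a compile-time constant it is interned.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Stand-in for a parser producing a new String object from decoded characters.
    static String decode(char[] chars) {
        return new String(chars);
    }

    // Reference comparison: cheap, but only meaningful if both sides are interned.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String parsed = decode(SOAP_NS.toCharArray());

        // A freshly built String is a different object with equal content.
        System.out.println("== without interning: " + sameReference(parsed, SOAP_NS));
        // equals() compares character by character.
        System.out.println("equals():             " + parsed.equals(SOAP_NS));
        // After interning, both references point at the pooled instance.
        System.out.println("== after intern():    " + sameReference(parsed.intern(), SOAP_NS));
    }
}
```

The intern() call itself has a cost, so the gain comes when the same name or URI is compared many times after being interned once.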
>>
>> So it is swings and roundabouts, ya choose ya model to best suit ya
>> needs. VTD looks like a great model for some XML processing scenarios
>> but definitely not for all.
>>
>>
>>> I'm now even more intrigued by an FI/VTD combo.
>>>
>>
>> This may be possible, although the VTD representation of the
>> document in memory may require some changes. It is certainly possible
>> to reference the FI document for literal strings. Indexed qualified
>> names and strings may be more problematic.
>>
>> Paul.
>>
>> --
>> | ? + ? = To question
>> ----------------\
>> Paul Sandoz
>> x38109
>> +33-4-76188109
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe_at_fi.dev.java.net
>> For additional commands, e-mail: users-help_at_fi.dev.java.net
>>
>>
>
>
--
| ? + ? = To question
----------------\
Paul Sandoz
x38109
+33-4-76188109