users@fi.java.net

Re: VTD speed

From: Jimmy Zhang <crackeur_at_comcast.net>
Date: Mon, 8 Jan 2007 13:28:18 -0800

Santiago, here is a quick update on the XPath performance benchmark: we just
added data for Jaxen 1.1, and Jaxen seems to have issues
with //...

http://vtd-xml.sf.net/benchmark_xpath.html

Jimmy

----- Original Message -----
From: "Santiago Pericas-Geertsen" <Santiago.Pericasgeertsen_at_Sun.COM>
To: <users_at_fi.dev.java.net>
Cc: <users_at_jaxp.dev.java.net>
Sent: Friday, January 05, 2007 1:34 PM
Subject: Re: VTD speed


> Hi Jimmy,
>
> I'm really interested in learning more about VTD-XML and its XPath
> support (I'll get to that next week, I hope). Here is a blog entry I just
> finished about XPath in the RI [1]. In a nutshell, I believe the main
> issue is DTMs (Xalan's Document Table Model), as explained in that entry.
> XPath is an area that we'd like to improve in JAXP.next, so I'd like to
> look at what people have done in the last couple of years.
>
> As for benchmarking, I've created a simple test suite based on Japex [2]
> called XPathpex. I haven't published it yet (but I can send it to you
> privately if you want). It uses documents from the XMark suite and is
> based on XPathMark, but because it uses Japex, you get nice reports,
> multi-threading, etc. A simple driver is all you'd need to write.
>
> Thanks for sharing your findings.
>
> -- Santiago
>
> [1] http://weblogs.java.net/blog/spericas/archive/2007/01/whats_next_for_1.html
> [2] https://japex.dev.java.net
>
> On Jan 5, 2007, at 2:29 AM, Jimmy Zhang wrote:
>
>> Paul, do you know which XPath implementation JDK 1.5 bundles? We
>> benchmarked its performance and found that it is not very good
>> (http://www.ximpleware.com/benchmark_xpath.html). Given that DOM offers
>> good random access, the results suggest that something is wrong. Is this
>> a known issue?
>> Best regards,
>> Jimmy Zhang
>>
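To make Jimmy's question concrete, here is a minimal sketch of the kind of measurement being discussed. It uses only the XPath API bundled with the JDK (javax.xml.xpath), compiles an expression once, and times repeated evaluations over a DOM tree. The class name, the toy document and the iteration count are hypothetical. This is not the ximpleware benchmark code. A fair comparison would also need realistic (e.g. XMark-sized) documents and JIT warm-up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class JaxpXPathTiming {
    // Hypothetical toy document; a real benchmark would use XMark-sized inputs.
    static final String XML =
        "<site><people>"
        + "<person id='p1'><name>Ann</name></person>"
        + "<person id='p2'><name>Bob</name></person>"
        + "</people></site>";

    // Parses with the JDK's bundled DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Compiles expr once, evaluates it `iterations` times, and returns the last node count.
    static int evaluate(Document doc, String expr, int iterations) {
        try {
            XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expr);
            int count = 0;
            for (int i = 0; i < iterations; i++) {
                NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
                count = nodes.getLength();
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // An explicit child path versus the descendant-or-self shorthand //.
        for (String expr : new String[] {"/site/people/person/name", "//name"}) {
            long start = System.nanoTime();
            int n = evaluate(doc, expr, 1000);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println(expr + ": " + n + " nodes, " + micros + " us for 1000 runs");
        }
    }
}
```

Compiling once and evaluating many times keeps expression parsing out of the timed loop. That way the numbers reflect tree navigation, which is where Santiago suspects the DTM layer costs time.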
>> ----- Original Message ----- From: "Paul Sandoz" <Paul.Sandoz_at_Sun.COM>
>> To: <users_at_fi.dev.java.net>
>> Sent: Monday, January 23, 2006 5:12 AM
>> Subject: Re: VTD speed
>>
>>
>>> Mark Swanson wrote:
>>>>> a lot of objects; VTD-XML doesn't. So VTD-XML
>>>>> should hold the edge in both memory and performance...
>>>>> The best way is to try it...
>>>>
>>>>
>>>> Tried it. VTD wins over XmlBeans by a wide margin in memory and
>>>> performance (throughput as well as memory pressure placed on the GC by
>>>> object creation). I used the VTD Java API and the XmlBeans API to loop
>>>> through parsing and getting some element attribute value as a String,
>>>> 20 times in loops of 5000.
>>>>
>>>
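The shape of Mark's measurement loop can be sketched with nothing but the JDK. VTD-XML and XmlBeans are external libraries, so this minimal sketch substitutes the JDK's DOM parser purely to show the harness. It reads "20 times in loops of 5000" as 20 timed rounds of 5000 parse-and-read iterations, which is an interpretation. The document, attribute name and class name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ParseLoopSketch {
    // Hypothetical input; Mark's actual documents are not shown in the thread.
    static final byte[] DOC =
        "<order id='42'><item sku='A-1'/></order>".getBytes(StandardCharsets.UTF_8);

    // One unit of work: parse the bytes, then read one attribute value as a String.
    static String parseAndGetAttr(DocumentBuilder builder, byte[] doc) {
        try {
            Document d = builder.parse(new ByteArrayInputStream(doc));
            return d.getDocumentElement().getAttribute("id");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs `rounds` timed rounds of `loops` iterations each; returns nanoseconds per round.
    static long[] run(int rounds, int loops) {
        DocumentBuilder builder;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        long[] times = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < loops; i++) {
                // Checking the result keeps the work from being optimized away.
                if (!"42".equals(parseAndGetAttr(builder, DOC))) {
                    throw new IllegalStateException("unexpected attribute value");
                }
            }
            times[r] = System.nanoTime() - start;
        }
        return times;
    }

    public static void main(String[] args) {
        long[] times = run(20, 5000);
        for (int r = 0; r < times.length; r++) {
            System.out.println("round " + r + ": " + times[r] / 1_000_000 + " ms");
        }
    }
}
```

Replacing the body of parseAndGetAttr with each library's own parse and attribute-lookup calls would give the comparison Mark describes. Timing rounds separately also shows when the JIT has warmed up.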
>>> (just been browsing the VTD parsing code, not a String in sight!)
>>>
>>> VTD is a very efficient parser that performs no instantiation of String
>>> objects when parsing. This is fantastic for scenarios where you only
>>> need access to a certain part of the document; routing decisions using
>>> XPath come to mind.
>>>
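Paul describes a technique: record where each token lives in the original bytes rather than copying it out into a String. A minimal sketch of that idea follows. The 64-bit offset/length packing is a hypothetical simplification chosen for illustration. It is not VTD-XML's actual record layout or API. The scanner is also nowhere near a real parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TokenRecordSketch {
    // Hypothetical layout: high 32 bits = byte offset, low 32 bits = byte length.
    // (VTD-XML's real record format differs; this only shows the idea.)
    static long pack(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    static int offset(long token) { return (int) (token >>> 32); }

    static int length(long token) { return (int) token; }

    private static boolean endsName(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '>' || b == '/';
    }

    // Records each start-tag name as an (offset, length) pair into the raw bytes.
    // Deliberately naive: ignores comments, CDATA and PIs, and checks nothing.
    static long[] startTagNames(byte[] xml) {
        long[] tokens = new long[8];
        int n = 0;
        for (int i = 0; i + 1 < xml.length; i++) {
            byte next = xml[i + 1];
            if (xml[i] == '<' && next != '/' && next != '?' && next != '!') {
                int start = i + 1;
                int end = start;
                while (end < xml.length && !endsName(xml[end])) end++;
                if (n == tokens.length) tokens = Arrays.copyOf(tokens, n * 2);
                tokens[n++] = pack(start, end - start);
                i = end - 1; // resume scanning at the end of the name
            }
        }
        return Arrays.copyOf(tokens, n);
    }

    // Compares a token with pre-encoded bytes; no String is created.
    static boolean matches(byte[] xml, long token, byte[] name) {
        int off = offset(token);
        int len = length(token);
        if (len != name.length) return false;
        for (int k = 0; k < len; k++) {
            if (xml[off + k] != name[k]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] xml = "<route><to>b</to><body>...</body></route>".getBytes(StandardCharsets.UTF_8);
        byte[] target = "to".getBytes(StandardCharsets.UTF_8);
        for (long t : startTagNames(xml)) {
            if (matches(xml, t, target)) {
                // A String is materialized only for the token we actually need.
                String name = new String(xml, offset(t), length(t), StandardCharsets.UTF_8);
                System.out.println("found <" + name + "> at byte " + offset(t));
            }
        }
    }
}
```

For a routing decision, only the few tokens that match ever become Strings. Everything else stays as two integers pointing into the buffer.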
>>> (IIRC Xerces avoids instantiation of String objects for tags that have
>>> previously occurred by using a symbol table of interned Strings.)
>>>
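The symbol-table technique Paul recalls can be sketched in a few lines. Names are looked up directly from the parser's character buffer, and a String is created (and interned) only the first time a given name is seen. This is a hypothetical minimal version written for illustration. It is not Xerces' actual SymbolTable class.

```java
class SymbolTableSketch {
    private static final class Entry {
        final String symbol;
        final char[] chars;
        final Entry next;

        Entry(String symbol, Entry next) {
            this.symbol = symbol;
            this.chars = symbol.toCharArray();
            this.next = next;
        }
    }

    // Fixed bucket count keeps the sketch short; a real table would resize.
    private final Entry[] buckets = new Entry[101];

    // Returns the canonical String for buf[off .. off+len), allocating only on first sight.
    String addSymbol(char[] buf, int off, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) h = 31 * h + buf[off + i];
        int b = (h & 0x7FFFFFFF) % buckets.length;
        for (Entry e = buckets[b]; e != null; e = e.next) {
            if (sameChars(e.chars, buf, off, len)) return e.symbol;
        }
        String s = new String(buf, off, len).intern(); // interned, so == works against literals
        buckets[b] = new Entry(s, buckets[b]);
        return s;
    }

    private static boolean sameChars(char[] a, char[] buf, int off, int len) {
        if (a.length != len) return false;
        for (int i = 0; i < len; i++) {
            if (a[i] != buf[off + i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        char[] buf = "<item><item>".toCharArray();
        String first = table.addSymbol(buf, 1, 4);
        String second = table.addSymbol(buf, 7, 4);
        System.out.println(first == second); // prints true: the second lookup reuses the first String
    }
}
```

Because every returned symbol is interned, downstream code can compare element names with '==' instead of equals().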
>>> As far as I can tell from the code, the Java VTD parser does not
>>> perform any namespace validation when the document is parsed, e.g. it
>>> does not check whether the prefix of an element or attribute is in
>>> scope. As a consequence, duplicate attributes are not fully checked
>>> during parsing (the local name and namespace URI need to be checked in
>>> addition to checking for attributes with the same qualified name).
>>>
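Paul's point about duplicate attributes can be made concrete. Under Namespaces in XML, one tag must not carry two attributes with the same local name whose prefixes are bound to the same namespace URI. So a:id and b:id collide when a and b map to the same URI, even though their qualified names differ. The minimal sketch below performs both the qualified-name check and the expanded-name check. Its names are hypothetical, and it is not code from any of the parsers discussed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AttributeUniquenessSketch {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Returns the first attribute whose expanded name (namespace URI, local name)
    // repeats an earlier one, or null if all are unique. `inScope` maps prefixes
    // to namespace URIs; namespace declarations themselves are not passed in.
    static String findDuplicate(List<String> qnames, Map<String, String> inScope) {
        Set<String> seenQNames = new HashSet<>();
        Set<List<String>> seenExpanded = new HashSet<>();
        for (String qname : qnames) {
            // The plain XML 1.0 check: identical qualified names.
            if (!seenQNames.add(qname)) return qname;
            int colon = qname.indexOf(':');
            String uri = "";        // unprefixed attributes are in no namespace
            String local = qname;
            if (colon >= 0) {
                String prefix = qname.substring(0, colon);
                uri = prefix.equals("xml") ? XML_NS : inScope.get(prefix);
                if (uri == null) throw new IllegalArgumentException("prefix not in scope: " + prefix);
                local = qname.substring(colon + 1);
            }
            // The namespace-aware check: different prefixes bound to the same URI.
            if (!seenExpanded.add(Arrays.asList(uri, local))) return qname;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> ns = new HashMap<>();
        ns.put("a", "urn:example");
        ns.put("b", "urn:example");
        // <e a:id="1" b:id="2"/> is not namespace-well-formed although the qnames differ.
        System.out.println(findDuplicate(Arrays.asList("a:id", "b:id"), ns)); // prints b:id
    }
}
```

The second check only works once prefixes have been resolved against the in-scope bindings. That resolution is exactly the namespace work Paul observes the VTD parser skipping.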
>>> VTD parsing avoids a lot of work that other APIs and implementations
>>> perform, some related to the instantiation of objects and some related
>>> to checking the XML. Because of the latter, it may not be known whether
>>> a non-well-formed document is being parsed, and in some cases it will
>>> never be known because the non-well-formed parts of the document are
>>> never navigated.
>>>
>>> If you need to access a significant portion of the document, e.g. for
>>> processing SOAP header blocks and data binding the payload of a SOAP
>>> message, then, judging from the code at least, I think it likely that
>>> for a UTF-8 encoded SOAP message, UTF-8 decoding will be performed twice
>>> on nearly all relevant characters (once when parsing to determine
>>> offsets, and again when iterating through the document). In addition,
>>> string equality is checked character by character, whereas if a string
>>> is interned, a binding tool can check for equality using '=='
>>> (especially useful for namespace URIs).
>>>
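The two costs Paul mentions can be illustrated with a minimal sketch. Comparing a name that is still sitting in the UTF-8 bytes against an expected String requires decoding and comparing it character by character. A binding tool holding interned Strings can compare namespace URIs with a single '=='. The helper names are hypothetical, and the decoder assumes well-formed UTF-8. The SOAP 1.1 envelope URI is used only as a familiar example.

```java
import java.nio.charset.StandardCharsets;

class Utf8CompareSketch {
    // SOAP 1.1 envelope namespace; string literals are interned by the JVM.
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Decodes UTF-8 from buf[off .. off+len) on the fly and compares it with s
    // character by character. Assumes well-formed UTF-8 (no validation).
    static boolean equalsUtf8(byte[] buf, int off, int len, String s) {
        int i = off;
        int end = off + len;
        int j = 0;
        while (i < end) {
            int b0 = buf[i] & 0xFF;
            int cp;
            if (b0 < 0x80) {
                cp = b0;
                i += 1;
            } else if (b0 < 0xE0) {
                cp = ((b0 & 0x1F) << 6) | (buf[i + 1] & 0x3F);
                i += 2;
            } else if (b0 < 0xF0) {
                cp = ((b0 & 0x0F) << 12) | ((buf[i + 1] & 0x3F) << 6) | (buf[i + 2] & 0x3F);
                i += 3;
            } else {
                cp = ((b0 & 0x07) << 18) | ((buf[i + 1] & 0x3F) << 12)
                        | ((buf[i + 2] & 0x3F) << 6) | (buf[i + 3] & 0x3F);
                i += 4;
            }
            if (cp < 0x10000) {
                if (j >= s.length() || s.charAt(j) != cp) return false;
                j += 1;
            } else { // supplementary character: compare both UTF-16 surrogates
                if (j + 1 >= s.length()
                        || s.charAt(j) != Character.highSurrogate(cp)
                        || s.charAt(j + 1) != Character.lowSurrogate(cp)) return false;
                j += 2;
            }
        }
        return j == s.length();
    }

    // With interned strings, a binding tool can compare namespace URIs by identity.
    static boolean isSoapEnvelope(String internedUri) {
        return internedUri == SOAP_ENV;
    }

    public static void main(String[] args) {
        byte[] raw = SOAP_ENV.getBytes(StandardCharsets.UTF_8);
        System.out.println(equalsUtf8(raw, 0, raw.length, SOAP_ENV)); // decode + compare each char
        String decoded = new String(raw, StandardCharsets.UTF_8);    // a second decode of the same bytes
        System.out.println(isSoapEnvelope(decoded.intern()));        // one reference comparison
    }
}
```

The identity check is only safe when every URI passed in has been interned. Handing it a freshly decoded, non-interned String returns false even when the characters match.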
>>> So it is swings and roundabouts, ya choose ya model to best suit ya
>>> needs. VTD looks like a great model for some XML processing scenarios
>>> but definitely not for all.
>>>
>>>
>>>> I'm now even more intrigued by an FI/VTD combo.
>>>>
>>>
>>> This may be possible, although the VTD representation of the
>>> document in memory may require some changes. It is certainly possible
>>> to reference the FI document for literal strings. Indexed qualified
>>> names and strings may be more problematic.
>>>
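Paul distinguishes literal strings from indexed ones. That distinction can be sketched under loudly stated assumptions. The token and table shapes are hypothetical, table indices are 0-based for simplicity, and nothing here reflects the actual Fast Infoset encoding or any FI/VTD implementation. A literal can be referenced in place, much as VTD references text in an XML document. An indexed string's bytes live elsewhere, at the literal occurrence that added it to the table, so resolving it needs that table as an extra indirection.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FiTokenSketch {
    // Hypothetical token: either a literal byte range in the FI buffer,
    // or an index into a vocabulary table of previously seen strings.
    static final class Token {
        final boolean indexed;
        final int offsetOrIndex;
        final int length; // unused for indexed tokens

        Token(boolean indexed, int offsetOrIndex, int length) {
            this.indexed = indexed;
            this.offsetOrIndex = offsetOrIndex;
            this.length = length;
        }
    }

    final byte[] buffer;
    // Hypothetical table: entry i -> {offset, length} of the literal that defined it.
    final List<int[]> vocabulary = new ArrayList<>();

    FiTokenSketch(byte[] buffer) {
        this.buffer = buffer;
    }

    // A literal occurrence; when addToTable is set it also becomes the next table entry.
    Token literal(int offset, int length, boolean addToTable) {
        if (addToTable) vocabulary.add(new int[] {offset, length});
        return new Token(false, offset, length);
    }

    Token indexed(int index) {
        return new Token(true, index, 0);
    }

    // Literal tokens resolve directly; indexed ones need the table built while parsing.
    String resolve(Token t) {
        int off = t.offsetOrIndex;
        int len = t.length;
        if (t.indexed) {
            int[] entry = vocabulary.get(t.offsetOrIndex);
            off = entry[0];
            len = entry[1];
        }
        return new String(buffer, off, len, StandardCharsets.UTF_8);
    }
}
```

In this toy model the table must be kept alongside the token array. Any table entry that did not originate as a literal inside this buffer would need some other home.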
>>> Paul.
>>>
>>> --
>>> | ? + ? = To question
>>> ----------------\
>>> Paul Sandoz
>>> x38109
>>> +33-4-76188109
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe_at_fi.dev.java.net
>>> For additional commands, e-mail: users-help_at_fi.dev.java.net
>>>
>>
>>
>>
>
>
>