users@fi.java.net

Re: VTD speed

From: Santiago Pericas-Geertsen <Santiago.Pericasgeertsen_at_Sun.COM>
Date: Thu, 22 Feb 2007 08:25:09 -0500

Jimmy,

  JDK 6 includes a number of different optimizations, at both the VM
layer and the JAXP layer. However, I don't recall any big changes to
optimize XPath. So this may be a case of several optimizations
working together, or perhaps of some changes in your benchmark?

  Thanks.

-- Santiago

On Feb 21, 2007, at 7:12 PM, Jimmy Zhang wrote:

> I have noticed a pretty good speedup in JDK 6's XPath performance...
> What did you guys do?
> ----- Original Message ----- From: "Santiago Pericas-Geertsen"
> <Santiago.Pericasgeertsen_at_Sun.COM>
> To: <users_at_fi.dev.java.net>
> Cc: <users_at_jaxp.dev.java.net>
> Sent: Friday, January 05, 2007 1:34 PM
> Subject: Re: VTD speed
>
>
>> Hi Jimmy,
>> I'm really interested in learning more about VTD-XML and its
>> XPath support (I'll get to that next week, I hope). Here is a
>> blog entry I just finished about XPath in the RI [1]. In a nutshell,
>> I believe the main issue is DTMs, as explained in that entry. XPath
>> is an area that we'd like to improve in JAXP.next, so I'd like to
>> look at what people have done in the last couple of years.
>> As for benchmarking, I've created a simple test suite based on
>> Japex [2] called XPathpex. I haven't published it yet (but I can
>> send it to you privately if you want). It uses documents from the
>> XMark suite and is based on XPathMark, but because it uses
>> Japex, you get nice reports, multi-threading, etc. A simple
>> driver is all you'd need to write.
>> Thanks for sharing your findings.
>> -- Santiago
>> [1] http://weblogs.java.net/blog/spericas/archive/2007/01/whats_next_for_1.html
>> [2] https://japex.dev.java.net
>> On Jan 5, 2007, at 2:29 AM, Jimmy Zhang wrote:
>>> Paul, do you know what XPath implementation JDK 1.5 bundles?
>>> We benchmarked its performance and found that it is not very
>>> good (http://www.ximpleware.com/benchmark_xpath.html).
>>> Assuming DOM offers good random access, the results suggest
>>> that something is wrong... Is this a known issue?
>>> Best regards,
>>> Jimmy Zhang
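For readers who want to reproduce this kind of measurement, here is a minimal Java sketch of timing the JDK's bundled XPath engine (`javax.xml.xpath`) over an already-built DOM. The sample document, the class and method names, and the iteration count are hypothetical placeholders, not the ximpleware benchmark itself. The expression is compiled once and the document parsed once, so only repeated evaluation sits inside the timed region.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathTiming {
    // Hypothetical sample document standing in for real benchmark data (e.g. XMark).
    static final String XML = "<site><people>"
        + "<person id='p1'><name>Ann</name></person>"
        + "<person id='p2'><name>Bob</name></person>"
        + "</people></site>";

    // Builds a namespace-aware DOM once, outside any timed region.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Compiles the expression once, then evaluates it repeatedly against the DOM,
    // returning the total number of matched nodes across all iterations.
    static int evaluateRepeatedly(Document doc, String path, int iterations) throws Exception {
        XPathExpression expr = XPathFactory.newInstance().newXPath().compile(path);
        int total = 0;
        for (int i = 0; i < iterations; i++) {
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            total += nodes.getLength();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        int iterations = 10_000; // hypothetical count; real runs also need warm-up rounds
        long start = System.nanoTime();
        int matches = evaluateRepeatedly(doc, "/site/people/person/name", iterations);
        double usPerEval = (System.nanoTime() - start) / 1000.0 / iterations;
        System.out.printf("matches=%d, ~%.1f us per evaluation%n", matches, usPerEval);
    }
}
```

A harness such as Japex adds what this sketch leaves out: warm-up, multiple threads, and reports.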
>>>
>>> ----- Original Message ----- From: "Paul Sandoz"
>>> <Paul.Sandoz_at_Sun.COM>
>>> To: <users_at_fi.dev.java.net>
>>> Sent: Monday, January 23, 2006 5:12 AM
>>> Subject: Re: VTD speed
>>>
>>>
>>>> Mark Swanson wrote:
>>>>>> a lot of objects; VTD-XML doesn't. So VTD-XML
>>>>>> should hold the edge in both memory and performance...
>>>>>> The best way is to try it...
>>>>>
>>>>>
>>>>> Tried it. VTD beats XmlBeans by a wide margin in both memory and
>>>>> performance (throughput, as well as the memory pressure placed on
>>>>> the GC by object creation). I used the VTD Java API and the
>>>>> XmlBeans API to loop through parsing and getting some element's
>>>>> attribute value as a String, 20 times in loops of 5000.
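The loop Mark describes can be sketched roughly as follows. Neither his VTD nor his XmlBeans code appears in the thread, so this sketch uses the JDK's own DOM parser as a stand-in. It also reads "20 times in loops of 5000" as 20 timed rounds of 5000 parse-and-read iterations, which is only one possible reading. The document, class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class ParseAttrLoop {
    // Hypothetical input; the actual test document is not shown in the thread.
    static final byte[] XML =
        "<order id='42'><item sku='A1'/></order>".getBytes(StandardCharsets.UTF_8);

    // One round: parse the document n times, reading one attribute as a String each time.
    static String round(DocumentBuilder builder, int n) throws Exception {
        String last = null;
        for (int i = 0; i < n; i++) {
            Element root = builder.parse(new ByteArrayInputStream(XML)).getDocumentElement();
            last = root.getAttribute("id");
        }
        return last;
    }

    public static void main(String[] args) throws Exception {
        // A DocumentBuilder may be reused for sequential parses on one thread.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (int r = 0; r < 20; r++) {          // 20 rounds ...
            long start = System.nanoTime();
            String id = round(builder, 5000);   // ... of 5000 iterations each
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("round " + r + ": id=" + id + " in " + ms + " ms");
        }
    }
}
```

Swapping the body of `round` for VTD or XmlBeans calls would give the comparison Mark ran; timing per round makes JIT warm-up visible in the early rounds.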
>>>>>
>>>>
>>>> (just been browsing the VTD parsing code, not a String in sight!)
>>>>
>>>> VTD is a very efficient parser that performs no instantiation
>>>> of String objects when parsing. This is fantastic for
>>>> scenarios where you only need access to a certain part of the
>>>> document; routing decisions using XPath come to mind, for example.
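The "not a String in sight" approach can be illustrated with a toy Java scanner that records each start-tag name as an (offset, length) pair packed into a `long`, and only builds a `String` when a caller asks for one. This is a simplified sketch of the general idea, not VTD-XML's actual record layout or API; the class and method names are hypothetical, and the scanner assumes well-formed input with no comments or CDATA containing `<`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OffsetTokens {
    // Pack a token's byte offset (high 32 bits) and length (low 32 bits) into one long.
    // Simplified illustration only; VTD-XML's real record layout differs.
    static long pack(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }
    static int offset(long rec) { return (int) (rec >>> 32); }
    static int length(long rec) { return (int) rec; }

    private static boolean endsName(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == '>' || b == '/';
    }

    // Record the name of every start tag as an (offset, length) pair.
    // No String objects are created; end tags, PIs and declarations are skipped.
    static long[] startTagNames(byte[] doc) {
        long[] recs = new long[16];
        int n = 0;
        for (int i = 0; i + 1 < doc.length; i++) {
            byte next = doc[i + 1];
            if (doc[i] != '<' || next == '/' || next == '?' || next == '!') continue;
            int start = i + 1, end = start;
            while (end < doc.length && !endsName(doc[end])) end++;
            if (n == recs.length) recs = Arrays.copyOf(recs, n * 2);
            recs[n++] = pack(start, end - start);
        }
        return Arrays.copyOf(recs, n);
    }

    // A String is materialized only when a caller actually needs one.
    static String text(byte[] doc, long rec) {
        return new String(doc, offset(rec), length(rec), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<env><hdr to='x'/><body>hi</body></env>".getBytes(StandardCharsets.UTF_8);
        long[] recs = startTagNames(doc);
        System.out.println(recs.length + " start tags; second is " + text(doc, recs[1]));
        // prints "3 start tags; second is hdr"
    }
}
```

A router that inspects only one header element pays for `String` creation on that element alone, which is why this model suits the partial-access scenarios Paul mentions.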
>>>>
>>>> (IIRC Xerces avoids instantiation of String objects for tags
>>>> that have previously occurred by using a symbol table of
>>>> interned Strings.)
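The symbol-table idea can be sketched in Java as an open-addressing hash table that is probed directly with a `(char[], offset, length)` range, so a name already seen costs no allocation and always yields the same interned instance. This is a hypothetical illustration in the spirit of Paul's description, not Xerces' actual `SymbolTable` code.

```java
public class SymbolTableSketch {
    // Open addressing with linear probing; capacity is always a power of two.
    private String[] buckets = new String[64];
    private int size;

    // Returns the canonical (interned) String for buf[off..off+len).
    // A new String is allocated only the first time a given name is seen.
    String addSymbol(char[] buf, int off, int len) {
        if (size * 2 >= buckets.length) rehash(); // keep load factor at or below 1/2
        int mask = buckets.length - 1;
        int i = hash(buf, off, len) & mask;
        while (buckets[i] != null) {
            if (matches(buckets[i], buf, off, len)) return buckets[i]; // no allocation
            i = (i + 1) & mask;
        }
        String s = new String(buf, off, len).intern();
        buckets[i] = s;
        size++;
        return s;
    }

    // Same formula as String.hashCode(), computed directly over the char range.
    private static int hash(char[] buf, int off, int len) {
        int h = 0;
        for (int k = 0; k < len; k++) h = 31 * h + buf[off + k];
        return h;
    }

    private static boolean matches(String s, char[] buf, int off, int len) {
        if (s.length() != len) return false;
        for (int k = 0; k < len; k++) if (s.charAt(k) != buf[off + k]) return false;
        return true;
    }

    // Doubles capacity; String.hashCode() agrees with hash(), so slots stay consistent.
    private void rehash() {
        String[] old = buckets;
        buckets = new String[old.length * 2];
        int mask = buckets.length - 1;
        for (String s : old) {
            if (s == null) continue;
            int i = s.hashCode() & mask;
            while (buckets[i] != null) i = (i + 1) & mask;
            buckets[i] = s;
        }
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        char[] input = "<item><item>".toCharArray();
        String first = table.addSymbol(input, 1, 4);  // "item" starting at index 1
        String second = table.addSymbol(input, 7, 4); // "item" starting at index 7
        System.out.println(first + " " + (first == second)); // prints "item true"
    }
}
```

Because every symbol is interned, later consumers can compare names with `==` rather than character by character.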
>>>>
>>>> As far as I can tell from the code, the Java VTD parser
>>>> is not performing any namespace validation when the
>>>> document is parsed, e.g. no checking whether the prefix of an
>>>> element or attribute is in scope. As a consequence, duplicate
>>>> attributes are not fully checked at parsing (the local name and
>>>> namespace URI need to be checked in addition to checking for
>>>> attributes with the same qualified name).
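To make the missing check concrete, here is a hypothetical Java sketch of what a namespace-aware parser must do for one start tag: resolve every attribute prefix against the in-scope bindings and reject two attributes that expand to the same (namespace URI, local name), even when their qualified names differ, as in `<x xmlns:a="urn:u" xmlns:b="urn:u" a:attr="1" b:attr="2"/>`. It assumes the tag's `xmlns` declarations have already been folded into the `inScope` map, and uses `List.of`/`Map.of` (Java 9+).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NsAttrCheck {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // Returns null if the attributes are namespace-well-formed, else a description.
    // Assumes xmlns declarations were already folded into inScope (prefix -> URI).
    static String check(List<String> qnames, Map<String, String> inScope) {
        Set<List<String>> seen = new HashSet<>();
        for (String qname : qnames) {
            int colon = qname.indexOf(':');
            String uri = "";   // unprefixed attributes are in no namespace
            String local = qname;
            if (colon > 0) {
                String prefix = qname.substring(0, colon);
                local = qname.substring(colon + 1);
                // The "xml" prefix is always bound, without a declaration.
                uri = prefix.equals("xml") ? XML_NS : inScope.get(prefix);
                if (uri == null) return "unbound prefix: " + prefix;
            }
            // Uniqueness is by expanded name, not by qualified name.
            if (!seen.add(List.of(uri, local))) {
                return "duplicate attribute: {" + uri + "}" + local;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> scope = Map.of("a", "urn:u", "b", "urn:u");
        // Distinct qualified names, but both expand to {urn:u}attr.
        System.out.println(check(List.of("a:attr", "b:attr"), scope));
        System.out.println(check(List.of("c:attr"), scope));
    }
}
```

A check limited to identical qualified names would accept the `a:attr`/`b:attr` pair, which is the gap Paul points out.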
>>>>
>>>> VTD parsing avoids a lot of the work that other APIs and
>>>> implementations perform, some of it related to instantiation of
>>>> objects and some related to checking the XML. For the latter, it
>>>> may not be known whether a non-well-formed document is being
>>>> parsed, and in some cases it will never be known, because the
>>>> non-well-formed parts of the document will never be navigated.
>>>>
>>>> If you need to access a significant portion of the document,
>>>> e.g. for processing SOAP header blocks and data binding the
>>>> payload of a SOAP message, then, from the code at least, I think
>>>> it likely that, for a UTF-8 encoded SOAP message, UTF-8
>>>> decoding will be performed twice on nearly all relevant
>>>> characters (once when parsing to determine offsets, and again
>>>> when iterating through the document). In addition, string
>>>> equality is checked on a per-character basis, whereas if a
>>>> string is interned, a binding tool can check for equality using
>>>> '==' (especially useful for namespace URIs).
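The interning point can be shown in a few lines of Java. `String.equals` walks the characters, while `==` is a single reference comparison; the latter is valid only when both sides are guaranteed to be interned. Java string literals and compile-time constants are always interned, and `intern()` returns the canonical instance. The class and method names below are hypothetical, and the "every reported URI is interned" guarantee is an assumption about the parser.

```java
public class InternedEquality {
    // Compile-time constant: guaranteed to be an interned String.
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Per-character comparison: cost grows with the length of the URI.
    static boolean isSoapEnvSlow(String uri) {
        return SOAP_ENV.equals(uri);
    }

    // Reference comparison: correct only if the caller guarantees the
    // argument was interned (assumed here to be done by the parser).
    static boolean isSoapEnvFast(String internedUri) {
        return internedUri == SOAP_ENV;
    }

    public static void main(String[] args) {
        // Simulate a URI decoded from the document: a fresh String instance.
        String decoded = new String("http://schemas.xmlsoap.org/soap/envelope/".toCharArray());
        System.out.println(isSoapEnvSlow(decoded));          // true
        System.out.println(decoded == SOAP_ENV);             // false: different instances
        System.out.println(isSoapEnvFast(decoded.intern())); // true
    }
}
```

The trade-off Paul describes is that a parser which never builds Strings cannot hand out interned ones, so a binding tool falls back to the slower character-wise comparison.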
>>>>
>>>> So it is swings and roundabouts, ya choose ya model to best
>>>> suit ya needs. VTD looks like a great model for some XML
>>>> processing scenarios but definitely not for all.
>>>>
>>>>
>>>>> I'm now even more intrigued by an FI/VTD combo.
>>>>>
>>>>
>>>> This may be possible, although the VTD representation of the
>>>> document in memory may require some changes. It is certainly
>>>> possible to reference the FI document for literal strings.
>>>> Indexed qualified names and strings may be more problematic.
>>>>
>>>> Paul.
>>>>
>>>> -------------------------------------------------------------------
>>>> --
>>>> To unsubscribe, e-mail: users-unsubscribe_at_fi.dev.java.net
>>>> For additional commands, e-mail: users-help_at_fi.dev.java.net
>>>>
>>>
>>>
>>>
>>
>
>