Re: VTD speed

From: Tatu Saloranta <cowtowncoder_at_yahoo.com>
Date: Thu, 22 Feb 2007 14:47:31 -0800 (PST)

--- Santiago Pericas-Geertsen
<Santiago.Pericasgeertsen_at_Sun.COM> wrote:

> Jimmy,
>
> JDK 6 includes a number of different
> optimizations, at the VM layer
> and the JAXP layer. However, I don't recall any big
> changes to
> optimize XPath. So this may be the case of several
> optimizations
> working together or perhaps some changes in your
> benchmark?

There was this one bug (I wish I had the bug ID handy)
that made some sub-tree XPath queries hideously slow:
the cost was proportional to the size of the whole
tree instead of just the affected sub-tree. One
workaround was to detach the child node (the root of
the sub-tree), run the XPath query, then reattach the
node. I think it had to do with the building of the
XPathContext or something.
Does this ring a bell? I can look it up if necessary.
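
For illustration, here is a minimal sketch of that
workaround using the standard JAXP javax.xml.xpath API.
The class and method names (DetachedXPath, evalDetached,
parse) are made up for this example. It assumes the
expression is relative to the sub-tree root, and that
nothing else touches the DOM while the node is detached.
Whether it helps at all depends on the JDK in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXP or of any library in this thread.
class DetachedXPath {

    // Parses an XML string into a DOM Document (helper for the demo).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Detaches subtreeRoot from its parent, evaluates a relative XPath
    // expression against it, then puts it back in its original position.
    static String evalDetached(Node subtreeRoot, String expr)
            throws XPathExpressionException {
        Node parent = subtreeRoot.getParentNode();
        Node next = subtreeRoot.getNextSibling(); // remember the position
        if (parent != null) {
            parent.removeChild(subtreeRoot);
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, subtreeRoot);
        } finally {
            if (parent != null) {
                // insertBefore with a null reference node appends at the end
                parent.insertBefore(subtreeRoot, next);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<root><a><item id='1'/></a><b><item id='2'/></b></root>");
        Node b = doc.getDocumentElement().getLastChild();
        System.out.println(evalDetached(b, "item/@id"));
    }
}
```

The try/finally is there so the node gets reattached
even if the expression fails to compile or evaluate.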

My teammate hit that when using XPath (unnecessarily,
in my opinion, but anyway) for locating element
attribute values and doing simple child lookups.
Anyway, I just remember it sounding like a very
severe performance problem for bigger documents, and
it seemed like something that could and should be fixed.
I would expect it to be fixed for Java 6.

-+ Tatu +-

>
> Thanks.
>
> -- Santiago
>
> On Feb 21, 2007, at 7:12 PM, Jimmy Zhang wrote:
>
> > I have noticed a pretty good speed up in jdk6's
> XPath performance...
> > what did you guys do?
> > ----- Original Message ----- From: "Santiago
> Pericas-Geertsen"
> > <Santiago.Pericasgeertsen_at_Sun.COM>
> > To: <users_at_fi.dev.java.net>
> > Cc: <users_at_jaxp.dev.java.net>
> > Sent: Friday, January 05, 2007 1:34 PM
> > Subject: Re: VTD speed
> >
> >
> >> Hi Jimmy,
> >> I'm really interested in learning more about
> VTD-XML and its
> >> XPath support (I'll get to that next week, I
> hope). Here is a
> >> blog I just finished about XPath in the RI [1].
> In a nutshell, I
> >> believe the main issue is DTMs as explained in
> that blog. XPath
> >> is an area that we'd like to improve in
> JAXP.next, so I'd like to
> >> look at what people have done in the last couple
> of years.
> >> As for benchmarking, I've created a simple test
> suite based on
> >> Japex [2] called XPathpex. I haven't published it
> yet (but I can
> >> send it to you privately if you want). It uses
> documents from the
> >> XMark suite and it is based on XPathMark, but
> because it uses
> >> Japex, you get nice reports, multi-threading,
> etc. A simple
> >> driver is all you'd need to write.
> >> Thanks for sharing your findings.
> >> -- Santiago
> >> [1] http://weblogs.java.net/blog/spericas/archive/2007/01/whats_next_for_1.html
> >> [2] https://japex.dev.java.net
> >> On Jan 5, 2007, at 2:29 AM, Jimmy Zhang wrote:
> >>> Paul, Do you know what XPath implementation JDK
> 1.5 bundles...
> >>> We benchmarked it and found that the performance
> >>> is not very good
> >>> (http://www.ximpleware.com/benchmark_xpath.html).
> >>> Assuming
> >>> DOM offers good random access, the results seem
> like there is
> >>> something wrong...
> >>> is this a known issue?
> >>> Best regards,
> >>> Jimmy Zhang
> >>>
> >>> ----- Original Message ----- From: "Paul Sandoz"
>
> >>> <Paul.Sandoz_at_Sun.COM>
> >>> To: <users_at_fi.dev.java.net>
> >>> Sent: Monday, January 23, 2006 5:12 AM
> >>> Subject: Re: VTD speed
> >>>
> >>>
> >>>> Mark Swanson wrote:
> >>>>>> a lot of objects; VTD-XML doesn't. So VTD-XML
> >>>>>> should hold the edge both in memory and
> performance...
> >>>>>> The best way is to try it...
> >>>>>
> >>>>>
> >>>>> Tried it. VTD wins by a wide margin in memory
> and performance
> >>>>> (throughput as well as memory pressure placed
> on the GC by
> >>>>> object creation) over XmlBeans. I used the
> VTD Java API and
> >>>>> the XmlBeans API to loop through parse, get
> some element
> >>>>> attribute value as String 20 times in loops
> of 5000.
> >>>>>
> >>>>
> >>>> (just been browsing the VTD parsing code, not a
> String in sight!)
> >>>>
> >>>> VTD is a very efficient parser that performs no
> instantiation
> >>>> of String objects when parsing. This is
> fantastic for the
> >>>> scenarios where you only need access to a
> certain part of the
> >>>> document e.g. routing decisions using XPath
> come to mind.
> >>>>
> >>>> (IIRC Xerces avoids instantiation of String
> objects for tags
> >>>> that have previously occurred by using a symbol
> table of
> >>>> interned Strings.)
> >>>>
> >>>> As far as I can tell from the code it looks
> like the Java VTD
> >>>> parser is not performing any namespace
> validation when the
> >>>> document is parsed e.g. no checking if a prefix
> of an element
> >>>> or attribute is in-scope. As a consequence
> duplicate attributes
> >>>> are not fully checked at parsing (the local
> name and namespace
> >>>> URI need to be checked in addition to checking
> for attributes
> >>>> with the same qualified name).
> >>>>
> >>>> VTD parsing avoids a lot of work, some related
> to instantiation
> >>>> of objects and some related to checking of the
> XML, when
> >>>> performed by other APIs and implementations.
> For the latter it
> >>>> may not be known if a non-well-formed document
> is being parsed,
> >>>> and in some cases it will never be known
> because the non-
> >>>> well-formed parts of the document will never
> be navigated.
> >>>>
> >>>> If you need to access a significant portion
> of the document,
> >>>> e.g. for processing SOAP header blocks and
> data binding the
> >>>> payload of a SOAP message, then from the code
> at least I think
> >>>> it likely that, for a UTF-8 encoded SOAP
> message, UTF-8
> >>>> decoding will be performed twice on nearly all
> relevant
> >>>> characters (once by parsing to determine
> offsets, the second
> >>>> when iterating through the document). In
> addition string
> >>>> equality is performed on a per character
> basis, whereas if
> >>>> a string is interned a binding tool can check
> for equality using
> >>>> '==' (especially useful for namespace URIs).
> >>>>
> >>>> So it is swings and roundabouts, ya choose ya
> model to best
> >>>> suit ya needs. VTD looks like a great model
> for some XML
> >>>> processing scenarios but definitely not for
> all.
> >>>>
> >>>>
> >>>>> I'm now even more intrigued by an FI/VTD
> combo.
> >>>>>
> >>>>
> >>>> This may be possible, although the VTD
> representation of the
> >>>> document in memory may require some changes. It
> is certainly
> >>>> possible to reference the FI document for
> literal strings.
> >>>> Indexed qualified names and strings may be
> more problematic.
> >>>>
> >>>> Paul.
> >>>>
> >>>> --
> >>>> | ? + ? = To question
> >>>> ----------------\
> >>>> Paul Sandoz
> >>>> x38109
> >>>> +33-4-76188109
> >>>>
> >>>>
>
-------------------------------------------------------------------
>
> >>>> --
>
=== message truncated ===
