users@fi.java.net

Re: VTD speed

From: Jimmy Zhang <crackeur_at_comcast.net>
Date: Thu, 4 Jan 2007 23:29:52 -0800

Paul, do you know which XPath implementation JDK 1.5 bundles?
We benchmarked it and found that its performance
is not very good (http://www.ximpleware.com/benchmark_xpath.html). Assuming
DOM offers good random access, the results suggest that something is
wrong...
Is this a known issue?
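For reference, here is a minimal sketch of the kind of call I mean: it prints which XPathFactory class the running JDK selects and evaluates one expression over a DOM tree through the standard javax.xml.xpath API. It is not the benchmark code from the page above; the class name JaxpXPathProbe, its helper methods and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any benchmark: probes the bundled JAXP XPath.
class JaxpXPathProbe {

    // Class name of the XPathFactory implementation the running JDK selects.
    static String implementationName() {
        return XPathFactory.newInstance().getClass().getName();
    }

    // Parse the XML into a DOM tree, then evaluate the expression as a string.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(implementationName());
        System.out.println(
                evaluate("<order><item id='a1'>pen</item></order>", "/order/item/@id"));
    }
}
```

Note that each call to evaluate() here re-parses the document and compiles the expression again, so timing it in a loop measures much more than XPath evaluation alone.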
Best regards,
Jimmy Zhang

----- Original Message -----
From: "Paul Sandoz" <Paul.Sandoz_at_Sun.COM>
To: <users_at_fi.dev.java.net>
Sent: Monday, January 23, 2006 5:12 AM
Subject: Re: VTD speed


> Mark Swanson wrote:
>>> a lot of objects; VTD-XML doesn't. So VTD-XML
>>> should hold the edge both in memory and performance...
>>> The best way is to try it...
>>
>>
>> Tried it. VTD wins by a wide margin in memory and performance (throughput
>> as well as memory pressure placed on the GC by object creation) over
>> XmlBeans. I used the VTD Java API and the XmlBeans API to loop through
>> parse, get some element attribute value as String 20 times in loops of
>> 5000.
>>
>
> (just been browsing the VTD parsing code, not a String in sight!)
>
> VTD is a very efficient parser that performs no instantiation of String
> objects when parsing. This is fantastic for the scenarios where you only
> need access to a certain part of the document e.g. routing decisions using
> XPath come to mind.
>
> (IIRC Xerces avoids instantiation of String objects for tags that have
> previously occurred by using a symbol table of interned Strings.)
>
> As far as I can tell from the code it looks like the Java VTD parser is
> not performing any namespace validation when the document is parsed e.g.
> no checking if a prefix of an element or attribute is in-scope. As a
> consequence duplicate attributes are not fully checked at parsing (the
> local name and namespace URI need to be checked in addition to checking
> for attributes with the same qualified name).
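To make the duplicate-attribute point concrete, here is a rough sketch of a namespace-aware check: two attributes with different qualified names still collide when their prefixes map to the same namespace URI and the local names match. This is not VTD-XML code; the class AttributeDuplicateCheck, its method and the prefix map are hypothetical, and the prefix bindings in scope are assumed to have been collected already.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating duplicate detection on expanded names.
class AttributeDuplicateCheck {

    // Returns true if two attributes share the same (namespace URI, local name).
    // Throws if an attribute uses a prefix that is not in scope.
    static boolean hasDuplicate(List<String> qNames, Map<String, String> prefixToUri) {
        Set<String> seen = new HashSet<String>();
        for (String qName : qNames) {
            int colon = qName.indexOf(':');
            String uri = "";          // unprefixed attributes are in no namespace
            String local = qName;
            if (colon >= 0) {
                String prefix = qName.substring(0, colon);
                local = qName.substring(colon + 1);
                uri = prefixToUri.get(prefix);
                if (uri == null) {
                    throw new IllegalArgumentException("prefix not in scope: " + prefix);
                }
            }
            // "{uri}local" is used here only as a simple composite key.
            if (!seen.add("{" + uri + "}" + local)) {
                return true;
            }
        }
        return false;
    }
}
```

With a and b both bound to the same URI, the attributes a:id and b:id pass a check on qualified names alone but are reported as duplicates here.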
>
> VTD parsing avoids a lot of work, some related to instantiation of objects
> and some related to checking of the XML, that is performed by other APIs
> and implementations. For the latter it may not be known if a non-well-formed
> document is being parsed, and in some cases it will never be known because
> the non-well-formed parts of the document will never be navigated.
>
> If you need to access a significant portion of the document, e.g. for
> processing SOAP header blocks and data binding the payload of a SOAP
> message, then from the code at least I think it likely that, for a UTF-8
> encoded SOAP message, UTF-8 decoding will be performed twice on nearly all
> relevant characters (once when parsing to determine offsets, the second when
> iterating through the document). In addition string equality is performed
> on a per character basis, whereas if strings are interned a binding tool
> can check for equality using '==' (especially useful for namespace URIs).
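The two comparison styles described above can be sketched as follows. This is an illustration only and not VTD-XML's API: the class InternedNameCompare and both methods are hypothetical, and a char[] buffer stands in for whatever raw buffer a non-extractive parser keeps.

```java
// Hypothetical illustration of per-character versus reference comparison.
class InternedNameCompare {

    // Compare a name against a slice of the raw buffer, one character at a time.
    // The cost grows with the length of the name.
    static boolean matchesSlice(char[] buf, int offset, int length, String name) {
        if (length != name.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (buf[offset + i] != name.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // When both sides are interned, a reference comparison is sufficient.
    // Only valid if every string passed in came from String.intern().
    static boolean sameInterned(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        char[] doc = "<soap:Envelope>".toCharArray();
        System.out.println(matchesSlice(doc, 1, 13, "soap:Envelope"));

        String uri1 = new String("http://schemas.xmlsoap.org/soap/envelope/").intern();
        String uri2 = new String("http://schemas.xmlsoap.org/soap/envelope/").intern();
        System.out.println(sameInterned(uri1, uri2));
    }
}
```

The reference comparison is constant time, but interning has its own cost when each string is first created, so which side wins depends on how often the same names are compared.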
>
> So it is swings and roundabouts, ya choose ya model to best suit ya needs.
> VTD looks like a great model for some XML processing scenarios but
> definitely not for all.
>
>
>> I'm now even more intrigued by an FI/VTD combo.
>>
>
> This may be possible, although the VTD representation of the document
> in memory may require some changes. It is certainly possible to reference
> the FI document for literal strings. Indexed qualified names and strings
> may be more problematic.
>
> Paul.
>
> --
> | ? + ? = To question
> ----------------\
> Paul Sandoz
> x38109
> +33-4-76188109