Re: Non-FI performance problems

From: Jimmy Zhang <crackeur_at_comcast.net>
Date: Fri, 20 Jan 2006 17:28:08 -0800

Thanks for the comment. DOM's mutability is at the data structure level,
which can be inefficient if the change is small, but it is nonetheless very
flexible, since one can make arbitrary changes to the document.

VTD-XML is XML-centric, since it allows modifications to be made to the XML
itself. This property has its pros and cons. VTD-XML is less flexible for
complex changes, and may be used in conjunction with DOM in some cases. On
the other hand, for some classes of modifications, its efficiency is
unrivaled. This feature *alone* will lead to many interesting new ways to use
XML that were previously thought impossible due to DOM's round-trip overhead
of taking everything apart and putting it back together, regardless of the
complexity of the changes...
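
To make the byte-level idea concrete, here is a rough sketch in plain Java.
It is not VTD-XML's API or its internal design; the class SpliceSketch and
its methods replace() and output() are hypothetical names made up for
illustration. It assumes the edits do not overlap and that the caller
already knows the byte offset and length of the token to change. Edits are
only recorded, and the untouched bytes are copied verbatim when the document
is written out, so nothing is taken apart or re-serialized.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of VTD-XML.
class SpliceSketch {
    // One pending change: replace `length` bytes at `offset` with `replacement`.
    static final class Edit {
        final int offset;
        final int length;
        final byte[] replacement;

        Edit(int offset, int length, byte[] replacement) {
            this.offset = offset;
            this.length = length;
            this.replacement = replacement;
        }
    }

    private final byte[] source;               // original XML bytes, never modified
    private final List<Edit> edits = new ArrayList<>();

    SpliceSketch(byte[] source) {
        this.source = source;
    }

    // Record a change; no bytes are copied or moved at this point.
    void replace(int offset, int length, String text) {
        edits.add(new Edit(offset, length, text.getBytes(StandardCharsets.UTF_8)));
    }

    // Build the modified document in one pass over the source.
    // Assumes the recorded edits do not overlap.
    byte[] output() {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt((Edit e) -> e.offset));

        ByteArrayOutputStream out = new ByteArrayOutputStream(source.length);
        int pos = 0;
        for (Edit e : sorted) {
            out.write(source, pos, e.offset - pos);             // unmodified bytes, copied as-is
            out.write(e.replacement, 0, e.replacement.length);  // new content
            pos = e.offset + e.length;                          // skip the replaced bytes
        }
        out.write(source, pos, source.length - pos);            // unmodified tail
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<a><b>old</b></a>";
        SpliceSketch doc = new SpliceSketch(xml.getBytes(StandardCharsets.UTF_8));
        doc.replace(xml.indexOf("old"), "old".length(), "newer");
        System.out.println(new String(doc.output(), StandardCharsets.UTF_8));
    }
}
```

The cost of a change here is one linear copy of the document at output time,
however many edits were recorded, which is why small changes come out cheap
compared with a full parse/modify/serialize round trip.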
----- Original Message -----
From: "Tatu Saloranta" <cowtowncoder_at_yahoo.com>
To: <users_at_fi.dev.java.net>
Sent: Friday, January 20, 2006 4:06 PM
Subject: Re: Non-FI performance problems

> --- Jimmy Zhang <crackeur_at_comcast.net> wrote:
>
>> VTD-XML is just an XML parser, so its encoding is
>> XML.
>> In next release, VTD-XML will introduce its native
>> encoding which
>> is XML compatible: XML + VTD .
>>
>> Deferred node expansion will slow down navigation,
>> and when fully
>> expanded, actually consumes more memory. VTD-XML, on
>> the other
>> hand, is fully expanded with its 1.3~1.5x multiplying
>> factor. The default
>
> The other side of the coin is that generally the first
> thing application/library code will do is then to ask for
> textual values in the form of Strings... and I assume VTD
> will then either construct a String, or just hand over a char
> array to use for construction. In any case, this is
> one form of deferred construction, unless one's code
> only deals with char arrays? Just an observation,
> nothing specifically wrong with this... but it has to
> be taken into account when measuring expected
> performance.
>
> Now; DOM trees also have overhead due to mutability of
> nodes (and specifically possibility of moving nodes
> between documents or so?). What I am wondering, with
> respect to VTD, is that when one does modify nodes,
> does VTD internally then modify the binary document
> source? Doesn't that then mean that there are rather
> large memory copy operations when content
> grows/shrinks?
> Or does VTD implement more sophisticated structures
> for storing changes before they get serialized?
>
> I once wrote an in-memory document parser that
> implemented something (superficially?) similar to VTD,
> to allow for very fast in-place replacements of
> sub-trees, used for html scraping... and while result
> was fast, I remember there being quite a few problems
> if I was to make that more robust and versatile.
> Worked great for the problem at hand, though, and was
> about 2x as fast as Xerces SAX for parsing (without
> doing much in the way of normal optimizations for
> String reuse or so); and output was rather fast too,
> since it could mostly just use input buffers for
> unmodified parts.
>
> -+ Tatu +-