users@fi.java.net

Re: Non-FI performance problems

From: Tatu Saloranta <cowtowncoder_at_yahoo.com>
Date: Fri, 20 Jan 2006 16:06:48 -0800 (PST)

--- Jimmy Zhang <crackeur_at_comcast.net> wrote:

> VTD-XML is just an XML parser, so its encoding is XML.
> In the next release, VTD-XML will introduce its native
> encoding, which is XML compatible: XML + VTD.
>
> Deferred node expansion will slow down navigation, and
> when fully expanded, actually consumes more memory.
> VTD-XML, on the other hand, is fully expanded with its
> 1.3~1.5x multiplying factor. The default

The other side of the coin is that generally the first
thing application or library code will do is ask for
textual values as Strings... and I assume VTD will then
either construct a String, or just hand over a char
array to use for construction. In either case this is
a form of deferred construction, unless one's code
deals only with char arrays. Just an observation,
nothing specifically wrong with this... but it has to
be taken into account when measuring expected
performance.
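
To make the deferred-construction point concrete, here is a
minimal sketch in Java. The class and method names (LazyText,
contentEquals, isMaterialized) are hypothetical and made up for
illustration; this is not VTD-XML's actual API, just one way a
parser could hand out text as an (offset, length) view and only
pay for the String when a caller asks for one.

```java
// Hypothetical illustration only; not taken from VTD-XML.
final class LazyText {
    private final char[] buf;  // shared input buffer, never copied here
    private final int offset;  // start of the text within buf
    private final int length;  // number of chars in the text
    private String cached;     // built on the first toString() call

    LazyText(char[] buf, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > buf.length) {
            throw new IndexOutOfBoundsException();
        }
        this.buf = buf;
        this.offset = offset;
        this.length = length;
    }

    /** Compares against s directly in the buffer; allocates nothing. */
    boolean contentEquals(String s) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (buf[offset + i] != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    /** True once a String has actually been constructed. */
    boolean isMaterialized() {
        return cached != null;
    }

    /** The deferred cost: the String is constructed here, once. */
    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(buf, offset, length);
        }
        return cached;
    }
}
```

Code that only ever calls something like contentEquals() never
pays for String construction; code that calls toString() on
every value pays the full cost after parsing, which is the part
a parse-only benchmark would miss.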

Now, DOM trees also have overhead due to the mutability
of nodes (and specifically the possibility of moving
nodes between documents?). What I am wondering, with
respect to VTD, is this: when one does modify nodes,
does VTD internally modify the binary document source?
Doesn't that mean rather large memory copy operations
whenever content grows or shrinks? Or does VTD implement
more sophisticated structures for storing changes before
they get serialized?
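
As an illustration of the second alternative, here is a rough
Java sketch of an edit log kept beside an untouched source
buffer. All names (PendingEdits, replace, serialize) are
hypothetical, and I am not claiming this is how VTD does it.
Edits are recorded by source offset, and the large copy happens
only once, at serialization time.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from VTD-XML.
final class PendingEdits {
    private final char[] source; // original document, never modified

    // start offset -> edit; TreeMap keeps edits in document order
    private final TreeMap<Integer, Edit> edits = new TreeMap<>();

    private static final class Edit {
        final int end;            // exclusive end of the replaced range
        final String replacement; // new content ("" deletes the range)

        Edit(int end, String replacement) {
            this.end = end;
            this.replacement = replacement;
        }
    }

    PendingEdits(char[] source) {
        this.source = source;
    }

    /** Records that source[start, end) is replaced; copies nothing yet. */
    void replace(int start, int end, String replacement) {
        if (start < 0 || end < start || end > source.length) {
            throw new IndexOutOfBoundsException();
        }
        Map.Entry<Integer, Edit> before = edits.floorEntry(start);
        Map.Entry<Integer, Edit> after = edits.ceilingEntry(start);
        boolean overlaps =
                edits.containsKey(start)
                || (before != null && before.getValue().end > start)
                || (after != null && after.getKey() < end);
        if (overlaps) {
            throw new IllegalArgumentException("overlapping edit");
        }
        edits.put(start, new Edit(end, replacement));
    }

    /** Emits unmodified runs straight from the source buffer. */
    String serialize() {
        StringBuilder out = new StringBuilder(source.length);
        int pos = 0;
        for (Map.Entry<Integer, Edit> e : edits.entrySet()) {
            out.append(source, pos, e.getKey() - pos); // unmodified run
            out.append(e.getValue().replacement);      // spliced content
            pos = e.getValue().end;                    // skip replaced range
        }
        out.append(source, pos, source.length - pos);  // trailing run
        return out.toString();
    }
}
```

The trade-off is that any offsets computed against the source
stay valid while edits accumulate, but reading the modified
document back requires either consulting the edit log or
re-parsing the serialized output.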

I once wrote an in-memory document parser that
implemented something (superficially?) similar to VTD,
to allow very fast in-place replacement of sub-trees,
used for HTML scraping... and while the result was fast,
I remember there being quite a few problems that would
have had to be solved to make it more robust and
versatile. It worked great for the problem at hand,
though, and was about 2x as fast as Xerces SAX for
parsing (without doing much in the way of normal
optimizations such as String reuse); output was rather
fast too, since it could mostly just reuse the input
buffers for the unmodified parts.

-+ Tatu +-

