users@fi.java.net

Re: VTD speed

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Tue, 24 Jan 2006 08:58:35 +0100

Hi Jimmy,

Jimmy Zhang wrote:
> Thanks for the comment.
>
> VTD-XML outperforms SAX (ns or not) because it
> minimizes object allocation.
>

...and avoids or defers certain namespace well-formedness checks until
the document is navigated.

I hope I am not coming across as overly negative on VTD-XML. That is
not my intention. I think it is very interesting: it effectively solves
some XML processing problems and provides a useful model for XML
processing. I just want to understand the advantages and tradeoffs when
compared to other models.


> XML data is often hierarchical, which means the best way to process them
> is to allow random access.
>

I would say sometimes, but not *always* the best.


> Another issue with SAX: if the doc is not well-formed, say, the last
> character is not '>', but the code making use of SAX only processes
> a part of the XML, then the well-formedness error is not really
> detected and probably won't matter ...

The error will be detected, because SAX pushes all events to the
handlers. An exception will be thrown when the well-formedness error is
reached, but only after the events preceding it have been delivered.
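
To make that concrete, here is a minimal sketch using the JAXP SAX parser
that ships with the JDK. The class name SaxLateError, the parse helper and
the "ERROR" marker are made up for illustration. The document is missing
its final '>', and the handler simply records each start tag it is given
before the parser reports the problem.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
class SaxLateError {

    // Records the name of every start tag pushed to the handler and
    // appends "ERROR" if the parser throws while parsing.
    static List<String> parse(String xml) throws Exception {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // DefaultHandler.fatalError rethrows the parse exception,
            // so a well-formedness error ends up here.
            seen.add("ERROR");
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // The last character of the document is not '>'.
        System.out.println(parse("<a><b/><c/></a"));
    }
}
```

The handler has already acted on the earlier events by the time the
exception arrives, so an application has to be prepared to discard or
roll back that work.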

With the StAX API it is easier to stop the parse, so one could process
an XML document up to some point and never reach well-formedness errors
that occur after that point.
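
A minimal sketch of that, assuming the StAX implementation bundled with
the JDK (the class name StaxEarlyStop, the firstTitle helper and the
element names are made up for illustration). The application pulls events
only until it has the text of the first <title> element and then stops, so
the missing '>' at the very end of the document is never scanned.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of any library.
class StaxEarlyStop {

    // Returns the text of the first <title> element, or null if there is
    // none. No further events are pulled once the element has been read.
    static String firstTitle(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // The document is not well-formed: the final '>' is missing.
        String xml = "<doc><title>VTD speed</title><body>text</body><tail/></doc";
        System.out.println(firstTitle(xml));
    }
}
```

If the loop ran on to the end of the document instead of returning early,
the same input would raise an XMLStreamException.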


> I guess VTD-XML and SAX each has its pros and cons,

Yes, that is it.


> direct apple-to-apple comparison is hard ...
>

Perhaps. I think it would be useful to compare for two cases when
processing XML documents (with one or more namespaces):

1) Where the whole infoset needs to be processed, e.g. for binding; and

2) Where some part of the infoset needs to be processed e.g. executing
    some XPath expression.

IMHO comparing just VTD parsing with SAX and StAX parsing is not enough
to show that VTD is faster. Because of the way VTD works, the benchmarks
need to do something with what is parsed. I think that this would better
show the strengths and weaknesses of VTD (and other models).

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109