Re: VTD speed

From: Jimmy Zhang <crackeur_at_comcast.net>
Date: Tue, 24 Jan 2006 09:41:39 -0800

This is a good discussion. VTD-XML is a new technology, and not
many people are aware of it or understand its pros and cons,
so it is *good* to receive criticism, as it helps VTD-XML
improve. I appreciate all the comments.

As for namespace binding, nothing prevents XimpleWare from
including the logic to do that ... but when we thought about
how SAX handles well-formedness errors/exceptions, we realized
that the correct way to use SAX or Pull is to scan the
document at least twice:

The first time is from start to end, just to make sure the entire
document is well-formed. The second time is to perform application
logic.

  If step 1 is not done, one cannot be sure the document is
well-formed, e.g. the last '>' may be missing.

  Otherwise, think about the case where one parses the doc and
performs 10 transactions, and then a well-formedness exception
occurs at the end of the document. What should one do now? Roll
back those 10 transactions?

  Yet few people bother with step 1, not because they don't care
about well-formedness, but more likely because they don't like the
overhead of step 1.

So to compare VTD-XML with SAX, maybe we should use SAX
to scan the document twice ...
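
A minimal sketch of the two-step approach described above, using the
standard JAXP SAX API. The class name TwoPassSax, the ItemCounter
handler, and the <item> element are made up for illustration; counting
elements stands in for real transactions. Treat it as a rough
illustration, not a benchmark harness.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TwoPassSax {

    /** Hypothetical application logic: each <item> counts as one "transaction". */
    static class ItemCounter extends DefaultHandler {
        int processed = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("item".equals(qName)) {
                processed++; // a real handler would do its work here
            }
        }
    }

    private static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no usable SAX parser", e);
        }
    }

    /** Step 1: scan the whole document with a no-op handler. */
    static boolean isWellFormed(byte[] xml) {
        try {
            newParser().parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            return false; // DefaultHandler rethrows fatal (well-formedness) errors
        }
    }

    /** Single pass: returns how many items were handled before the parse ended or failed. */
    static int processOnce(byte[] xml) {
        ItemCounter counter = new ItemCounter();
        try {
            newParser().parse(new ByteArrayInputStream(xml), counter);
        } catch (SAXException | IOException e) {
            // The error surfaces only after earlier events were already delivered.
        }
        return counter.processed;
    }

    /** Two passes: run the application logic only if step 1 succeeded. */
    static int processTwice(byte[] xml) {
        return isWellFormed(xml) ? processOnce(xml) : 0;
    }

    public static void main(String[] args) {
        // The last '>' is missing, as in the example above.
        byte[] broken = "<items><item/><item/><item/></items".getBytes(StandardCharsets.UTF_8);
        System.out.println("single pass processed: " + processOnce(broken));
        System.out.println("two passes processed:  " + processTwice(broken));
    }
}
```

With a single pass, the handler has already acted on the items before
the exception arrives; with two passes, nothing is processed unless the
whole document is well-formed, at the cost of scanning it twice.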

----- Original Message -----
From: "Paul Sandoz" <Paul.Sandoz_at_Sun.COM>
To: <users_at_fi.dev.java.net>
Sent: Monday, January 23, 2006 11:58 PM
Subject: Re: VTD speed

> Hi Jimmy,
>
> Jimmy Zhang wrote:
>> Thanks for the comment.
>>
>> VTD-XML outperforms SAX (ns or not) because it
>> minimizes object allocation.
>>
>
> ...and avoids or defers certain namespace well-formed checks until the
> document is navigated.
>
> I hope I am not coming across as overly negative on VTD-XML. That is not
> my intention. I think it is very interesting, effectively solving some XML
> processing problems, and provides a useful model for XML processing. I
> just want to understand the advantages and tradeoffs when compared to
> other models.
>
>
>> XML data is often hierarchical, which means the best way to process them
>> is to allow random access.
>>
>
> I would say sometimes, but not *always* the best.
>
>
>> Another issue with SAX: if the doc is not well-formed, say, the last
>> character is not '>', but the code making use of SAX only processes a
>> part of the XML, the well-formedness error is not really detected and
>> probably won't matter ...
>
> The error will be detected because of the way SAX pushes all events to the
> handlers. An exception will be thrown when the well-formedness error occurs,
> but only after some events have already been processed.
>
> Using the StAX API it is easier to stop the parse, so one could process up
> to some part of an XML document and there could be well-formedness errors
> after that point that are never reached.
>
>
>> I guess VTD-XML and SAX each has its pros and cons,
>
> Yes, that is it.
>
>
>> direct apple-to-apple comparison is hard ...
>>
>
> Perhaps. I think it would be useful to compare for two cases when
> processing XML documents (with one or more namespaces):
>
> 1) Where the whole infoset needs to be processed, e.g. for binding; and
>
> 2) Where some part of the infoset needs to be processed e.g. executing
> some XPath expression.
>
> IMHO comparing just VTD parsing with SAX and StAX parsing is not enough to
> show that VTD is faster. Because of the way VTD works, the benchmarks need
> to do something with what is parsed. I think that this would better show
> VTD's (and other models') strengths and weaknesses.
>
> Paul.
>
> --
> | ? + ? = To question
> ----------------\
> Paul Sandoz
> x38109
> +33-4-76188109
>