Re: Non-FI performance problems

From: Jimmy Zhang <crackeur_at_comcast.net>
Date: Fri, 20 Jan 2006 17:17:48 -0800

----- Original Message -----
From: "Tatu Saloranta" <cowtowncoder_at_yahoo.com>
To: <users_at_fi.dev.java.net>
Sent: Friday, January 20, 2006 3:35 PM
Subject: Re: Non-FI performance problems

> --- Jimmy Zhang <crackeur_at_comcast.net> wrote:
>
>> Have you heard of the latest XML processing model,
>> VTD-XML (http://vtd-xml.sf.net), which significantly
>> outperforms StAX?
>
> Jimmy, although the concept is interesting, what would
> really be useful is actual, real application
> code that would prove the claims; better yet,
> something that developers could run themselves. I
> mean, extraordinary claims need extraordinary proof.

Real application code would be very helpful, but XML is used
in so many ways that it is difficult to provide general-purpose
application code that applies to all use cases. As an extreme case,
I could parse XML and then go into an infinite loop (while(true););
in that case, the parser performance would not matter.

>
> What frustrates me a bit is that it's not enough to
> present new technologies as interesting and
> potentially more efficient (and this is not just about
> VTD but many other new approaches) while making big
> claims without backing data. I'm sure there are cases
> where approaches like VTD can improve performance, but
> unfortunately there are also many restrictions. For
> example:

VTD-XML, like many technologies, isn't perfect; it is designed
to provide an option and new possibilities...

>
> (a) Dealing with namespace bindings is tricky, and in
> general it is not possible to just replace sub-trees
> in place (not so much a problem for read-only access)

VTD-XML handles namespace bindings through lookups.
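What resolving a namespace prefix "by lookup" can mean is shown below in a minimal Java sketch. It assumes a generic approach: one table of prefix bindings per open element, searched from the innermost element outwards. The class and method names (NamespaceLookupSketch, declare, resolve) are hypothetical. This is not VTD-XML's internal implementation or API. The example bindings are taken from the SOAP envelope quoted later in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of prefix resolution by lookup (not VTD-XML's API).
final class NamespaceLookupSketch {
    // One prefix -> URI map per open element; the innermost element's map
    // sits at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void startElement() {
        scopes.push(new HashMap<>());
    }

    void endElement() {
        scopes.pop();
    }

    // Records an xmlns:prefix="uri" declaration on the current element
    // (use "" as the prefix for a default xmlns="uri" declaration).
    void declare(String prefix, String uri) {
        if (scopes.isEmpty()) {
            throw new IllegalStateException("no open element");
        }
        scopes.peek().put(prefix, uri);
    }

    // Looks the prefix up from the innermost element outwards; the nearest
    // declaration wins, so inner bindings shadow outer ones.
    String resolve(String prefix) {
        for (Map<String, String> scope : scopes) {
            String uri = scope.get(prefix);
            if (uri != null) {
                return uri;
            }
        }
        return null; // prefix is not bound at this point
    }

    public static void main(String[] args) {
        NamespaceLookupSketch ns = new NamespaceLookupSketch();
        ns.startElement(); // <soapenv:Envelope xmlns:soapenv="...">
        ns.declare("soapenv", "http://schemas.xmlsoap.org/soap/envelope/");
        ns.startElement(); // <soapenv:Body>
        ns.startElement(); // <test xmlns="http://DefaultNamespace">
        ns.declare("", "http://DefaultNamespace");
        ns.startElement(); // <ns1:year xmlns:ns1="...">
        ns.declare("ns1", "http://webservices.optimalpayments.com/xact");
        System.out.println(ns.resolve("ns1"));
        System.out.println(ns.resolve("soapenv"));
        ns.endElement(); // </ns1:year>: the ns1 binding goes out of scope
        System.out.println(ns.resolve("ns1"));
    }
}
```

This kind of lookup covers reading. It does not by itself address the in-place sub-tree replacement concern raised in (a). Content moved to a new location may need its bindings re-declared there.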

> (b) I thought VTD did not deal with entities (or just
> with char entities?); this may or may not be a big
> deal. I hope it at least allows using char entities
> for quoting '<'s and '&'s?

DTDs are still around, but they seem to have been somewhat
deprecated; consider SOAP and REST as two examples. SOAP
explicitly excluded DTDs because they drag down performance...

> (c) Most application code pretty much requires use of
> Strings as values and names -- it's all fine that the
> object model exposes these as char arrays, but most
> applications will invariably need to use String
> equivalents; and as such, gains are not realized (or
> worse yet, application code has to create temporary
> strings whereas parsers would be able to share
> intern()ed ones).

Not much of a problem. The FAQ section explains it as follows:

Although VTD-XML claims to be non-extractive, in many cases developers
still have to extract data. Will that degrade the performance of VTD-XML?
No, that is actually not much of a problem, for the following reasons:

  1. A lot of metadata, i.e. tag and attribute names, is mostly used for
     navigation purposes, so it doesn't need to be converted into String
     objects.
  2. VTD-XML converts VTD records to primitive data types without first
     converting them into strings.
  3. In DOM, the biggest overhead is creating node objects, which VTD-XML
     avoids completely.
  4. Even if one has to extract data into strings, string allocation is in
     fact quite fast when the length of the string is known beforehand. If
     the length is not known, a string buffer implementation may copy and
     discard its contents many times whenever the string outgrows the
     allocated buffer, which can be inefficient. A VTD record encodes the
     token length, which improves string allocation performance.
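Points 2 and 4 can be illustrated with a minimal Java sketch. It assumes that a token is described only by its offset and length in the original character buffer. The Token class and both helpers are hypothetical illustrations, not VTD-XML's API. The first helper parses an int directly from the buffer without creating a String. The second allocates a String of exactly the right size in one step.

```java
// Hypothetical illustration of offset/length token records (not VTD-XML's API).
final class TokenExtractionSketch {

    // Stand-in for a VTD-style record: where a token starts in the
    // document buffer and how many chars it spans.
    static final class Token {
        final int offset;
        final int length;

        Token(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Parses a decimal int straight from the buffer region, with no
    // intermediate String. Accepts an optional leading '-' followed by
    // ASCII digits; overflow is not checked in this sketch.
    static int parseIntInPlace(char[] buf, Token t) {
        int i = t.offset;
        int end = t.offset + t.length;
        boolean negative = false;
        if (i < end && buf[i] == '-') {
            negative = true;
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("no digits");
        }
        int value = 0;
        for (; i < end; i++) {
            char c = buf[i];
            if (c < '0' || c > '9') {
                throw new NumberFormatException("bad digit: " + c);
            }
            value = value * 10 + (c - '0');
        }
        return negative ? -value : value;
    }

    // The record already knows the token length, so the String is allocated
    // at its final size in one step, with no buffer growth and copying.
    static String toStringExact(char[] buf, Token t) {
        return new String(buf, t.offset, t.length);
    }

    public static void main(String[] args) {
        char[] doc = "<year>2006</year>".toCharArray();
        Token text = new Token(6, 4); // the characters "2006"
        Token name = new Token(1, 4); // the element name "year"
        System.out.println(parseIntInPlace(doc, text) + 1); // 2007
        System.out.println(toStringExact(doc, name));       // year
    }
}
```

The constructor new String(char[], int, int) receives the final size up front. A StringBuilder that is filled without knowing the final length may instead have to grow and copy its internal array several times.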

>
> Now, it is possible that random access works nicely for
> read-only access. But will it really work with mutable
> documents too? Plus, since the memory usage is
> (granted, only) 1.5x of the input document size, this
> still means that streaming parsers have to be used for
> sizable documents: the difference between 3x (XOM) and
> 1.5x (VTD) is not an order of magnitude.

Streaming parsers are going to be useful for large documents,
but when the data access pattern is complex or flexibility is key,
random access is often a must-have feature. 3x for XOM is
probably the best-case scenario; for complex docs, the multiplying
factor is usually higher... Xerces needs about 4x even in the best case.
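The multipliers can be turned into concrete sizes with a small Java sketch. The factors are the rough best-case figures quoted in this thread (1.5x for VTD-XML, 3x for XOM, 4x for Xerces). They are not independent measurements. The 100 MB document size is a hypothetical example.

```java
// Back-of-the-envelope memory estimates from the multipliers quoted above.
final class MemoryFootprintSketch {
    // Rough in-memory size multipliers as quoted in this thread
    // (best cases, not measurements).
    static final double VTD_FACTOR = 1.5;
    static final double XOM_FACTOR = 3.0;
    static final double XERCES_DOM_FACTOR = 4.0;

    // Estimated in-memory size for a document of the given size in bytes.
    static long estimateBytes(long documentBytes, double factor) {
        return Math.round(documentBytes * factor);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long doc = 100L * mb; // hypothetical 100 MB input document
        System.out.println("VTD-XML:    " + estimateBytes(doc, VTD_FACTOR) / mb + " MB");
        System.out.println("XOM:        " + estimateBytes(doc, XOM_FACTOR) / mb + " MB");
        System.out.println("Xerces DOM: " + estimateBytes(doc, XERCES_DOM_FACTOR) / mb + " MB");
        System.out.println("XOM / VTD:  " + XOM_FACTOR / VTD_FACTOR + "x");
    }
}
```

For a 100 MB document, this gives about 150 MB, 300 MB and 400 MB. The 2x gap between XOM and VTD-XML is Tatu's "not an order of magnitude" point. The 4x Xerces figure, and the higher factors for complex documents, are the counterpoint.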

>
> So what I am really wondering is this: since indexing
> is most beneficial for big documents (being able to
> skip to the middle of the document, theoretically),
> but VTD still needs the whole serialized document
> in memory, isn't there a bit of a conflict here? VTD has
> to position itself between fully convenient
> tree-model-based methods (JDOM, DOM4J, XOM) and
> streaming processors (SAX, StAX) as a better
> trade-off... but is there enough of a niche? "Half the
> memory usage of DOM, speed almost as good as StAX?"
> Of course, StAX kind of tries to do the same too
> ("more convenient than SAX, much faster than DOM").
>
That doesn't seem like a conflict to me. If the doc isn't in memory,
where would you skip to, on disk? That is painfully slow, right?

> "Half the memory usage of DOM, speed almost as good as StAX?"

That doesn't sound like VTD-XML; that is probably XmlBeans??

Overall, the point is that VTD-XML is not a perfect solution,
but for what it is designed for, it just works...

> Anyhow, I would be very interested in some real world
> application performance results!
>
> -+ Tatu +-
>
>> ----- Original Message -----
>> From: "Mark Swanson" <mark_at_ScheduleWorld.com>
>> To: <users_at_fi.dev.java.net>
>> Sent: Thursday, January 19, 2006 8:00 AM
>> Subject: Non-FI performance problems
>>
>>
>> > Hello,
>> >
>> > Before I tested the nice time/space FI compression, I wanted to make
>> > sure that normal non-FI clients would still perform well. In summary, I
>> > found that the FI StAX implementation for non-FI clients was about 1.85x
>> > slower than Woodstox and about 1.4x slower than the RI.
>> >
>> > I tested the 3 implementations using XFire and XmlBeans using a
>> > document/wrapped service with a small XmlBean argument.
>> >
>> > I call the remote method 50 times in a loop and print how long that
>> > took, 20 times over. I did the test 3 times for each StAX implementation
>> > to give HotSpot some time to do its stuff (at least on the server side).
>> >
>> > The tiny SOAP envelope is:
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <soapenv:Envelope
>> >     xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
>> >     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>> >     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
>> >   <soapenv:Body>
>> >     <test xmlns="http://DefaultNamespace">
>> >       <ns1:year xmlns:ns1="http://webservices.optimalpayments.com/xact">0</ns1:year>
>> >     </test>
>> >   </soapenv:Body>
>> > </soapenv:Envelope>
>> >
>> > The response was even smaller, as the XmlBean return object/document was
>> > empty.
>> >
>> > 3. Using Sun's Fast Infoset:
>> > (slowest of all when not using FI)
>> >
>> > testserver:
>> > [testng] setUp() complete.
>> > [testng] setUp() complete.
>> > [testng] loops:50, in ms:555
>> > [testng] loops:50, in ms:462
>> > [testng] loops:50, in ms:443
>> > [testng] loops:50, in ms:445
>> > [testng] loops:50, in ms:444
>> > [testng] loops:50, in ms:406
>> > [testng] loops:50, in ms:381
>> > [testng] loops:50, in ms:427
>> > [testng] loops:50, in ms:376
>> > [testng] loops:50, in ms:414
>> > [testng] loops:50, in ms:371
>> > [testng] loops:50, in ms:377
>> > [testng] loops:50, in ms:375
>> > [testng] loops:50, in ms:376
>> > [testng] loops:50, in ms:473
>> > [testng] loops:50, in ms:369
>> > [testng] loops:50, in ms:365
>> > [testng] loops:50, in ms:368
>> > [testng] loops:50, in ms:375
>> > [testng] loops:50, in ms:370
>> >
>> > 1. Using woodstox:
>> >
>> > testserver:
>> > [testng] setUp() complete.
>> > [testng] setUp() complete.
>> > [testng] loops:50, in ms:370
>> > [testng] loops:50, in ms:317
>> > [testng] loops:50, in ms:279
>> > [testng] loops:50, in ms:242
>> > [testng] loops:50, in ms:244
>> > [testng] loops:50, in ms:229
>> > [testng] loops:50, in ms:215
>> > [testng] loops:50, in ms:254
>> > [testng] loops:50, in ms:203
>> > [testng] loops:50, in ms:239
>> > [testng] loops:50, in ms:201
>> > [testng] loops:50, in ms:201
>> > [testng] loops:50, in ms:199
>> > [testng] loops:50, in ms:203
>> > [testng] loops:50, in ms:309
>> > [testng] loops:50, in ms:198
>> > [testng] loops:50, in ms:192
>> > [testng] loops:50, in ms:195
>> > [testng] loops:50, in ms:196
>> > [testng] loops:50, in ms:193
>> >
>> > Notice the system stabilizes around 370ms for FI and 200ms for Woodstox?
>> >
>> > I'm using FastInfosetPackage_dist_1.0.1.
>> >
>> > I'm curious why the performance is so slow for FI and if there are any
>> > options I can use to speed it up.
>> >
>> > Thanks for any comments.
>> >
>> > Cheers.
>> >
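The timing loop Mark describes (50 calls per measurement, 20 measurements per run) can be sketched in Java as below. The sketch assumes a plain System.nanoTime() harness. The names LoopTimingSketch, timeRounds and steadyStateMean are hypothetical. The Runnable is a placeholder for the real XFire client call, which the thread does not show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timing harness mirroring "loops:50, in ms:..." output.
final class LoopTimingSketch {

    // Runs the workload `loops` times per round, for `rounds` rounds, and
    // returns the elapsed milliseconds of each round.
    static List<Long> timeRounds(Runnable workload, int loops, int rounds) {
        List<Long> results = new ArrayList<>(rounds);
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < loops; i++) {
                workload.run();
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            System.out.println("loops:" + loops + ", in ms:" + elapsedMs);
            results.add(elapsedMs);
        }
        return results;
    }

    // Mean of the last `tail` rounds, i.e. after the JIT has had time to warm up.
    static double steadyStateMean(List<Long> roundsMs, int tail) {
        if (roundsMs.isEmpty() || tail <= 0) {
            throw new IllegalArgumentException("need at least one round");
        }
        int from = Math.max(0, roundsMs.size() - tail);
        double sum = 0;
        for (int i = from; i < roundsMs.size(); i++) {
            sum += roundsMs.get(i);
        }
        return sum / (roundsMs.size() - from);
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for one remote SOAP call.
        Runnable fakeRemoteCall = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1000; i++) {
                sb.append(i);
            }
        };
        List<Long> rounds = timeRounds(fakeRemoteCall, 50, 20);
        System.out.println("steady state ~" + steadyStateMean(rounds, 5) + " ms per 50 calls");
    }
}
```

Averaging only the later rounds reflects the warm-up effect visible in the numbers above. Dividing the two steady-state figures Mark reports (about 370 ms for FI versus about 200 ms for Woodstox) gives the 1.85x ratio he quotes.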