Over the weekend, I compiled JXDM (the biggest schema we have in our
unit tests, at 8MB) and profiled that session.
Optimizing the compiler isn't our priority, but I still found some
interesting data, so I thought I should share it (in case somebody is
interested in working on it!)
For this test, I added the -nv option (which turns off strict validation
of the input schema), because there's just no hope we can optimize that
part of Xerces.
- JCodeModel.build takes about 25% of the whole processing time. Since the
code generated by XJC is heavily commented, a quarter of the total build
time is spent in JDocComment.generate.
- More than half of the JCodeModel.build time is eventually spent in
JFormatter.p(String) and UnicodeEscapeWriter. Sometimes we know that a
token doesn't need to be escaped because all of its chars are in US-ASCII,
for example when printing keywords and numbers. We also often print the
same token multiple times (such as type names and variable names). Both
facts could let us skip the Unicode escape check and make things faster.
- Model.generateCode takes just 7% of the total processing time. This
phase is fast.
- XSOM->Model conversion is also fast; it takes just 9% of the total
processing time.
- Parsing schemas into XSOM takes 58% of the total processing time.
I saw that a significant portion of this is actually spent on
validating annotations and bindings with the JAXP Validator.
Parsing and validation are chained together in the same callbacks, so it's
hard to measure exactly how much is spent on validation. But at least 15%
of the total processing time (= more than 30% of the XSOM building time)
is spent in the JAXP Validator.
Today, JAXP Validator is used to validate every <xs:annotation> section,
even though we are really only interested in validating <jaxb:XXXX>s.
Since this JXDM schema doesn't use any JAXB customization, this much
time is spent on validating meaningless things.
Perhaps we can apply JAXP Validator more selectively/smartly only to the
portion that needs to be validated.
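One way this could look is a SAX handler that sits in front of the validator and only lets <jaxb:XXXX> subtrees through. The following is only a rough sketch, not what XJC does today: the class name CustomizationFilter and the helper forwardedElements are made up for illustration, and the downstream handler here just records element names. In a real setup the downstream handler would be a ValidatorHandler, which would also need startDocument/endDocument and prefix-mapping events around each forwarded fragment; the sketch leaves those out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical filter: forwards only subtrees rooted at an element in the
// JAXB customization namespace and drops everything else.
class CustomizationFilter extends DefaultHandler {
    static final String JAXB_NS = "http://java.sun.com/xml/ns/jaxb";

    private final ContentHandler downstream;
    private int depth = 0; // > 0 while we are inside a forwarded subtree

    CustomizationFilter(ContentHandler downstream) {
        this.downstream = downstream;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        // Start forwarding at a jaxb element, keep forwarding for its descendants.
        if (depth > 0 || JAXB_NS.equals(uri)) {
            depth++;
            downstream.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            downstream.endElement(uri, local, qName);
            depth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth > 0) {
            downstream.characters(ch, start, length);
        }
    }

    // Demo helper (illustration only): parses the XML and returns the local
    // names of the elements that reached the downstream handler.
    static List<String> forwardedElements(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                seen.add(local);
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that uri/local are populated
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new CustomizationFilter(recorder));
        return seen;
    }
}
```

With a filter like this, a schema that has no JAXB customizations at all (like JXDM) would never reach the validator, and <xs:documentation> content would be skipped regardless of what it contains.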
It would also be interesting to see how much improvement we'd get if we
replaced the annotation/customization parsing with JAXB. XML Schema is
hard to parse with JAXB, but customizations are relatively easy.
It would also be a good exercise to see whether JAXB is really usable or
not.
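Going back to the JFormatter.p(String) point above, here is a rough sketch of the two shortcuts: a no-check path for tokens the caller already knows are US-ASCII, and a cache so that a repeated token is escaped only once. TokenEscaper and its methods are names I made up for illustration; this is not the codemodel API, and the escape() method is a simplified stand-in for what UnicodeEscapeWriter does. Whether the map lookup actually beats re-scanning short tokens would have to be measured.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of codemodel.
class TokenEscaper {
    // Escaped form of each token seen so far; type and variable names
    // tend to be printed many times.
    private final Map<String, String> cache = new HashMap<>();

    // For tokens the caller knows are pure US-ASCII (keywords, numbers):
    // written as-is, with no per-character check.
    void writeAscii(Writer out, String token) throws IOException {
        out.write(token);
    }

    // For arbitrary tokens: escape once per distinct token, then reuse.
    void write(Writer out, String token) throws IOException {
        String escaped = cache.get(token);
        if (escaped == null) {
            escaped = escape(token);
            cache.put(token, escaped);
        }
        out.write(escaped);
    }

    // Replaces every char above 0x7F with a backslash-u escape (four hex
    // digits). Returns the input string itself when nothing needs escaping.
    static String escape(String s) {
        StringBuilder sb = null; // allocated lazily on the first non-ASCII char
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                if (sb == null) {
                    sb = new StringBuilder(s.length() + 8).append(s, 0, i);
                }
                sb.append(String.format("\\u%04x", (int) c));
            } else if (sb != null) {
                sb.append(c);
            }
        }
        return sb == null ? s : sb.toString();
    }
}
```

The idea would be that the formatter calls writeAscii for keywords and numeric literals and write for everything else, so the escape logic runs once per distinct identifier rather than once per occurrence.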
--
Kohsuke Kawaguchi
Sun Microsystems kohsuke.kawaguchi_at_sun.com