Paul Sandoz wrote:
> I guess a symbol table could be generated from a Schema (strings are
> interned) and then Strings from the table could be checked for reference
> equality with the Strings returned from the parser. Thus a parser impl
> could use a String.intern feature or the table (and perhaps the latter
> would be faster to search as the space would be smaller?)
Microsoft people were saying on xml-dev that their XSLT processor
pre-fills the table with XSLT tokens before it starts parsing stylesheets.
I've heard that Xerces used to have a separate symbol table that was not
backed by String.intern. They called the process of passing in a string
and getting back a singleton instance "symbolizing". But they eventually
abandoned it in favor of a table backed by String.intern.
It's just too convenient to be able to write if(localName=="foo") instead
of symbolizing "foo" when you first get a symbol table, keeping the result
somewhere, and then writing if(localName==fSymbolFoo).
It's not just more code, but you also have to keep those symbols as
fields, leading to increased memory footprint. I believe those are the
reasons why Xerces moved to String.intern.
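To make the difference concrete, here is a rough sketch of the two styles. SimpleSymbolTable and SymbolStyles are made-up names for illustration only, not the actual Xerces classes, and the second style assumes the parser hands out interned names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone symbol table (not the Xerces implementation).
final class SimpleSymbolTable {
    private final Map<String, String> symbols = new HashMap<>();

    // Returns the single canonical instance for all strings equal to s.
    String symbolize(String s) {
        String existing = symbols.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }
}

final class SymbolStyles {
    // Style 1: "foo" has to be symbolized up front and kept in a field.
    private final String fSymbolFoo;

    SymbolStyles(SimpleSymbolTable table) {
        this.fSymbolFoo = table.symbolize("foo");
    }

    // Only valid for names that came out of the same symbol table.
    boolean isFooViaTable(String localName) {
        return localName == fSymbolFoo;
    }

    // Style 2: string literals are interned by the JVM, so no field is
    // needed, as long as the parser returns interned names.
    static boolean isFooViaIntern(String localName) {
        return localName == "foo";
    }
}
```

With the table, every name you want to test costs one field plus one symbolize call at setup time. With String.intern, the literal in the source code is already the canonical instance.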
Note that they still use a symbol table and maintain a lookup table so
that you can get to an interned String from (char[],int,int).
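A minimal sketch of what such a lookup table could look like follows. This is my own illustration, not the Xerces code; CharRangeInternTable is a hypothetical name, and the fixed bucket count is a simplification.

```java
// Hypothetical sketch: maps a (char[], offset, length) range to an interned
// String without allocating a new String when the symbol is already known.
final class CharRangeInternTable {
    private static final class Entry {
        final String symbol; // always an interned String
        final Entry next;

        Entry(String symbol, Entry next) {
            this.symbol = symbol;
            this.next = next;
        }
    }

    // Fixed number of buckets; a real table would grow and rehash.
    private final Entry[] buckets = new Entry[256];

    // Polynomial hash computed directly from the buffer.
    private static int hash(char[] buf, int offset, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buf[offset + i];
        }
        return h;
    }

    private static boolean matches(String s, char[] buf, int offset, int length) {
        if (s.length() != length) return false;
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != buf[offset + i]) return false;
        }
        return true;
    }

    String intern(char[] buf, int offset, int length) {
        int index = (hash(buf, offset, length) & 0x7fffffff) % buckets.length;
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (matches(e.symbol, buf, offset, length)) {
                return e.symbol; // hit: no allocation
            }
        }
        // Miss: allocate once, intern, and remember the result.
        String symbol = new String(buf, offset, length).intern();
        buckets[index] = new Entry(symbol, buckets[index]);
        return symbol;
    }
}
```

The point is that the common case (a name the parser has already seen) never creates a temporary String, while the result can still be compared against literals with ==.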
> However, i can see how it would be easier for the JAXB RI to use
> directly hardcoded strings, plus there is no standardization of a Symbol
> table that can passed between the JAXB RI and a parser. I wonder if this
> would be a useful extension to JAXP???
For the above reasons, I think the consensus among parser writers is that
String.intern is good enough. Really, the only missing piece is:
class String {
static String intern(char[],int,int);
}
And I found a relevant RFE [1] that asks for exactly this. It was marked as
P4 and fell off the radar, but I bumped it up to P3 yesterday, so
something might happen.
> Yes, i plan to do this. There is partial support for this already. The
> SAX feature:
>
> "http://xml.org/sax/features/string-interning"
>
> is supported in terms of setting/getting but we need to update the code
> to intern when this feature is set (for decoding a literal identifying
> string, non-identifying strings, such as CIIs and attribute values, will
> not be interned for obvious reasons).
>
> I presume this same feature (although a different URI) is also used for
> StAX??? The StAX stuff is less cooked in this area at the moment.
I agree that this should be available in StAX too, but judging from the
javadoc, it apparently isn't. So yes, StAX is less cooked in this area.
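For reference, on the SAX side the feature quoted above is toggled through the standard XMLReader.setFeature/getFeature calls. The sketch below only shows that; the StringInterningFeature class and its method names are hypothetical, and whether a given parser honors the feature is up to that parser.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

final class StringInterningFeature {
    static final String FEATURE = "http://xml.org/sax/features/string-interning";

    // Tries to turn the feature on and returns the value the reader reports
    // afterwards, or false if the reader rejects the feature.
    static boolean enable(XMLReader reader) {
        try {
            reader.setFeature(FEATURE, true);
            return reader.getFeature(FEATURE);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    // Creates a namespace-aware XMLReader from the default JAXP factory.
    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("no usable SAX parser", e);
        }
    }
}
```

A consumer that wants to compare names with == should check the returned value first and fall back to equals() when it is false.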
> In general i think there are some very useful discussions we could have
> around FI and JAXB.
Agreed.
> I would especially like to explore how it might be possible to speed up
> the process of serializing qualified names.
>
> Currently the serializer needs to obtain an index for the tuple of
> prefix, namespace name and local name. Since JAXB has knowledge of these
> tuples i am wondering if it might be possible to speed up the process of
> obtaining an index.
Let's see. JAXB can probably symbolize or intern prefixes, namespace URIs,
and local names. We can also tell you when a prefix->namespace binding
begins and ends, but I guess that's something any SAX event generator
can tell you.
JAXB doesn't maintain the 3-tuple, and I don't know if doing that inside
the JAXB RI is any easier/faster than doing it inside the FI writer.
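Just to have something concrete to discuss, here is a rough sketch of a 3-tuple index that exploits interned strings. It is not how the FI serializer or the JAXB RI actually does it; QNameIndexTable is a hypothetical name, indexes simply start at 0, and the table assumes that callers only pass interned strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index over (prefix, namespace name, local name) tuples.
// ASSUMPTION: all three strings are interned by the caller, so the key
// can compare them by reference instead of character by character.
final class QNameIndexTable {
    private static final class Key {
        final String prefix;
        final String namespaceName;
        final String localName;

        Key(String prefix, String namespaceName, String localName) {
            this.prefix = prefix;
            this.namespaceName = namespaceName;
            this.localName = localName;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            // Reference comparison is valid only for interned strings.
            return prefix == other.prefix
                && namespaceName == other.namespaceName
                && localName == other.localName;
        }

        @Override
        public int hashCode() {
            // String caches its hash code, so this is cheap after first use.
            return 31 * (31 * prefix.hashCode() + namespaceName.hashCode())
                + localName.hashCode();
        }
    }

    private final Map<Key, Integer> indexes = new HashMap<>();

    // Returns the existing index for the tuple, or assigns the next free one.
    int indexOf(String prefix, String namespaceName, String localName) {
        Key key = new Key(prefix, namespaceName, localName);
        Integer existing = indexes.get(key);
        if (existing != null) return existing;
        int index = indexes.size();
        indexes.put(key, index);
        return index;
    }
}
```

This still allocates a Key per lookup, so whether it beats what the FI writer does today would have to be measured.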
[1]
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4463978
--
Kohsuke Kawaguchi
Sun Microsystems kohsuke.kawaguchi_at_sun.com