Kohsuke Kawaguchi wrote:
>
>
> Paul Sandoz wrote:
>
>>I guess a symbol table could be generated from a Schema (strings are
>>interned) and then Strings from the table could be checked for reference
>>equality with the Strings returned from the parser. Thus a parser impl
>>could use a String.intern feature or the table (and perhaps the latter
>>would be faster to search as the space would be smaller?)
>
>
> Microsoft people were saying on xml-dev that their XSLT processor
> pre-fills the table with XSLT tokens before it starts parsing stylesheets.
>
> I've heard that Xerces used to have a separate symbol table that's not
> backed by String.intern. They called the process of passing a string and
> getting back a singleton instance as "symbolizing". But they eventually
> abandoned it in favor of a table backed by String.intern.
>
OK.
> It's just too convenient to be able to do if(localName=="foo") instead
> of symbolizing foo first when you first get a symbol table, then keeping
> it somewhere, and then writing if(localName==fSymbolFoo).
>
Agreed.
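A minimal sketch of why the == comparison depends on interning. It assumes the parser hands back interned names (for example with the SAX string-interning feature turned on); InternDemo and isFoo are made-up names for illustration only.

```java
final class InternDemo {
    // Reference comparison is only safe when both sides are interned.
    static boolean isFoo(String localName) {
        return localName == "foo"; // "foo" is a compile-time constant, hence interned
    }

    public static void main(String[] args) {
        // A distinct String instance with the same characters, not interned.
        String fresh = new String(new char[] {'f', 'o', 'o'});
        System.out.println(isFoo(fresh));          // false
        System.out.println(isFoo(fresh.intern())); // true
    }
}
```

If the parser does not intern, the == test silently fails, so code like this has to check or force that guarantee up front.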
> It's not just more code, but you also have to keep those symbols as
> fields, leading to increased memory footprint. I believe those are the
> reasons why Xerces moved to String.intern.
>
> Note that they still use a symbol table and maintain a look up table so
> that you can get to an interned String from (char[],int,int).
>
Yes, to reduce the scope of the search if strings have already been interned.
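Roughly, such a look up table could be sketched as below. This is only an illustration under my own assumptions, not the Xerces code: CharSymbolTable and addSymbol are hypothetical names, and the bucket count is fixed with no resizing.

```java
final class CharSymbolTable {
    private static final class Entry {
        final char[] chars;   // copy of the symbol's characters, used for comparison
        final String symbol;  // the interned String handed back to callers
        final Entry next;     // next entry in the same bucket

        Entry(String symbol, Entry next) {
            this.chars = symbol.toCharArray();
            this.symbol = symbol;
            this.next = next;
        }
    }

    private final Entry[] buckets = new Entry[101]; // fixed size, for brevity

    /** Returns the interned String for buf[offset .. offset+length). */
    String addSymbol(char[] buf, int offset, int length) {
        int bucket = (hash(buf, offset, length) & 0x7FFFFFFF) % buckets.length;
        for (Entry e = buckets[bucket]; e != null; e = e.next) {
            if (matches(e.chars, buf, offset, length)) {
                return e.symbol; // hit: no String allocated, no call to intern()
            }
        }
        // Miss: allocate once, intern, and remember the result.
        String symbol = new String(buf, offset, length).intern();
        buckets[bucket] = new Entry(symbol, buckets[bucket]);
        return symbol;
    }

    private static int hash(char[] buf, int offset, int length) {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buf[offset + i];
        }
        return h;
    }

    private static boolean matches(char[] chars, char[] buf, int offset, int length) {
        if (chars.length != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (chars[i] != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the returned Strings are interned, callers can still compare them against literals with ==; the table only saves the allocation and the global intern look up on repeated names.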
>
>>However, I can see how it would be easier for the JAXB RI to use
>>directly hardcoded strings, plus there is no standardization of a Symbol
>>table that can be passed between the JAXB RI and a parser. I wonder if this
>>would be a useful extension to JAXP???
>
>
> For the above reasons, I think the consensus among parser writers is that
> String.intern is good enough. Really the only missing piece is
>
> class String {
> static String intern(char[],int,int);
> }
>
> And I found a relevant RFE [1] that exactly says this. It was marked as
> P4 and fell off the radar, but I bumped it up to P3 yesterday and
> something might happen.
>
Interesting, that functionality would be very useful.
>>In general I think there are some very useful discussions we could have
>>around FI and JAXB.
>
>
> Agreed.
>
>
>>I would especially like to explore how it might be possible to speed up
>>the process of serializing qualified names.
>>
>>Currently the serializer needs to obtain an index for the tuple of
>>prefix, namespace name and local name. Since JAXB has knowledge of these
>>tuples, I am wondering if it might be possible to speed up the process of
>>obtaining an index.
>
>
> Let's see. JAXB can probably symbolize or intern prefixes, namespace URIs,
> and local names. We can also tell you when prefix->namespace binding
> begins and ends, but I guess that's something any SAX event generator
> can tell you.
>
> JAXB doesn't maintain the 3-tuple, and I don't know if doing that inside
> the JAXB RI is any easier/faster than doing it inside the FI writer.
>
If a unique object (it could be static) could be associated with each
3-tuple, then it would be possible to use a hash table keyed on this
object to return an index. Currently the impl uses a hash table keyed on
the local name (for SAX it could use the <prefix>:<localName>, which is
more efficient) and then checks that the prefix and namespace name match
(checking references first before performing String.equals).

This may, however, require an FI-specific extension to SAX or StAX.

Such objects may also be useful for an external vocabulary feature. An
index could be part of this object, defining the external vocabulary
index for the 3-tuple.
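
To make the idea concrete, here is a rough sketch of a unique object per 3-tuple with an identity-based table. QNameKey and QNameIndexer are hypothetical names, and the index assignment (next free number, starting at 0) is an assumption for illustration, not what the FI serializer does.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** One instance per (prefix, namespace name, local name) tuple, e.g. held as a static. */
final class QNameKey {
    final String prefix;
    final String namespaceName;
    final String localName;

    QNameKey(String prefix, String namespaceName, String localName) {
        this.prefix = prefix;
        this.namespaceName = namespaceName;
        this.localName = localName;
    }
}

final class QNameIndexer {
    // Keys are compared by reference, so a look up never calls String.equals.
    private final Map<QNameKey, Integer> indexes = new IdentityHashMap<>();

    /** Returns the index for key, assigning the next free one on first use. */
    int indexOf(QNameKey key) {
        Integer known = indexes.get(key);
        if (known != null) {
            return known;
        }
        int assigned = indexes.size();
        indexes.put(key, assigned);
        return assigned;
    }
}
```

The catch is that two distinct QNameKey instances with equal strings get different indexes, so the producer has to guarantee one instance per tuple.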
Paul.
--
| ? + ? = To question
----------------\
Paul Sandoz
x38109
+33-4-76188109