dev@fi.java.net

Re: String interning and symbol table

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Thu, 06 Jan 2005 14:14:32 +0100

Kohsuke Kawaguchi wrote:
>>Note that an FI parser implementation will always return the same
>>instance of an identifying string (if implemented sensibly, it is
>>difficult to avoid this!); it is just not interned with respect to the JVM.
>
>
> .NET exposes a symbol table for applications. Those tables can be
> pre-filled or shared across multiple parser instances, and presumably
> all the strings are interned against this table.
>

OK.


> From an application's perspective, a parser that always returns the same
> string instance isn't too useful. Most of the time, we want to compare
> the names against constant names we hard-coded in our program, and doing
> that efficiently requires all the strings to be String.intern-ed.
>

OK.

I guess a symbol table could be generated from a schema, with all of its
strings interned, and the Strings in the table could then be checked for
reference equality against the Strings returned by the parser. A parser
impl could thus use either String.intern or the table (and perhaps the
latter would be faster to search, as the space would be smaller?).
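
A minimal sketch of that idea, to make the comparison concrete. The
SymbolTable class and its add/resolve methods are hypothetical names made
up for illustration; nothing like this exists in the FI parser or JAXP
today, and the "schema" is just two hardcoded names.

```java
import java.util.HashMap;
import java.util.Map;

public class SymbolTableSketch {

    // Hypothetical symbol table: maps a name to its single canonical instance.
    public static final class SymbolTable {
        private final Map<String, String> symbols = new HashMap<>();

        // Pre-fill the table, e.g. with names taken from a schema.
        public void add(String name) {
            String canonical = name.intern();
            symbols.put(canonical, canonical);
        }

        // Return the canonical instance for a decoded string.
        // Names not in the table fall back to String.intern.
        public String resolve(String decoded) {
            String known = symbols.get(decoded);
            return known != null ? known : decoded.intern();
        }
    }

    // String literals are interned by the JVM.
    static final String ORDER = "order";

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.add("order");
        table.add("item");

        // Simulate a string freshly decoded by a parser: a new instance.
        String decoded = new String("order".toCharArray());
        System.out.println(decoded == ORDER);               // false

        // After resolving against the table, reference equality works.
        System.out.println(table.resolve(decoded) == ORDER); // true
    }
}
```

Whether the table lookup actually beats String.intern would need measuring;
the sketch only shows that both routes end at the same instance.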

However, I can see how it would be easier for the JAXB RI to use
hardcoded strings directly, plus there is no standardized symbol
table that can be passed between the JAXB RI and a parser. I wonder if this
would be a useful extension to JAXP?


> This symbol table thing aside, it would really help (at least JAXB RI)
> if the FI parser has a mode to intern strings. It can do it more
> efficiently than applications do.
>

Yes, I plan to do this. There is partial support for this already. The
SAX feature:

"http://xml.org/sax/features/string-interning"

is supported in terms of setting/getting, but we need to update the code
to actually intern when this feature is set. This applies when decoding a
literal identifying string; non-identifying strings, such as character
information items (CIIs) and attribute values, will not be interned, since
such content is arbitrary and interning it would gain little.
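
For reference, this is roughly how an application would request the feature
through the standard SAX API. The class and method names below are made up
for the example, and it uses whatever parser JAXP returns by default rather
than the FI parser; only the feature URI and the XMLReader calls are real.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class StringInterningFeature {

    public static final String STRING_INTERNING =
            "http://xml.org/sax/features/string-interning";

    // Ask the reader to intern names; report whether it agreed.
    public static boolean requestInterning(XMLReader reader) {
        try {
            reader.setFeature(STRING_INTERNING, true);
            return reader.getFeature(STRING_INTERNING);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The reader does not know the feature or cannot change it.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("string-interning enabled: "
                + requestInterning(reader));
    }
}
```

With the feature on, a ContentHandler can compare localName and uri against
its own constants with == instead of equals().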

I presume this same feature (although with a different URI) is also used
for StAX? The StAX stuff is less cooked in this area at the moment.


In general I think there are some very useful discussions we could have
around FI and JAXB.

I would especially like to explore how it might be possible to speed up
the process of serializing qualified names.

Currently the serializer needs to obtain an index for each tuple of
prefix, namespace name and local name. Since JAXB already knows these
tuples, I am wondering if it might be possible to speed up the process of
obtaining an index.
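
To make the cost concrete, here is a sketch of such a lookup. It is not the
actual serializer code: QNameIndexSketch, QNameKey, indexOf and add are
hypothetical names, and numbering indexes from 0 is an arbitrary choice for
the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class QNameIndexSketch {

    // Hypothetical key for the (prefix, namespace name, local name) tuple.
    public static final class QNameKey {
        final String prefix;
        final String namespaceName;
        final String localName;

        public QNameKey(String prefix, String namespaceName, String localName) {
            this.prefix = prefix;
            this.namespaceName = namespaceName;
            this.localName = localName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof QNameKey)) return false;
            QNameKey k = (QNameKey) o;
            return Objects.equals(prefix, k.prefix)
                    && Objects.equals(namespaceName, k.namespaceName)
                    && Objects.equals(localName, k.localName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(prefix, namespaceName, localName);
        }
    }

    private final Map<QNameKey, Integer> indexes = new HashMap<>();

    // Return the index of a known tuple, or -1 if it has not been seen.
    public int indexOf(QNameKey key) {
        Integer index = indexes.get(key);
        return index != null ? index : -1;
    }

    // Return the existing index, or assign the next free one.
    public int add(QNameKey key) {
        Integer existing = indexes.get(key);
        if (existing != null) return existing;
        int index = indexes.size();
        indexes.put(key, index);
        return index;
    }

    public static void main(String[] args) {
        QNameIndexSketch table = new QNameIndexSketch();
        QNameKey order = new QNameKey("po", "urn:example:po", "order");
        System.out.println(table.add(order));     // 0, first time seen
        System.out.println(table.indexOf(order)); // 0, found again
    }
}
```

Every lookup here hashes and compares three strings. A caller that knows its
tuples up front could build each key once and reuse it, which is the kind of
shortcut I have in mind.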

Paul.

-- 
| ? + ? = To question
----------------\
   Paul Sandoz
        x38109
+33-4-76188109