Re: JAXB Hook for FI

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Thu, 12 May 2005 10:11:02 +0200

Kohsuke Kawaguchi wrote:
> Paul Sandoz wrote:
>
>>> - Do you parse the qname string to obtain the prefix? If so i can pass
>>> the prefix as a separate string as this will be more efficient for
>>> JAXB and FI.
>
>> Are you talking about qnames in content?
>>
>> I was referring to the methods:
>>
>> startElement(String nsUri, String localName, String qname,
>> Attributes atts)
>>
>> endElement(String nsUri, String localName, String qname)
>>
>> and the 'qname' parameter.
>
>
> Ah. No, we never look at the prefix.

Good :-)

> The only use for this QName is for
> building DOM. Passing this parameter is usually cheap for SAX, but maybe
> it might be costly for FI.
>

It is computed once, on the first occurrence of each {prefix, uri,
local name}, so for large documents with repeating elements it is not a
big deal. Still, for small documents it would be good not to have to
create it and intern it.
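
To make the cost concrete, here is a minimal sketch of that kind of
per-name caching. It is not the FI parser's actual code: the class name
QNameCache, its methods and the counter are made up for illustration.
The qname string is built and interned only the first time a given
{prefix, uri, local name} is seen.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from the FI or JAXB sources.
final class QNameCache {
    // One entry per distinct {prefix, uri, local name}.
    private final Map<List<String>, String> cache = new HashMap<>();
    private int computed = 0; // how many qname strings were actually built

    String qname(String prefix, String uri, String localName) {
        List<String> key = Arrays.asList(prefix, uri, localName);
        return cache.computeIfAbsent(key, k -> {
            computed++;
            String q = prefix.isEmpty() ? localName : prefix + ":" + localName;
            return q.intern(); // the creation + intern cost is paid here, once
        });
    }

    int computedCount() {
        return computed;
    }
}
```

For a document with thousands of repeated <p:e> elements the string is
built once. For a small document with ten distinct names it is built ten
times, which is a much larger share of the total work.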

> It would be nice if we can pass in this information only when it's
> necessary, maybe that's the kind of situation where the pull
> unmarshaller performs better.
>

Yes, the FI StAX parser specifically avoids the creation of qname
strings because they are not needed. And i think the XML StAX impl does
not need to check the qname string in the symbol table to obtain the
interned string.
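
As a rough sketch of what the pull side looks like with the standard
javax.xml.stream API (the class StaxNames and its method are just
illustrative names), the consumer asks only for the namespace URI and
local name and never requests a "prefix:local" string:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of JAXB or FI.
final class StaxNames {
    // Counts start elements matching {uri}local without touching prefixes or qnames.
    static int count(String xml, String uri, String local) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int n = 0;
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = r.getNamespaceURI(); // may be null when there is no namespace
                    if (ns == null) ns = "";
                    if (ns.equals(uri) && r.getLocalName().equals(local)) n++;
                }
            }
            r.close();
            return n;
        } catch (XMLStreamException ex) {
            throw new RuntimeException(ex);
        }
    }
}
```

Whether a given parser builds a qname internally anyway is an
implementation detail; the point is only that the API does not force
it to.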

>
>> i presume in areas where there could be Base64Data because of an
>> annotation.
>>
>> I thought it would be more efficient to not have to go through the
>> specific data type classes and fields could be set directly if the
>> algorithm data corresponds to the Java type.
>
>
> You can reuse the instance of those typed CharSequence, so when the
> expectation and the actual data matches up, the cost is just setting to
> this wrapper and getting from the wrapper, which I hope shouldn't be too
> bad.
>
> When the expectation and the actual data don't match up, being able to
> treat them all as CharSequence always helps.
>

I agree. This is actually quite important because an algorithm could be
used where the XSD datatype does not match the algorithm data, in which
case characters will have to be returned.
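
A minimal sketch of such a reusable typed CharSequence, assuming an int
payload. The class name ReusableIntData and its methods are hypothetical
and not the JAXB classes themselves. When the binding expects an int the
value is read straight back out; otherwise the same object is consumed
as characters and the lexical form is built only on demand.

```java
// Hypothetical illustration of a reusable typed CharSequence wrapper.
final class ReusableIntData implements CharSequence {
    private int value;
    private String text; // lexical form, built lazily and only if needed

    // The parser calls this for each decoded value, reusing one instance.
    void set(int v) {
        value = v;
        text = null;
    }

    // Fast path: the expected Java type matches the algorithm data.
    int value() {
        return value;
    }

    // Slow path: the consumer wants characters.
    private String text() {
        if (text == null) {
            text = Integer.toString(value);
        }
        return text;
    }

    @Override public int length() { return text().length(); }
    @Override public char charAt(int index) { return text().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) {
        return text().subSequence(start, end);
    }
    @Override public String toString() { return text(); }
}
```

On the matching path the cost is one field write and one field read; no
string is created unless somebody asks for the characters.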

>
>> So when there is a text event the parser can call expectText and if it
>> is false check if the characters are white space and if so skip.
>
>
> You don't even need to check the characters==whitespace if you don't
> want to. We can silently ignore any misplaced text.
>

OK. Something to consider later on perhaps.
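
For reference, the check i described would look roughly like the sketch
below. Only the name expectText comes from this thread; the
TextExpectation interface and the TextFilter class are placeholders i
made up for the example.

```java
// Hypothetical sketch of the parser-side check; not actual JAXB or FI code.
final class TextFilter {
    // Placeholder for whatever object exposes expectText() to the parser.
    interface TextExpectation {
        boolean expectText();
    }

    // XML white space: space, tab, line feed, carriage return.
    static boolean isWhitespace(CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    // Returns true if the text event should be passed on to the unmarshaller.
    static boolean shouldDeliver(TextExpectation ctx, CharSequence text) {
        if (ctx.expectText()) {
            return true;            // text is wanted here, always deliver
        }
        return !isWhitespace(text); // unexpected white space is skipped
    }
}
```

If misplaced text is silently ignored on the JAXB side, the
isWhitespace scan can be dropped and shouldDeliver reduces to returning
ctx.expectText().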

>
>> I am wondering if it would be efficient to have a method that combines
>> an element event with a text event. Since for a lot of binding cases
>> this type of pattern will occur:
>>
>> <e>foo</e>
>> <e>bar</e>
>> <e>baz</e>
>
>
> This is a possibility that we should consider. At first glance,
> however, if we ask the parser to recognize this pattern, that might be
> costly enough to cancel any benefit.
>
> For example, in SAX, to do this you need to hold back at least two
> events, plus copy the text buffer, and you also need to copy the
> attributes of <e> in case what you eventually see is <e>foo<e>...
>

I see, yes the costs might outweigh the benefits.
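
To see where the cost comes from, here is a rough sketch of a SAX
handler that tries to recognize the <e>text</e> pattern. The class
LeafCombiner and the event strings it records are invented for this
example. It has to hold back the start tag, copy the attributes (the
parser may reuse its Attributes object) and copy the text into its own
buffer before it knows whether the element is a simple leaf.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; events are recorded as strings to keep it short.
final class LeafCombiner extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    private String pendingName;         // held-back start tag, or null
    private AttributesImpl pendingAtts; // copied only to show the cost; unused below
    private final StringBuilder text = new StringBuilder(); // copy of the text

    // Emits whatever was held back as ordinary separate events.
    private void flush() {
        if (pendingName != null) {
            events.add("start:" + pendingName);
            pendingName = null;
        }
        if (text.length() > 0) {
            events.add("text:" + text);
            text.setLength(0);
        }
    }

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        flush(); // the previous element turned out not to be a simple leaf
        pendingName = local;
        pendingAtts = new AttributesImpl(atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qname) {
        if (pendingName != null) {
            // Start tag, text and end tag with nothing in between: combined event.
            events.add("leaf:" + pendingName + "=" + text);
            pendingName = null;
            text.setLength(0);
        } else {
            flush();
            events.add("end:" + local);
        }
    }

    static List<String> run(String xml) {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            LeafCombiner h = new LeafCombiner();
            f.newSAXParser().parse(new InputSource(new StringReader(xml)), h);
            return h.events;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }
}
```

Every element pays for the attribute copy and the text copy, including
the ones that turn out to have child elements, which is exactly the
overhead Kohsuke describes.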

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109