Paul Sandoz wrote:
> Since the 'literal' output is likely to be the exception i think it
> should be possible to produce the required data on the first 'literal'
> call. However, to do this the FI serializer would require that Name
> objects state whether they are used for EIIs, AIIs, or both.
JAXB assigns indices to all Names, while FI assigns indices only to the
ones actually used. So I think you'll have to maintain the index
conversion table anyway. And I think from this index conversion table you
can tell which names have already been used when you hit the first 'literal'.
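
To make the conversion table idea concrete, here is a minimal sketch in Java. The class and method names (NameIndexTable, fiIndexFor, isUsed) are hypothetical and are not part of JAXB or of any FI serializer. It assumes that each Name carries a dense index in [0, nameCount) and that FI indices are handed out in order of first use. A real serializer would presumably keep one such table per FI vocabulary table (EIIs and AIIs).

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: maps dense JAXB Name indices to FI indices that
 * are assigned lazily, in order of first use.
 */
final class NameIndexTable {
    private static final int UNUSED = -1;

    private final int[] jaxbToFi;   // slot per JAXB Name, UNUSED until first seen
    private int nextFiIndex = 0;    // next FI index to hand out

    NameIndexTable(int nameCount) {
        jaxbToFi = new int[nameCount];
        Arrays.fill(jaxbToFi, UNUSED);
    }

    /** Returns the FI index for a JAXB name index, assigning one on first use. */
    int fiIndexFor(int jaxbIndex) {
        int fi = jaxbToFi[jaxbIndex];
        if (fi == UNUSED) {
            fi = nextFiIndex++;
            jaxbToFi[jaxbIndex] = fi;
        }
        return fi;
    }

    /** Tells whether a Name has already been written, e.g. at the first 'literal'. */
    boolean isUsed(int jaxbIndex) {
        return jaxbToFi[jaxbIndex] != UNUSED;
    }

    /** Number of Names that have received an FI index so far. */
    int usedCount() {
        return nextFiIndex;
    }
}
```

With a table like this, the set of names already in use at the first 'literal' call is simply every slot for which isUsed returns true.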
>>> Having a unique integer with a Name would be useful so that efficient
>>> table lookup can be achieved e.g. if there are 10 Names then each Name
>>> would be assigned a unique integer in the interval [0, 9] say.
>>> Although FI has separate tables for AIIs and EIIs i do not think it
>>> makes a difference for JAXB to have one space for Name objects if
>>> O(1) access is possible.
>>
>>
>> Yes, this is on my TODO list. I can easily give them sequence numbers
>> starting from 0.
>>
>
> Great. I think we will need to differentiate between Names associated
> with EIIs, AIIs or both for efficient support.
I can compute whether a Name is used in EIIs, AIIs, or both, and perhaps
add flags on Name, but I'm not sure how they would be used.
I thought you'd assign FI indices to Names as you see them during
marshalling. So do you really have a use for this flag?
> Will Name objects be unique to a JAXBContext?
Yes. That is,
- A Name object always belongs to a JAXBContext
- Two Names that belong to the same JAXBContext are always different
>>> If it can be ascertained that a Name will always use the same prefix
>>> then lookup of a Name to an index in an FI document is very efficient.
>>
>>
>> Unfortunately this assumption does not hold. One example is when JAXB
>> starts with xmlns="foo", and then later finds out that it needs to print
>> out xmlns="" (because you can't assign URI "" to any prefix.)
>>
>
> Can you explain a bit more about this? Not sure i understand fully. I
> thought in this case different Name objects will be used because of
> different URIs.
The question was whether JAXB can guarantee that a given Name object
uses the same prefix throughout the marshalling. It can't.
Taking your QName-in-content example...
Consider the following example.
    class Foo {
        @XmlValue
        QName n;
    }
    @XmlRootElement(namespace="foo",name="foo")
    class Bar {
        Foo foo;
    }
And imagine
    Foo f = new Foo();
    f.n = new QName("","zot");
    Bar b = new Bar();
    b.foo = f;
JAXB marshals it as:
    <foo xmlns="foo">
      <ns1:foo xmlns="" xmlns:ns1="foo">zot</ns1:foo>
    </foo>
So the Name object for {foo}foo uses two different namespace prefixes.
> But i hope for the most common cases this will not occur i.e. with JAX-RPC.
I agree.
> When i first looked at XML namespaces i thought "nice solution" but the
> more i think about it now the less i like it, but i cannot think of any
> better textually compact alternatives (unless of course one uses a
> binary serialization :-) ).
If only they'd let us bind a prefix to the default "" namespace...
And oh so evil QName in content...
>> Another example is when the portion of a document is bound to DOM. DOM
>> can declare namespaces in any way it wants.
>>
>
> In that case Name objects will not be used and the 'literal' approach is
> used instead? so we would fall back to the slower solution (and ensure
> consistency).
That's true.
>> In normal case, one namespace URI is bound to one prefix. So hopefully
>> we can code FI such that it works fast as long as this assumption holds,
>> and if it breaks, it takes the slow route.
>>
>
> Exactly my thoughts too. The question is how can one know when this
> assumption holds? Perhaps the solution is to have an array of arrays.
>
>     int[] items = nameIndexToFINameIndex[nameIndex];
>     if (items[0] == prefixIndex) {
>         int fiNameIndex = items[1];
>     } else {
>         // search through item for required prefix
>     }
Something like that.
If the prefixIndex isn't the "typical" index (that is, if the test in
your "if" statement fails), I assume you can look it up in the name
table you described.
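
A slightly fuller sketch of the array-of-arrays lookup, in Java, might look like the following. The layout is an assumption made for illustration: each row stores (prefixIndex, fiNameIndex) pairs, with the "typical" prefix in the first pair. PrefixedNameTable and its methods are hypothetical names, not an existing API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the array-of-arrays lookup. Row layout (assumed):
 * [prefix0, fiIndex0, prefix1, fiIndex1, ...], typical prefix first.
 */
final class PrefixedNameTable {
    static final int NOT_FOUND = -1;

    private final int[][] rows;   // one row per JAXB Name index, null if unseen

    PrefixedNameTable(int nameCount) {
        rows = new int[nameCount][];
    }

    /** Records that (nameIndex, prefixIndex) maps to fiNameIndex. */
    void put(int nameIndex, int prefixIndex, int fiNameIndex) {
        int[] row = rows[nameIndex];
        if (row == null) {
            rows[nameIndex] = new int[] { prefixIndex, fiNameIndex };
        } else {
            // Rare case: the same Name shows up with another prefix.
            int[] grown = Arrays.copyOf(row, row.length + 2);
            grown[row.length] = prefixIndex;
            grown[row.length + 1] = fiNameIndex;
            rows[nameIndex] = grown;
        }
    }

    /** Returns the FI name index, or NOT_FOUND if this pair was never recorded. */
    int lookup(int nameIndex, int prefixIndex) {
        int[] row = rows[nameIndex];
        if (row == null) {
            return NOT_FOUND;
        }
        if (row[0] == prefixIndex) {
            return row[1];                       // fast path: typical prefix
        }
        for (int i = 2; i < row.length; i += 2) { // slow path: linear search
            if (row[i] == prefixIndex) {
                return row[i + 1];
            }
        }
        return NOT_FOUND;
    }
}
```

As long as one namespace URI stays bound to one prefix, every lookup takes the fast path; the linear search only runs in cases like the QName-in-content example above.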
> It should also be possible to retain arrays and not have to clear them
> for every serialization if a per serialize counter is used. When the
> counter reaches the maximum integer value the array is reset.
Right.
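
For reference, the per-serialize counter trick can be sketched as below. This is only an illustration under assumed names (GenerationStampedTable, startSerialization); it keeps a stamp array beside the values, and an entry counts as present only if its stamp equals the current counter.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: a table that is logically cleared by bumping a
 * counter instead of refilling the arrays before every serialization.
 */
final class GenerationStampedTable {
    static final int ABSENT = -1;

    private final int[] values;
    private final int[] stamps;   // generation in which values[i] was written
    private int generation = 1;   // stamps start at 0, so all entries begin absent

    GenerationStampedTable(int size) {
        values = new int[size];
        stamps = new int[size];
    }

    /** Call once per serialization; O(1) except when the counter wraps. */
    void startSerialization() {
        if (generation == Integer.MAX_VALUE) {
            Arrays.fill(stamps, 0);   // the rare real reset
            generation = 1;
        } else {
            generation++;
        }
    }

    void put(int index, int value) {
        values[index] = value;
        stamps[index] = generation;
    }

    /** Returns the value written during this serialization, or ABSENT. */
    int get(int index) {
        return stamps[index] == generation ? values[index] : ABSENT;
    }
}
```

The cost is one extra int per slot and one extra comparison per lookup, in exchange for skipping the array reset on almost every serialization.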
>>> It would help if it can be known in advance if serialization may use
>>> the non Name methods. If this is the case then it is not necessary to
>>> maintain synchronized hash tables for the different forms of data.
>>
>> This isn't really possible without traversing the whole object tree
>> beforehand.
>
> I suppose there could be a hint by analysing the static definition.
> However, as i said earlier, i think it may be possible to avoid this using
> a lazy calculation approach if the Name object has the required
> information on where it is used.
I think corner cases make it somewhat difficult for JAXB to say "this
JAXBContext will never ever use the non-Name version."
For example, with any JAXBContext, currently the user is allowed to
create an instance of JAXBElement with arbitrary QName and marshal it.
>> Sometimes the marshaller knows the element being written will not have
>> any attribute nor namespace declaration. I can add a boolean parameter
>> to beginStartTag to indicate this.
>>
>> If the hint says "absolutely no attribute/xmlns", you can skip buffering.
>>
>> Another possibility is to define the "writeLeafElement", which writes
>> something like
>>
>> <foo>xxxx</foo>
>>
>> I found that this happens very often in many schemas, and this might
>> allow better optimizations for FI.
>>
>
> I think both these optimizations would be useful for FI and XML
> serialization as long as the higher layer does not have to do a load of
> work.
Another pressure on us is to keep the runtime small, so picking the
right optimization is tricky. I think I'm inclined to do the leaf
optimization.
> For XML it means that there can be UTF-8 encoded strings "<foo>" and
> "</foo>" that could also be reused for when there are attributes present
> i.e. the former could write up to the last but one octet. For FI we can
> still use the first UTF-8 encoded string by ignoring the first and last
> octet.
>
> Reducing the number of OutputStream.write can speed things up as some of
> the implementations e.g. BufferedOutputStream and ByteArrayOutputStream
> use synchronized methods.
8-O
I didn't know that. Given how heavily the OutputStream classes use the
decorator pattern, why they combined synchronization with buffering
in one class is really beyond me!
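
The precomputed-tag idea for leaf elements, discussed above, could be sketched like this. LeafElementWriter and its methods are hypothetical, not the XmlOutput API. The sketch assumes an element with no prefix and text that needs no escaping, and it builds the whole leaf in one buffer so the caller can issue a single write.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: caches the UTF-8 bytes of "<foo>" and "</foo>"
 * per element name and reuses them for several output shapes.
 */
final class LeafElementWriter {
    private final byte[] startTag;   // "<foo>"
    private final byte[] endTag;     // "</foo>"

    LeafElementWriter(String localName) {
        startTag = ("<" + localName + ">").getBytes(StandardCharsets.UTF_8);
        endTag = ("</" + localName + ">").getBytes(StandardCharsets.UTF_8);
    }

    /** Builds <foo>text</foo> in one buffer; text is assumed to need no escaping. */
    byte[] leafElement(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] buf = new byte[startTag.length + body.length + endTag.length];
        System.arraycopy(startTag, 0, buf, 0, startTag.length);
        System.arraycopy(body, 0, buf, startTag.length, body.length);
        System.arraycopy(endTag, 0, buf, startTag.length + body.length, endTag.length);
        return buf;
    }

    /** Writes "<foo" (all but the last octet) so attributes can follow. */
    void writeOpenStartTag(ByteArrayOutputStream out) {
        out.write(startTag, 0, startTag.length - 1);
    }

    /** Writes just "foo" (skipping first and last octet), as FI might need. */
    void writeLocalName(ByteArrayOutputStream out) {
        out.write(startTag, 1, startTag.length - 2);
    }
}
```

A caller would then do a single out.write(writer.leafElement("xxxx")) per leaf, which keeps the number of (possibly synchronized) write calls down.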
>>> I wonder if it is possible to reduce the cost of checking for
>>> namespaces on each element and push this out to the higher layer
>>> which may better determine how namespaces are used? For example the
>>> common case with JAX-RPC would be to define all required namespaces up
>>> front on the SOAP envelope or on the root element fragment.
>>>
>>> Maybe another method:
>>>
>>> beginStartTagWithNamespaceDeclarations
>>>
>>> would be appropriate?
>>
>>
>> NamespaceContextImpl keeps track of what namespaces need to be declared
>> when. The only thing XmlOutput needs to do is to check the current
>> NamespaceContext.Element and declare new elements.
>>
>> Today you can check if an element has any namespace declaration or not by:
>>
>> if(nsContext.getCurrent().count()==0)
>>
>> and I think this is cheap enough. If you think this is too expensive, I
>> can pass in the value of nsContext.getCurrent(). But it just saves one
>> memory look up --- given the access frequency, chances are, that this
>> memory is in a processor cache.
>>
>>
>
> OK, if the previous optimizations you proposed for attribute hints and
> leaf elements are possible then i think it covers a lot of cases
> efficiently already.
OK.
--
Kohsuke Kawaguchi
Sun Microsystems kohsuke.kawaguchi_at_sun.com