Re: Realistic generation and use of external vocabularies

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Thu, 13 Apr 2006 13:48:51 +0200

Alan Hudson wrote:
> Paul Sandoz wrote:
>
>> Hi,
>>
>> Some people on this list may be interested in this blog I just wrote:
>>
>> http://blogs.sun.com/roller/page/sandoz?entry=realistic_generation_and_use_of
>>
>
> Very nice work.
>
> I think a next stage will be to assign encoders based on the schema
> type. I expect just using the built-in ones like ints/floats will save a
> fair bit of space.
>

Yes, very good point.

It should be possible to extend the SchemaProcessor with methods that
return two Map<QName, QName> instances, one for elements and one for
attributes. Each map would associate an element/attribute with the QName
of a simple data type or of a list of a simple data type (so it is easy
to support arrays).
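
To make the idea concrete, here is a minimal sketch of what such an
extension might look like. None of these names exist in the
SchemaProcessor today: TypedVocabulary, TypedVocabularyExample,
encoderFor and the urn:example names are all hypothetical, and the
encoder labels are plain strings standing in for real encoders. How a
list type would be expressed as a QName is left open here.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Hypothetical view of the type information a schema processor could expose. */
interface TypedVocabulary {
    /** Element name -> QName of its simple type (or of a list type). */
    Map<QName, QName> getElementTypes();

    /** Attribute name -> QName of its simple type (or of a list type). */
    Map<QName, QName> getAttributeTypes();
}

final class TypedVocabularyExample {
    static final String XSD = "http://www.w3.org/2001/XMLSchema";

    /** Returns an illustrative encoder label; anything unrecognised falls back to "string". */
    static String encoderFor(Map<QName, QName> types, QName name) {
        QName type = types.get(name);
        if (type == null || !XSD.equals(type.getNamespaceURI())) {
            return "string";
        }
        switch (type.getLocalPart()) {
            case "int":
                return "int";
            case "float":
                return "float";
            default:
                return "string";
        }
    }

    /** Builds a tiny hand-written vocabulary; a real one would come from the schema. */
    static TypedVocabulary sample() {
        final Map<QName, QName> elements = new HashMap<>();
        final Map<QName, QName> attributes = new HashMap<>();
        elements.put(new QName("urn:example", "height"), new QName(XSD, "float"));
        attributes.put(new QName("urn:example", "count"), new QName(XSD, "int"));
        return new TypedVocabulary() {
            @Override public Map<QName, QName> getElementTypes() { return elements; }
            @Override public Map<QName, QName> getAttributeTypes() { return attributes; }
        };
    }
}
```

An encoder could then call encoderFor with the element map before
writing character content and fall back to plain strings whenever the
name is not in the map.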

I did have a quick look at the XSOM API to see how to determine whether
an element/attribute declaration corresponds to a simple type or a list
of simple type, but it is not obvious because some schemas have one or
more levels of indirection.
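
One way to picture the indirection is as a chain of user-defined simple
types, each restricting another, that eventually ends at a built-in XSD
type. The sketch below walks such a chain. It deliberately does not call
XSOM: the derivedFrom map is a hypothetical stand-in for whatever the
schema model reports, and BaseTypeResolver and resolveBuiltin are
made-up names.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

final class BaseTypeResolver {
    static final String XSD = "http://www.w3.org/2001/XMLSchema";

    /**
     * Follows derivedFrom (user-defined type -> the type it restricts)
     * until a type in the XSD namespace is reached.
     * Returns null if the chain ends without reaching one.
     */
    static QName resolveBuiltin(Map<QName, QName> derivedFrom, QName type) {
        QName current = type;
        // A valid schema has no circular derivations; the bound is only a safety net.
        for (int i = 0; current != null && i <= derivedFrom.size(); i++) {
            if (XSD.equals(current.getNamespaceURI())) {
                return current;
            }
            current = derivedFrom.get(current);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<QName, QName> derivedFrom = new HashMap<>();
        QName temperature = new QName("urn:example", "Temperature");
        QName reading = new QName("urn:example", "Reading");
        // Two levels of indirection: Temperature -> Reading -> xsd:float
        derivedFrom.put(temperature, reading);
        derivedFrom.put(reading, new QName(XSD, "float"));
        System.out.println(resolveBuiltin(derivedFrom, temperature));
    }
}
```

Complex types with simple content and list types would need extra
cases; this only covers the restriction chain.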

> Even better compression (at the cost of encoding time) would be to find the
> smallest type available to represent the data, i.e. 0.0 could be
> represented as a byte. But I'm not positive about the InfoSet requirements
> for this. To me, if the schema calls that a float and I return 0, it's OK.
> But the string-based InfoSet thinks 0.0 is not the same as 0.
>

Right. It depends on whether preservation of characters is required; the
canonical representation of xsd:float is worth looking at here. Note that
there is no guarantee when using XML binding tools that characters will
be preserved when converting internal representations to/from lexical
values.
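
As a small illustration of the round-trip problem, the sketch below
parses a lexical value into a float and converts it back to a string. It
uses Java's own Float parsing and formatting, which is an assumption
made for illustration: Java's rules are not identical to the xsd:float
lexical space, and a binding tool may format values differently. The
class name LexicalRoundTrip is made up.

```java
final class LexicalRoundTrip {
    /** Converts a lexical value to a float and back to a string. */
    static String roundTrip(String lexical) {
        float value = Float.parseFloat(lexical);
        return Float.toString(value);
    }

    public static void main(String[] args) {
        // Different lexical forms of the same value collapse to one output form.
        for (String s : new String[] {"0", "0.0", "0.00", "1e2"}) {
            System.out.println(s + " -> " + roundTrip(s));
        }
    }
}
```

Once the characters have gone through the float, the original spelling
("0" versus "0.0" versus "0.00") cannot be recovered, which is why it
matters whether character preservation is required.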

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109