dev@fi.java.net

Re: Generation of external vocabularies

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Fri, 19 Aug 2005 16:10:43 +0200

Kohsuke Kawaguchi wrote:
> Paul Sandoz wrote:
>
>> Hi Kohsuke,
>>
>> Using XSOM how easy is it to get all element definitions and attribute
>> definitions for a schema?
>
>
> If it's just global element decls, that's very very easy.
>
> XSSchemaSet schemas = ...;
> schemas.iterateElementDecls();
>
> If you really mean all element definitions including local ones, it
> involves traversing the schema, so you need to implement XSVisitor to
> walk through the tree. But I can probably write one in 10 minutes.
>

Good, it is easy then :-) I suppose JAXB does something similar to
produce a list of UTF-8 encoded local names for elements and attributes.


>
>
>> I want to try and generate an external vocabulary from a schema and
>> set of sample documents. The sample documents will be used to generate
>> indexes in proportion to the frequency of information specified in the
>> schema.
>
>
> Matching up elements and attributes in instance documents to schema
> definitions is definitely harder, though.
>
>

It would not be necessary to do exact matches based on validation.

All that is needed is this: given the set of qualified names
( {namespace}localName ) declared in the schema, count how many times
each of those names occurs in a set of n documents.

Once everything has been counted, sort the set of qualified names by
number of occurrences, with the most frequent qualified name first.

Then assign each qualified name an index equal to its position in the
sorted set.
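
A minimal sketch of the three steps above (count, sort, assign indexes),
assuming the schema's qualified names have already been collected into a
set (for example via XSOM). The class name VocabularyIndexer and its
methods are hypothetical. Documents are passed as strings and read with
the JDK's StAX parser. Element and attribute names share a single table,
indexes start at 0, and ties are broken by name; those three choices are
simplifications of mine, not something stated above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: frequency-ordered indexes for schema-declared names.
class VocabularyIndexer {

    // Step 1: count occurrences of each schema name across the documents.
    // Names that are not in schemaNames are ignored; no validation is done.
    static Map<QName, Integer> count(Set<QName> schemaNames, List<String> documents) {
        Map<QName, Integer> counts = new HashMap<>();
        for (QName name : schemaNames) {
            counts.put(name, 0);
        }
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            for (String document : documents) {
                XMLStreamReader reader =
                        factory.createXMLStreamReader(new StringReader(document));
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        bump(counts, reader.getName());
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            bump(counts, reader.getAttributeName(i));
                        }
                    }
                }
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed sample document", e);
        }
        return counts;
    }

    // Increment the counter only for names declared in the schema.
    // QName equality compares namespace URI and local part, not the prefix.
    private static void bump(Map<QName, Integer> counts, QName name) {
        counts.computeIfPresent(name, (key, value) -> value + 1);
    }

    // Steps 2 and 3: sort by descending count (ties broken by name so the
    // result is deterministic), then use the sorted position as the index.
    static Map<QName, Integer> assignIndexes(Map<QName, Integer> counts) {
        List<QName> sorted = new ArrayList<>(counts.keySet());
        sorted.sort(Comparator.comparingInt((QName n) -> counts.get(n))
                .reversed()
                .thenComparing((QName n) -> n.toString()));
        Map<QName, Integer> indexes = new LinkedHashMap<>();
        for (QName name : sorted) {
            indexes.put(name, indexes.size());
        }
        return indexes;
    }
}
```

Calling VocabularyIndexer.assignIndexes(VocabularyIndexer.count(names, docs))
returns the names in frequency order, each mapped to its position.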

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109