users@jaxb.java.net

Re: "Easy" unmarshalling an JAXB1

From: Jorge M. <ccsdev_at_gmail.com>
Date: Sun, 26 Feb 2006 00:19:09 -0500

Hi Aleksei,

Wouldn't it be enough to relax your schema to allow the not-so-correct
XML documents and generate the classes again?

Or to attach a ValidationEventHandler to the unmarshaller?

I turn off validation and use a ValidationEventHandler so that
unmarshalling does not stop when a document contains new attributes, or
duplicate elements where I expect only one.
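
A minimal sketch of that handler policy, not code from a real project. So
that it compiles without JAXB on the classpath, EventHandler is a
hypothetical stand-in for javax.xml.bind.ValidationEventHandler, and the
severity constants copy the values of ValidationEvent.WARNING, ERROR and
FATAL_ERROR. Continuing on errors and stopping only on fatal errors is an
assumption about what "lenient" should mean here.

```java
import java.util.ArrayList;
import java.util.List;

public class LenientHandlerSketch {

    // Same numeric values as the severity constants in javax.xml.bind.ValidationEvent.
    static final int WARNING = 0;
    static final int ERROR = 1;
    static final int FATAL_ERROR = 2;

    /** Hypothetical stand-in for javax.xml.bind.ValidationEventHandler. */
    interface EventHandler {
        /** Returns true to continue unmarshalling, false to abort. */
        boolean handleEvent(int severity, String message);
    }

    /** Records every reported problem and aborts only on fatal errors. */
    static class LenientHandler implements EventHandler {
        final List<String> problems = new ArrayList<>();

        @Override
        public boolean handleEvent(int severity, String message) {
            problems.add(message);
            return severity != FATAL_ERROR;
        }
    }

    public static void main(String[] args) {
        LenientHandler handler = new LenientHandler();
        // With JAXB 1 on the classpath the real wiring would be roughly:
        //   unmarshaller.setValidating(false);
        //   unmarshaller.setEventHandler(realHandler);
        boolean goOn = handler.handleEvent(ERROR, "unexpected attribute 'foo'");
        boolean stop = handler.handleEvent(FATAL_ERROR, "document is not well-formed");
        System.out.println(goOn + " " + stop + " " + handler.problems.size());
    }
}
```

With the real API, the same logic would go into
ValidationEventHandler.handleEvent(ValidationEvent), reading the severity
from event.getSeverity() and the text from event.getMessage().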

-Jorge

On 2/25/06, Aleksei Valikov <valikov_at_gmx.net> wrote:
> Hi.
>
> I have recently run into the need to unmarshal incomplete XML data with
> JAXB 1. The situation is as follows. We have developed a relatively
> large XML-based metadata management system on the basis of JAXB 1. Now
> we need to import the existing data, and it appears that 80% of the
> documents are "a bit" invalid. That is, sometimes a few elements or
> attributes are missing. At the same time, the documents are structurally
> "almost" correct.
>
> What we needed was a way to import invalid data, as much of it as
> possible, with JAXB 1. I have searched the web and found that this issue
> is addressed in JAXB 2, and there's no solution for JAXB 1.
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5023635
> http://www.thescripts.com/forum/threadnav84719-1-10.html
>
> This is actually pretty bad for us, since I can't go out there and tell
> my customers "sorry, guys, your 10000+ documents are invalid, we can't
> import them".
>
> Again, many thanks to Sun for making the JAXB RI open source. I've dug
> in a bit, and here are my results.
>
> JAXB RI builds unmarshallers on the basis of
> com.sun.tools.xjc.generator.unmarshaller.automaton.Automaton. Automaton
> is produced by
> com.sun.tools.xjc.generator.unmarshaller.AutomatonBuilder, which
> examines the expression tree and creates a structure of States
> (com.sun.tools.xjc.generator.unmarshaller.automaton.State).
>
> For non-mandatory constructs, the generated states have so-called
> "delegated states". As far as I understand, if a state for some reason
> cannot process the input, processing switches to its delegated state. A
> state with delegation is generated, for instance, for optional elements,
> which are actually represented by "choice(element, epsilon)" structures.
> In this case the expression is epsilon-reducible, so if the element does
> not appear, the automaton switches to the delegated state.
>
> So to allow "easy" unmarshalling, I actually needed to assign delegated
> states even in case of non-epsilon-reducible expressions, sequences, and
> so on.
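
A toy illustration of the delegation idea described above. This is not
the XJC code: State here is a hypothetical class written for this sketch,
and the automaton is a hand-built one for sequence(a, b, c). It only shows
why chaining every state to the next one lets a document with a missing
element get through.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DelegationSketch {

    /** Toy state: named transitions plus an optional fallback state. */
    static class State {
        final Map<String, State> transitions = new HashMap<>();
        State delegated;   // tried when no transition of this state matches
        boolean isFinal;

        /** Follows the delegation chain until some state accepts the element. */
        State next(String element) {
            for (State s = this; s != null; s = s.delegated) {
                State target = s.transitions.get(element);
                if (target != null) return target;
            }
            return null; // nothing in the delegation chain accepts it
        }
    }

    /** Builds an automaton for sequence(a, b, c), optionally delegating each state to the next. */
    static State build(boolean delegateToNext) {
        State end = new State();
        end.isFinal = true;
        State afterB = new State(); // expects c
        State afterA = new State(); // expects b
        State start = new State();  // expects a
        start.transitions.put("a", afterA);
        afterA.transitions.put("b", afterB);
        afterB.transitions.put("c", end);
        if (delegateToNext) {
            start.delegated = afterA;
            afterA.delegated = afterB;
            afterB.delegated = end;
        }
        return start;
    }

    /** True if all elements are consumed and a final state is reachable by delegation. */
    static boolean accepts(State start, List<String> elements) {
        State current = start;
        for (String e : elements) {
            current = current.next(e);
            if (current == null) return false;
        }
        for (State s = current; s != null; s = s.delegated) {
            if (s.isFinal) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> missingB = Arrays.asList("a", "c");
        System.out.println("strict:  " + accepts(build(false), missingB));
        System.out.println("lenient: " + accepts(build(true), missingB));
    }
}
```

The strict automaton rejects the input with the missing "b", while the
one with delegation accepts it. An element that no state knows about is
still rejected in both variants.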
>
> I've tried changing the code of the onSequence, onChoice and
> _onRepeated methods:
>
> In the sequence, always delegate to the next state:
>
> public Object onSequence( SequenceExp exp ) {
>     Expression[] children = exp.getChildren();
>
>     State currentTail;
>
>     for( int i=children.length-1; i>=0; i-- )
>     {
>         currentTail = tail;
>         tail = (State)children[i].visit(this);
>         tail.setDelegatedState(currentTail);
>     }
>
>     return tail;
> }
>
> In choice, turn on delegation even if the expression is not
> epsilon-reducible:
>
> public Object onChoice( ChoiceExp exp ) {
>     Expression[] children = exp.getChildren();
>
>     State currentTail = tail;
>     State head = new State();
>
>     for( int i=children.length-1; i>=0; i-- ) {
>         tail = currentTail;
>         State localHead = (State)children[i].visit(this);
>         if( localHead==currentTail )
>             continue; // use delegation to produce a smaller state machine
>         head.absorb( localHead );
>     }
>
>     //lexi if( exp.isEpsilonReducible() ) {
>         // optimization
>         if( head.hasTransition() )
>             head.setDelegatedState(currentTail);
>         else
>             head = currentTail;
>     //lexi }
>
>     return head;
> }
>
> In repeated expressions, act as if zero was always allowed:
>
> private State _onRepeated( Expression itemExp, boolean isZeroAllowed ) {
>     State _tail = tail;
>     State newHead = (State)itemExp.visit(this);
>
>     _tail.absorb(newHead);
>     // return isZeroAllowed?_tail:newHead;
>     return _tail;
> }
>
> Now, with classes generated with this code, I can unmarshal even
> invalid XML.
>
> Well, I understand that it's quite a hacky approach, but it has worked
> for me. I'd like to ask the JAXB developers what you guys think of it,
> and whether there is any chance of getting these changes into the
> official code. Of course, not in the default mode, but if I turn on
> something like noValidatingUnmarshaller, JAXB could generate an "easy"
> one.
>
> Bye.
> /lexi