users@jaxb.java.net

Re: "Easy" unmarshalling in JAXB1

From: Kohsuke Kawaguchi <Kohsuke.Kawaguchi_at_Sun.COM>
Date: Tue, 28 Feb 2006 10:35:36 -0800

Aleksei Valikov wrote:
> What we needed was a way to import invalid data, as much of it as
> possible, with JAXB 1. I have searched the web and found that this
> issue is addressed in JAXB 2, and there's no solution for JAXB 1.
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5023635
> http://www.thescripts.com/forum/threadnav84719-1-10.html

Correct.

> This is actually pretty bad for us, since I can't go out there and tell
> my customers "sorry, guys, your 10000+ documents are invalid, we can't
> import them".

Right.

> Again many thanks to Sun for making JAXB RI open-source. I've dug
> around a bit and here are my results.
>
> JAXB RI builds unmarshallers on the basis of
> com.sun.tools.xjc.generator.unmarshaller.automaton.Automaton. Automaton
> is produced by
> com.sun.tools.xjc.generator.unmarshaller.AutomatonBuilder, which
> examines the expression tree and creates a structure of States
> (com.sun.tools.xjc.generator.unmarshaller.automaton.State).

Yes.

> In the case of non-mandatory constructs, the generated states have
> so-called "delegated states". As far as I understand, if a state is for
> some reason not processed, then processing switches to the delegated
> state. A state with delegation is generated, for instance, for optional
> elements, which are actually represented by "choice(element, epsilon)"
> structures. In this case, the expression is epsilon-reducible, so if the
> element does not appear, the automaton will switch to the delegated state.

Yes. The delegated state is really just a way to compress an automaton.
It's a DFA with one extension: a state can have one epsilon
transition, which is the delegated state.
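
To make the "DFA plus one epsilon transition" idea concrete, here is a
minimal sketch in Java. It is an illustration only: the class names
(DelegatingState, DelegatedDfaDemo) and the a/b/c content model are
hypothetical and are not the classes the JAXB RI generates. It assumes a
content model of sequence(a, optional b, c).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical state type; NOT the JAXB RI's automaton.State class.
final class DelegatingState {
    final boolean accepting;
    final Map<String, DelegatingState> transitions = new HashMap<>();
    DelegatingState delegate; // the single epsilon transition, or null

    DelegatingState(boolean accepting) {
        this.accepting = accepting;
    }

    // Look for a transition on this state, then on its chain of delegates.
    // Returns null when no state in the chain accepts the token.
    DelegatingState step(String token) {
        for (DelegatingState s = this; s != null; s = s.delegate) {
            DelegatingState next = s.transitions.get(token);
            if (next != null) {
                return next;
            }
        }
        return null;
    }

    // Input may end here if this state or any delegate is accepting.
    boolean canEnd() {
        for (DelegatingState s = this; s != null; s = s.delegate) {
            if (s.accepting) {
                return true;
            }
        }
        return false;
    }
}

final class DelegatedDfaDemo {
    // Assumed content model: sequence(a, optional b, c).
    static DelegatingState build() {
        DelegatingState start = new DelegatingState(false);
        DelegatingState afterA = new DelegatingState(false);
        DelegatingState afterB = new DelegatingState(false);
        DelegatingState end = new DelegatingState(true);
        start.transitions.put("a", afterA);
        afterA.transitions.put("b", afterB);
        afterA.delegate = afterB; // b is optional: choice(b, epsilon)
        afterB.transitions.put("c", end);
        return start;
    }

    static boolean accepts(List<String> tokens) {
        DelegatingState s = build();
        for (String t : tokens) {
            s = s.step(t);
            if (s == null) {
                return false;
            }
        }
        return s.canEnd();
    }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList("a", "b", "c")));
        System.out.println(accepts(Arrays.asList("a", "c")));
        System.out.println(accepts(Arrays.asList("a", "b")));
    }
}
```

Without the delegate, the state after "a" would need its own copy of the
"c" transition; with it, the transition lives only on the state after
"b". That sharing is the compression referred to above.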

> So to allow "easy" unmarshalling, I actually needed to assign delegated
> states even in the case of non-epsilon-reducible expressions, sequences,
> and so on.

Ah, OK, so your case is where your schema added new required things that
didn't exist in your first version?

I believe JAXB 1 has a problem in the other direction as well: if
your document has content that the automaton doesn't recognize, then it
will try the delegated states until it hits a dead end, and by then you
have effectively consumed the whole state machine and therefore won't be
able to match anything afterward.
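
The dead-end effect can be sketched with a toy model in Java. This is an
assumption-laden illustration, not the generated JAXB 1 unmarshaller:
ToyState, DeadEndDemo and both matching strategies are hypothetical, and
the content model is assumed to be three optional elements a, b, c in
sequence. matchCommitting moves the current state along the delegates
while searching, as described above; matchProbing is shown only for
contrast and searches the delegate chain without moving.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical state type used only for this illustration.
final class ToyState {
    final Map<String, ToyState> transitions = new HashMap<>();
    ToyState delegate; // single epsilon transition, or null
}

final class DeadEndDemo {
    // Assumed content model: optional a, optional b, optional c.
    static ToyState build() {
        ToyState s0 = new ToyState();
        ToyState s1 = new ToyState();
        ToyState s2 = new ToyState();
        ToyState s3 = new ToyState(); // no transitions, no delegate: dead end
        s0.transitions.put("a", s1);
        s0.delegate = s1;
        s1.transitions.put("b", s2);
        s1.delegate = s2;
        s2.transitions.put("c", s3);
        s2.delegate = s3;
        return s0;
    }

    // Commits to each delegate while searching, so an unknown token
    // drags the current state all the way to the dead end.
    static int matchCommitting(List<String> tokens) {
        ToyState cur = build();
        int matched = 0;
        for (String t : tokens) {
            while (!cur.transitions.containsKey(t) && cur.delegate != null) {
                cur = cur.delegate;
            }
            ToyState next = cur.transitions.get(t);
            if (next != null) {
                cur = next;
                matched++;
            }
            // Otherwise the token is skipped, but cur stays where it ended up.
        }
        return matched;
    }

    // Searches the delegate chain without moving; the current state
    // changes only when some state in the chain accepts the token.
    static int matchProbing(List<String> tokens) {
        ToyState cur = build();
        int matched = 0;
        for (String t : tokens) {
            ToyState next = null;
            for (ToyState s = cur; s != null && next == null; s = s.delegate) {
                next = s.transitions.get(t);
            }
            if (next != null) {
                cur = next;
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("x", "a", "b", "c"); // x is unknown
        System.out.println(matchCommitting(doc));
        System.out.println(matchProbing(doc));
    }
}
```

In this model the unknown leading element x leaves the committing
matcher stuck in the last state, so none of a, b, c can be matched
afterward, while the probing matcher still matches all three.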


> Well, I understand that it's quite a hackish approach, but it has worked
> for me. I'd like to ask the JAXB developers what you think of it, and
> whether there is any chance of getting these changes into the official
> code. Of course, not in the default mode, but if I turn on something
> like noValidatingUnmarshaller, JAXB could generate an "easy" one.

I think it's good to add such a mode.


-- 
Kohsuke Kawaguchi
Sun Microsystems                   kohsuke.kawaguchi_at_sun.com