users@jaxb.java.net

Content byte count & fragmented parsing

From: Sang Go <sanghgo_at_gmail.com>
Date: Tue, 4 Mar 2008 17:10:01 -0500

I have a few questions regarding this XML document.

This portion of the document, which I'll refer to as the HEADER, is
sent when the connection is established:
<?xml version="1.0" encoding="UTF-8"?>
<rootelement xmlns="http://..."
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www... file:....xsd">

.. break in transmission, but socket connection is not closed ...

Then many elements of the following type are sent. The length
attribute (len) is the byte count of the element's entire content,
whitespace and all. I'll refer to this section as the CONTENT.
<message xmlns="" len="???">
  <submessage>
  .... more content containing XML tags ...
  </submessage>
</message>

.. break in transmission, but socket connection is not closed ...

<message xmlns="" len="???">
  <submessage>
  .... more content containing XML tags ...
  </submessage>
</message>

.. break in transmission, but socket connection is not closed ...

.. more <message> XML documents can be transmitted ...

Then the following is sent when the connection is closed. I'll refer
to this as the FOOTER.
</rootelement>

QUESTION 1. How do I calculate the byte count for the content of the
"message" element in JAXB? I can attach an external listener to get
the unmarshalled object tree, but I need to know the byte count of the
XML characters that the object was unmarshalled from.
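
To make QUESTION 1 concrete, here is a minimal sketch of the count I
mean, computed outside JAXB on the raw bytes of one complete message.
The class and method names are hypothetical. It assumes the bytes are
UTF-8, that no attribute value in the start tag contains '>', and that
"content" means everything between the start tag and the closing
</message> tag.

```java
import java.nio.charset.StandardCharsets;

class MessageLength {
    /**
     * Counts the bytes between the end of the <message ...> start tag
     * and the beginning of the closing </message> tag.
     */
    static int contentByteCount(byte[] rawMessage) {
        // ISO-8859-1 maps every byte to exactly one char, so string
        // indices below are byte offsets. In UTF-8 the bytes for '<'
        // and '>' never occur inside a multi-byte sequence.
        String s = new String(rawMessage, StandardCharsets.ISO_8859_1);
        int startTagEnd = s.indexOf('>');              // assumes no '>' in attributes
        int closeTagStart = s.lastIndexOf("</message>");
        if (startTagEnd < 0 || closeTagStart <= startTagEnd) {
            throw new IllegalArgumentException("not a complete <message> element");
        }
        return closeTagStart - (startTagEnd + 1);
    }
}
```

What I cannot see is how to obtain this number from the unmarshaller
itself, since the object tree no longer carries the original bytes.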

QUESTION 2. Similarly, how do I do the reverse: marshalling an XML
document and then inserting the length?
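
One possible shape for this, as a sketch only: marshal the content to
an in-memory buffer first (the JAXB Marshaller can write to a
ByteArrayOutputStream, and its JAXB_FRAGMENT property suppresses the
XML declaration), then measure the buffer and write the wrapper around
it. The code below starts from the already-marshalled content as a
String; the class name, method name, and the choice of UTF-8 are
assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MessageFramer {
    /** Wraps already-marshalled content in <message> with its byte count. */
    static byte[] frame(String marshalledContent) {
        byte[] body = marshalledContent.getBytes(StandardCharsets.UTF_8);
        // len counts bytes, not characters, so measure the encoded form.
        byte[] open = ("<message xmlns=\"\" len=\"" + body.length + "\">")
                .getBytes(StandardCharsets.UTF_8);
        byte[] close = "</message>".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(open, 0, open.length);
        out.write(body, 0, body.length);
        out.write(close, 0, close.length);
        return out.toByteArray();
    }
}
```

The buffering step is what makes the length available before the start
tag is written; a streaming marshal straight to the socket would not
know the count in time.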

QUESTION 3 (and more). What is the best way of parsing the HEADER
separately from the messages in the CONTENT, and finally the FOOTER?
Although this long transmission can be considered a single XML
document, I need to depart from strict XML when parsing/validating
it. Each message in the CONTENT section needs to be validated
independently of the others, and a bad message should not cause the
parser to reject subsequent messages. By this, I'm assuming that the
error in the message breaks schema validity but not well-formedness.

For QUESTION 3, I'm thinking about using a SAX parser for the HEADER,
then a JAXB parser to process the messages, but I'm not sure what to
do about the FOOTER (I can't just ignore it because I have to check
for it). Since the parsers read from a stream (socket), how would you
hand off the stream to a JAXB parser without losing any fragments of
the next XML message in the SAX parser's buffers? The JAXB parser
would get "</rootelement>" and would reject it because it would not be
aware that the opening <rootelement> had already been sent. How do I
handle this case?
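
One alternative that might avoid the hand-off altogether, sketched
under assumptions: keep a single StAX XMLStreamReader on the stream
for the whole connection, so HEADER, messages, and FOOTER all pass
through the same parser and nothing is stranded in a second parser's
buffer. The JAXB Unmarshaller has an unmarshal(XMLStreamReader, Class)
overload that could be called at each <message> start tag. The sketch
below uses only StAX and merely counts messages where JAXB would be
invoked; the class and field names are hypothetical.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EnvelopeScanner {
    int messages;        // number of <message> children seen
    boolean footerSeen;  // true once </rootelement> has been read

    static EnvelopeScanner scan(InputStream in) {
        EnvelopeScanner result = new EnvelopeScanner();
        try {
            XMLStreamReader r =
                    XMLInputFactory.newInstance().createXMLStreamReader(in);
            int depth = 0;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2 && "message".equals(r.getLocalName())) {
                        // A JAXB unmarshal of this element could go here.
                        // It would also consume the matching end tag, so
                        // the depth bookkeeping would need adjusting.
                        result.messages++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 1 && "rootelement".equals(r.getLocalName())) {
                        result.footerSeen = true;  // the FOOTER
                    }
                    depth--;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("stream is not well-formed", e);
        }
        return result;
    }
}
```

This still does not answer how to validate each message on its own or
how to verify len, but it would let the FOOTER be checked by the same
parser that saw the HEADER.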

FYI - An XML schema was used to implement a communications protocol,
and that is why there is this complication with the length and the
HEADER, CONTENT, FOOTER protocol.

Thanks in advance.

-- 
Sang Go.