users@saaj.java.net

SAAJ and UTF-8 BOMs

From: Arjen Poutsma <arjen.poutsma_at_springsource.com>
Date: Tue, 15 Jul 2008 14:28:47 +0200

Hi,

I have noticed that the SAAJ RI doesn't cope with UTF-8 BOMs.
According to http://unicode.org/faq/utf_bom.html#22, a UTF-8 file can
have an optional EF BB BF byte order mark at the beginning of the
file. When such a file is passed to the SAAJ MessageFactory, the
following exception occurs:

com.sun.xml.internal.messaging.saaj.SOAPExceptionImpl: XML declaration parsing failed
        at com.sun.xml.internal.messaging.saaj.soap.SOAPPartImpl.lookForXmlDecl(SOAPPartImpl.java:644)
        at com.sun.xml.internal.messaging.saaj.soap.ver1_1.SOAPPart1_1Impl.createEnvelopeFromSource(SOAPPart1_1Impl.java:68)
        at com.sun.xml.internal.messaging.saaj.soap.SOAPPartImpl.getEnvelope(SOAPPartImpl.java:125)
        at org.springframework.ws.soap.saaj.Saaj13Implementation.getEnvelope(Saaj13Implementation.java:169)
        at org.springframework.ws.soap.saaj.SaajSoapMessage.getEnvelope(SaajSoapMessage.java:87)
        ... 20 more
Caused by: java.io.IOException: Unexpected characters before XML declaration
        at com.sun.xml.internal.messaging.saaj.util.XMLDeclarationParser.parse(XMLDeclarationParser.java:121)
        at com.sun.xml.internal.messaging.saaj.soap.SOAPPartImpl.lookForXmlDecl(SOAPPartImpl.java:639)
        ... 24 more

I can work around this issue by creating a PushbackInputStream which
chops off the BOM, but it would be nice to have this issue resolved in
the codebase. Or perhaps I am doing something wrong entirely :).
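
For illustration, a workaround along those lines could look roughly like the
sketch below. It is not the code actually used; the class name BomStripper and
the method stripUtf8Bom are made up for this example. It reads up to three
bytes, drops them if they are exactly EF BB BF, and otherwise pushes them back
so the caller sees the stream unchanged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of SAAJ or Spring-WS.
final class BomStripper {

    private static final int BOM_LENGTH = 3; // EF BB BF

    private BomStripper() {
    }

    /**
     * Returns a stream positioned after a leading UTF-8 BOM, if one is present.
     * Streams without a BOM are returned with all of their bytes intact.
     */
    static InputStream stripUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, BOM_LENGTH);
        byte[] head = new byte[BOM_LENGTH];

        // Read up to three bytes; a single read() may return fewer, so loop.
        int count = 0;
        while (count < BOM_LENGTH) {
            int read = pushback.read(head, count, BOM_LENGTH - count);
            if (read < 0) {
                break; // end of stream reached before three bytes were read
            }
            count += read;
        }

        boolean hasBom = count == BOM_LENGTH
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: push back whatever was read so no content is lost.
        if (!hasBom && count > 0) {
            pushback.unread(head, 0, count);
        }
        return pushback;
    }
}
```

The returned stream would then be handed to
MessageFactory.createMessage(MimeHeaders, InputStream) in place of the
original one.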

I have attached a sample SOAP message which has a UTF-8 BOM.





Best regards,

Arjen


---
Arjen Poutsma
Senior Software Engineer, SpringSource
Spring Web Services Lead
E: arjen.poutsma_at_springsource.com
W: www.springsource.com
B: blog.springsource.com/arjen