users@jaxb.java.net

Re: Debugable source code needed for version 2.1.8 of Jaxb

From: Wolfgang Laun <wolfgang.laun_at_gmail.com>
Date: Tue, 3 Mar 2009 18:10:02 +0100

The file pdLinux.xml is damaged. The damage starts at offset 6257:

 <property>
<name>eft_allow_manual_tr����F9J��QJu�# �
<value>true</value> </property>

Since the property name is certainly eft_allow_manual_transaction (cf. the
other file) and the </name> tag is completely missing, something has damaged
this file. The funny looking characters result from repeated byte triplets
0xef, 0xbf, 0xbd. This is the UTF-8 representation of Unicode code point
U+FFFD, called REPLACEMENT CHARACTER. Its purpose is to act as a substitute
whenever some text processor or parser cannot determine what the input should
represent. This ties in with the fact that the number of characters is
correct, as compared with an identical property element from the other file.
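
To illustrate the point about U+FFFD, here is a minimal Java sketch. The
class and method names are made up for illustration and are not taken from
the application in question, and the byte 0xFF is only an arbitrary stand-in
for damaged input, since the bytes originally present in the file are not
known. It shows that U+FFFD encodes to ef bf bd in UTF-8, and that a lenient
decoder substitutes U+FFFD for a byte it cannot interpret.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Encode a string as UTF-8 and render the bytes as space-separated hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode bytes as UTF-8. This String constructor replaces malformed
    // input with U+FFFD instead of throwing an exception.
    static String decodeLeniently(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+FFFD itself, encoded as UTF-8.
        System.out.println(utf8Hex("\uFFFD"));

        // Hypothetical damaged input: 0xFF can never occur in valid UTF-8.
        byte[] damaged = { 't', 'r', (byte) 0xFF, 'n' };
        String decoded = decodeLeniently(damaged);

        // Re-encoding the decoded text shows the ef bf bd triplet in place
        // of the single byte that could not be interpreted.
        System.out.println(utf8Hex(decoded));
    }
}
```

Once a program has decoded the data this way and written it back, the
original bytes are gone for good; only the triplets remain.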

This is the text as it appears in the other file (with LFs added by me).
                <property>
                    <name>eft_allow_manual_transaction</name>
                    <value>true</value>
                </property>

I see no reason for any parser to run amok at this point, and there's nothing
here that should cause problems due to UTF-8 en- or decoding. Could it be that
this is a data transmission problem? Maybe this "decryption" has a flaw in it?

Both files would (without the damaged section) be in US-ASCII encoding, since
they don't use any code point greater than U+007F. This is, of course, also
correct UTF-8 encoding, although no multibyte sequence actually occurs.
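
A quick way to check this claim for any file is to look for the first byte
above 0x7F. The following is only a sketch; the class name AsciiCheck and the
method firstNonAscii are hypothetical. For a pure US-ASCII file it reports -1,
and for a damaged file it reports the offset where the damage begins.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class AsciiCheck {
    // Returns the offset of the first byte above 0x7F,
    // or -1 if the data is pure US-ASCII (and therefore also valid UTF-8).
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xff) > 0x7f) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Check a file whose path is given on the command line.
            byte[] data = Files.readAllBytes(Paths.get(args[0]));
            System.out.println(firstNonAscii(data));
            return;
        }
        // Without arguments, run on two small in-memory samples.
        byte[] clean = "<name>eft_allow_manual_transaction</name>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] damaged = { '<', 'n', (byte) 0xEF, (byte) 0xBF, (byte) 0xBD };
        System.out.println(firstNonAscii(clean));
        System.out.println(firstNonAscii(damaged));
    }
}
```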

-W




2009/3/3 Jose Correia <correij_at_gmail.com>

> Hi Wolfgang
>
> Please find attached two .xml files: one captured on Windows after being
> decrypted, the other on Linux, in both cases just before they are
> unmarshalled.
>
> Curiously if I open both files with UltraEdit32, it says encoding is
> U8-DOS. If I try and compare both with WinMerge it tells me that "Files use
> different encodings, left = 1252 (the Windows one) and right = UTF-8 (Linux
> one), and merging may lead to information loss"
>
> and then "Information lost due to encoding errors: right file."
>
> We then ran the Unix command cmp, and the attached file shows the places
> where the two files differ. I'm not sure which funny characters it fails
> on, but I created a simpler XML file and it worked without the need for
> setting the encoding. I thought something like this, from the file that
> fails, had something to do with it: "ENTER&lt;lf&gt; PIN&lt;lf&gt;
> #####"
>
> Anyway go figure.
>
> Regards
> Jose
>
>
> On Tue, Mar 3, 2009 at 11:30 AM, Wolfgang Laun <wolfgang.laun_at_gmail.com> wrote:
>
>> Glad to hear that it works. Nevertheless, what you report is very strange.
>>
>> Whatever the OS configuration and marshaller property settings are, the
>> XML file should be written in the encoding shown in the first line
>> <?xml ... encoding="UTF-8" ... ?>
>> and the very same file, or its sequence of bytes, should be readable by
>> an unmarshaller on some other system.
>>
>> Problems may be caused if some other program that does not interpret
>> <?xml...?> is used to handle the data.
>>
>> I would be very interested to learn
>> - what was the first XML line when written on Windows
>> - what, exactly, were the "funny characters" (the raw byte sequence) as
>> written on Windows
>>
>> -W
>>
>>
>> On Tue, Mar 3, 2009 at 10:00 AM, Jose Correia <correij_at_gmail.com> wrote:
>>
>>> Hi all
>>>
>>> I got it to work by explicitly setting the following property on my
>>> marshaller to UTF-8:
>>>
>>> Marshaller m = jc.createMarshaller();
>>> m.setProperty(Marshaller.JAXB_ENCODING,
>>> ArtifactConstants.UTF8_ENCODING);
>>>
>>> So when I save my XML data on Windows and then load it on Linux, it
>>> now works. Even though the javadocs say that the encoding defaults to
>>> UTF-8 if none is specified, my chief engineer suspected it was using an
>>> OS-specific encoding, thus making it not work...
>>>
>>> I also found that the problem only occurred if the XML data had some funny
>>> characters in it, I'm guessing characters that the Linux encoding didn't
>>> understand.
>>>
>>> Cheers
>>> Jose
>>>
>>>
>>> On Fri, Feb 27, 2009 at 4:58 PM, Jose Correia <correij_at_gmail.com> wrote:
>>>
>>>> Well I did check out the build environment and saw that in
>>>> build.properties the debug flag is set to true.... so not sure why it can't
>>>> see the lines. Not sure if having eclipse running on debug on windows
>>>> connecting to the linux box has anything to do with it.
>>>>
>>>> Regards
>>>> Jose
>>>>
>>>>
>>>> On Fri, Feb 27, 2009 at 4:42 PM, Wolfgang Laun <wolfgang.laun_at_gmail.com> wrote:
>>>>
>>>>> Off-list, Jose has confirmed my surmise that the exception occurs
>>>>> during unmarshalling and is, most likely, due to some mishap in connection
>>>>> with the transfer of the XML file between systems.
>>>>>
>>>>> This doesn't clarify the no-debug-flag question for 2.1.8, though.
>>>>>
>>>>> -W
>>>>>
>>>>> On Tue, Feb 24, 2009 at 3:28 PM, Jose Correia <correij_at_gmail.com> wrote:
>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> Our software is using jaxb-api.jar and jaxb-impl.jar for version
>>>>>> 2.1.8. We decided to try our software on linux to see how it would fare (as
>>>>>> opposed to Windows XP/2000/2003).
>>>>>>
>>>>>> We are trying it with Ubuntu 8.0.4 Desktop version and we are using
>>>>>> Sun's jdk version 6 update 12.
>>>>>>
>>>>>> The line that crashes is:
>>>>>>
>>>>>> Unmarshaller u = jc.createUnmarshaller();
>>>>>>
>>>>>> where jc is: jc = JAXBContext.newInstance(JAXB_CONTEXT);
>>>>>>
>>>>>> Exception it gives is:
>>>>>>
>>>>>> javax.xml.bind.UnmarshalException
>>>>>> - with linked exception:
>>>>>> [com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
>>>>>> Invalid byte 2 of 2-byte UTF-8 sequence.]
>>>>>>
>>>>>> So I was trying to debug the JAXB code. I got the source by downloading
>>>>>> RI 2.1.8, which has sources for the relevant jars, but it seems the classes
>>>>>> were compiled without "allow debugging" set to true: when I run it in debug
>>>>>> mode in Eclipse, having ensured Eclipse knows about the source of those
>>>>>> jars, it doesn't show me the line numbers.
>>>>>>
>>>>>> From past experience that tells me it wasn't compiled with that debug
>>>>>> flag on.
>>>>>>
>>>>>> Anyway, if anyone can help with the exception or with how to get debuggable
>>>>>> classes, I would appreciate it. I tried getting into the CVS source with:
>>>>>>
>>>>>> cvs -d:pserver:yourid_at_cvs.dev.java.net:/cvs co -d jaxb-ri
>>>>>> jaxb2-sources/jaxb-ri
>>>>>>
>>>>>> but using my sun id (that I used to subscribe to mailing list) it came
>>>>>> back with unknown id. I have applied to become a code observer within the
>>>>>> https://jaxb2-sources.dev.java.net/ project.
>>>>>>
>>>>>> Thanks
>>>>>> Jose
>>>>>>