Re: Debugable source code needed for version 2.1.8 of Jaxb

From: Jose Correia <correij_at_gmail.com>
Date: Tue, 3 Mar 2009 21:27:54 +0200

Thanks for the effort Wolfgang.

Well, we thought it was the encryption/decryption, so I took it out completely
and it still didn't work.

But then setting UTF-8 explicitly at marshalling time solved it... go
figure.
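
For the archives, here is a minimal sketch of the kind of mismatch that could produce a malformed-UTF-8 error like the one quoted further down. It is an assumption, not a diagnosis of the actual files: it supposes the data contained a non-ASCII character and was written as windows-1252 while the reader expected UTF-8. The class name, the isValidUtf8 helper and the sample element are made up for illustration, and only plain JDK classes are used, no JAXB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    /** Returns true if the bytes decode as strict UTF-8 (errors are reported, not replaced). */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical element content with one non-ASCII character (e-acute, U+00E9).
        String xml = "<value>caf\u00e9</value>";

        // Bytes as a writer using windows-1252 would produce them (assumed scenario).
        byte[] asCp1252 = xml.getBytes(Charset.forName("windows-1252"));
        // Bytes as they should look when the declaration says encoding="UTF-8".
        byte[] asUtf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println("windows-1252 bytes valid as UTF-8: " + isValidUtf8(asCp1252)); // false
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(asUtf8));          // true
    }
}
```

If something like this was going on, writing the bytes in the same encoding that the XML declaration announces would avoid it.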

Regards
Jose

On Tue, Mar 3, 2009 at 7:10 PM, Wolfgang Laun <wolfgang.laun_at_gmail.com> wrote:

> The pdLinux.xml file is damaged. There it is, at offset 6257:
>
> <property> <name>eft_allow_manual_tr��F
> 9J��QJu�# � <value>true</value> </property>
>
> Since the property name is certainly (cf. the other file)
> eft_allow_manual_transaction
> and the </name> tag is completely missing, something has damaged this file.
> The funny-looking characters result from identical byte triplets 0xef,
> 0xbf, 0xbd.
> This is the UTF-8 representation of Unicode code point U+FFFD, called
> REPLACEMENT CHARACTER. Its purpose is to act as a substitute character
> whenever some text processor or parser cannot determine what the input
> should represent.
> This ties in with the fact that the number of characters is correct, as
> compared with an identical property element from the other file.
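
The U+FFFD behaviour described above can be reproduced with a few lines of plain Java. This is only a sketch: the class and method names and the three input bytes are made up, and it assumes the damage passed through a lenient UTF-8 decode somewhere along the way, which nothing in this thread confirms.

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {

    /** Formats bytes as space-separated lowercase hex, e.g. "ef bf bd". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Decodes leniently as UTF-8 (malformed input becomes U+FFFD), then re-encodes as UTF-8. */
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up damaged input: "tr" followed by a lone continuation byte, invalid on its own.
        byte[] damaged = { 't', 'r', (byte) 0x80 };
        System.out.println(hex(roundTrip(damaged))); // prints "74 72 ef bf bd"
    }
}
```

The single bad byte comes back as the triplet ef bf bd, so the original byte value is lost for good once such a round trip has happened.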
>
> This is the text as it appears in the other file (with LFs added by me).
> <property>
> <name>eft_allow_manual_transaction</name>
> <value>true</value>
> </property>
>
> I see no reason for any parser to run amok at this point, and there's
> nothing here that should cause problems due to UTF-8 en- or decoding.
> Could it be that this is a data transmission problem? Maybe this
> "decryption" has a flaw in it?
>
> Both files would (without the damaged section) be in US-ASCII encoding,
> since they don't use any code point greater than U+007F. This is, of
> course, also correct UTF-8 encoding, although no multibyte sequence
> actually occurs.
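
A quick way to check the US-ASCII claim for a file on disk is to look for the first byte above 0x7F. A minimal sketch, with a hypothetical class name and helper; pass the file path as the only argument, or run it without arguments to use a small built-in sample.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class AsciiCheck {

    /** Returns the offset of the first byte above 0x7F, or -1 if every byte is US-ASCII. */
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xff) > 0x7f) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to an in-memory sample so the sketch still runs.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "<value>true</value>".getBytes(StandardCharsets.US_ASCII);

        int offset = firstNonAscii(data);
        System.out.println(offset < 0
                ? "pure US-ASCII, so also valid UTF-8"
                : "first non-ASCII byte at offset " + offset);
    }
}
```

On a file damaged in the way described, the reported offset would point at the start of the first 0xef, 0xbf, 0xbd triplet.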
>
> -W
>
>
>
>
> 2009/3/3 Jose Correia <correij_at_gmail.com>
>
>> Hi Wolfgang
>>
>> Please find attached two .xml files: one as it was loaded on Windows
>> after being decrypted and the other as loaded on Linux, both captured
>> just before they are unmarshalled.
>>
>> Curiously if I open both files with UltraEdit32, it says encoding is
>> U8-DOS. If I try and compare both with WinMerge it tells me that "Files use
>> different encodings, left = 1252 (the Windows one) and right = UTF-8 (Linux
>> one), and merging may lead to information loss"
>>
>> and then "Information lost due to encoding errors: right file".
>>
>> We then ran the Unix command cmp, and the attached file shows the places
>> where they differ. I'm not sure which funny characters it fails on, but I
>> created a simpler XML file and it worked without needing to set the
>> encoding. I thought something like this in the file that fails had
>> something to do with it: "ENTER<lf> PIN<lf>
>> #####"
>>
>> Anyway go figure.
>>
>> Regards
>> Jose
>>
>>
>> On Tue, Mar 3, 2009 at 11:30 AM, Wolfgang Laun <wolfgang.laun_at_gmail.com> wrote:
>>
>>> Glad to hear that it works. Nevertheless, what you report is very
>>> strange.
>>>
>>> Whatever OS configuration and marshaller property setting result in - the
>>> XML file should be written in the encoding shown in the first line
>>> <?xml ... encoding="UTF-8" ... ?>
>>> and the very same file, or its sequence of bytes, should be readable by
>>> an unmarshaller on some other system.
>>>
>>> Problems may be caused if some other program that does not interpret
>>> <?xml...?> is used to handle the data.
>>>
>>> I would be very interested to learn
>>> - what was the first XML line when written on Windows
>>> - what, exactly, were the "funny characters" (the raw byte sequence) as
>>> written on Windows
>>>
>>> -W
>>>
>>>
>>> On Tue, Mar 3, 2009 at 10:00 AM, Jose Correia <correij_at_gmail.com> wrote:
>>>
>>>> Hi all
>>>>
>>>> I got it to work by explicitly setting the following property on my
>>>> marshaller to UTF-8:
>>>>
>>>> Marshaller m = jc.createMarshaller();
>>>> m.setProperty(Marshaller.JAXB_ENCODING,
>>>> ArtifactConstants.UTF8_ENCODING);
>>>>
>>>> So when I saved my XML data on Windows and then loaded it on Linux, it
>>>> now works. Even though the javadocs specify that if no encoding is specified
>>>> then it defaults to UTF-8, my chief engineer suspected it was setting an
>>>> OS-specific encoding, thus making it not work...
>>>>
>>>> I also found the problem only occurred if the XML data had some funny
>>>> characters in it, I'm guessing characters that the Linux encoding didn't
>>>> understand.
>>>>
>>>> Cheers
>>>> Jose
>>>>
>>>>
>>>> On Fri, Feb 27, 2009 at 4:58 PM, Jose Correia <correij_at_gmail.com> wrote:
>>>>
>>>>> Well I did check out the build environment and saw that in
>>>>> build.properties the debug flag is set to true.... so not sure why it can't
>>>>> see the lines. Not sure if having eclipse running on debug on windows
>>>>> connecting to the linux box has anything to do with it.
>>>>>
>>>>> Regards
>>>>> Jose
>>>>>
>>>>>
>>>>> On Fri, Feb 27, 2009 at 4:42 PM, Wolfgang Laun <wolfgang.laun_at_gmail.com> wrote:
>>>>>
>>>>>> Off-list, Jose has confirmed my surmise that the exception occurs
>>>>>> during unmarshalling and is, most likely, due to some mishap in connection
>>>>>> with the transfer of the XML file between systems.
>>>>>>
>>>>>> This doesn't clarify the no-debug-flag question for 2.1.8, though.
>>>>>>
>>>>>> -W
>>>>>>
>>>>>> On Tue, Feb 24, 2009 at 3:28 PM, Jose Correia <correij_at_gmail.com> wrote:
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> Our software is using jaxb-api.jar and jaxb-impl.jar for version
>>>>>>> 2.1.8. We decided to try our software on linux to see how it would fare (as
>>>>>>> opposed to Windows XP/2000/2003).
>>>>>>>
>>>>>>> We are trying it with Ubuntu 8.04 Desktop and we are using
>>>>>>> Sun's JDK version 6 update 12.
>>>>>>>
>>>>>>> The line that crashes is:
>>>>>>>
>>>>>>> Unmarshaller u = jc.createUnmarshaller();
>>>>>>>
>>>>>>> where jc is: jc = JAXBContext.newInstance(JAXB_CONTEXT);
>>>>>>>
>>>>>>> Exception it gives is:
>>>>>>>
>>>>>>> javax.xml.bind.UnmarshalException
>>>>>>> - with linked exception:
>>>>>>> [com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException:
>>>>>>> Invalid byte 2 of 2-byte UTF-8 sequence.]
>>>>>>>
>>>>>>> So I was trying to debug the JAXB code. I got the source by downloading
>>>>>>> RI 2.1.8, which includes sources for the relevant jars, but it seems the
>>>>>>> classes were compiled without "allow debugging" set to true: when I debug
>>>>>>> in Eclipse, having ensured Eclipse knows about the source of those jars,
>>>>>>> it doesn't show me the line numbers.
>>>>>>>
>>>>>>> From past experience that tells me it wasn't compiled with that debug
>>>>>>> flag on.
>>>>>>>
>>>>>>> Anyway, if anyone can help with the exception or with how to get debuggable
>>>>>>> classes I would appreciate it. I tried getting into the CVS source with:
>>>>>>>
>>>>>>> cvs -d:pserver:yourid_at_cvs.dev.java.net:/cvs co -d jaxb-ri
>>>>>>> jaxb2-sources/jaxb-ri
>>>>>>>
>>>>>>> but using my sun id (that I used to subscribe to mailing list) it
>>>>>>> came back with unknown id. I have applied to become a code observer within
>>>>>>> the https://jaxb2-sources.dev.java.net/ project.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jose
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>