dev@fi.java.net

Re: JAXB and Namespaces

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Wed, 15 Jun 2005 18:28:59 +0200

Kohsuke Kawaguchi wrote:
> Paul Sandoz wrote:
>
>> There are definitely some bugs (well, missing features really) with
>> respect to the Encoder class because it does not support full UTF-8
>> encoding of all possible code points (especially high and low
>> surrogates). It may be best to use NIO here or we can copy the code
>> from FI which i optimized specifically for UTF-8, i can fix this if
>> you want.
>
>
> Ah, that's right. Surrogates.
>
> I used to think that I'm pretty familiar with all those nitty gritty
> details about encoding, charset, and all that stuff. And now look at me...
>

:-) same for me until i had to make the FI implementation work with
Japanese documents and the like!


> I thought the encoding code is one of the hotspots, so I assumed
> inlining it manually would be worthwhile (as opposed to using the NIO
> encoder). JiBX also had similar code inlined, so that was also a
> motivation.
>
> If there's something you can copy very quickly, that would be great.
> Otherwise I can fix it by myself.
>

See here for code:

https://fi.dev.java.net/source/browse/fi/FastInfoset/src/com/sun/xml/fastinfoset/Encoder.java?view=markup

and the encodeUTF8String method.

This code can almost be copied directly; however, i make use of some
methods in the Xerces XMLChar class (a copy of which lives in FI). It
should be easy to copy these specific methods too, as they do not rely
on any character tables.
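
To illustrate the kind of surrogate handling involved, here is a minimal
sketch. It is not the FI encodeUTF8String code; the class name Utf8Sketch
and the three helper methods are hypothetical stand-ins for the small
table-free checks one would copy, and the error handling (throwing on an
unpaired surrogate) is an assumption.

```java
import java.io.ByteArrayOutputStream;

public class Utf8Sketch {
    // Hypothetical table-free helpers: plain range checks on UTF-16 code units.
    static boolean isHighSurrogate(int c) { return 0xD800 <= c && c <= 0xDBFF; }
    static boolean isLowSurrogate(int c)  { return 0xDC00 <= c && c <= 0xDFFF; }

    // Combine a high/low surrogate pair into one supplementary code point.
    static int supplemental(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    // Encode a Java (UTF-16) string as UTF-8, including surrogate pairs.
    static byte[] encodeUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length() * 3);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                // 1 byte: 0xxxxxxx
                out.write(c);
            } else if (c < 0x800) {
                // 2 bytes: 110xxxxx 10xxxxxx
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else if (isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= s.length() || !isLowSurrogate(s.charAt(i + 1))) {
                    throw new IllegalArgumentException(
                        "Unpaired high surrogate at index " + i);
                }
                int cp = supplemental(c, s.charAt(++i));
                // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else if (isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is invalid.
                throw new IllegalArgumentException(
                    "Unpaired low surrogate at index " + i);
            } else {
                // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }
}
```

The point is only that the surrogate checks are simple range comparisons,
which is why no character tables are needed. A real encoder would write
into a reusable byte buffer rather than a ByteArrayOutputStream.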

This code has also been tested and performance tested. So perhaps it is
best to copy this?

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109