jsr356-experts@websocket-spec.java.net

[jsr356-experts] Re: [jsr356-users] Re: UTF-8 decoding issues

From: Mark Thomas <mark_at_homeinbox.net>
Date: Tue, 05 Mar 2013 15:34:02 +0000

On 05/03/2013 15:06, Joakim Erdfelt wrote:
> Jetty uses its own UTF8 decoding implementation.

Yep. That is the route I took for Tomcat. I started with the Apache Harmony
implementation and then tightened it up based on test cases from
Autobahn, CVE-2008-2938 and the Unicode spec.

> https://github.com/eclipse/jetty.project/blob/master/jetty-util/src/main/java/org/eclipse/jetty/util/Utf8Appendable.java
>
> Inspired by Bjoern Hoehrmann's UTF-8 decoder
> <http://bjoern.hoehrmann.de/utf-8/decoder/dfa/>.
>
> We pass the Autobahn tests, both client and server.

We pass the server tests, or at least we did the last time I checked; I
need to run them again to confirm that recent refactoring hasn't broken
anything. I haven't tried the client tests yet, and I know I have some
issues to look at when running over SSL.

> There is one gotcha, though.
> With extensions that modify TEXT frames (such as x-webkit-deflate-frame
> <https://tools.ietf.org/id/draft-tyoshino-hybi-websocket-perframe-deflate-05.txt>
> and permessage-compress
> <http://tools.ietf.org/html/draft-tyoshino-hybi-permessage-compression>),
> the payload arrives as compressed bytes under a TEXT opcode. That means
> you have to validate your UTF-8 only after all extensions have
> unwrapped the TEXT frames.

Good to know. Thanks.
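
For anyone reading this in the archive, the ordering Joakim describes can be
sketched roughly as below. This is only an illustration under assumptions, not
Tomcat or Jetty code: the class and method names are made up, raw deflate
stands in for whatever an extension really does to the payload, and the JRE's
CharsetDecoder is used purely for brevity even though this thread is about its
limitations. The point is simply that the UTF-8 check runs on the bytes after
the extension has been undone, never on the wire bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical names throughout; a sketch, not a WebSocket implementation.
class TextAfterExtension {

    // Stand-in for the sending side of a compression extension (raw deflate).
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Stand-in for the receiving side: undo the extension's transformation.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater(true); // true = raw deflate, no zlib header
        try {
            // The Inflater Javadoc asks for one extra dummy byte when nowrap is used.
            inflater.setInput(Arrays.copyOf(compressed, compressed.length + 1));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported deflate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
    }

    // Strict whole-message check: malformed input is reported, not replaced.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = deflate("h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        // 0xC0 0xAF is an overlong encoding of '/', which a strict decoder must reject.
        byte[] bad = deflate(new byte[] {(byte) 0xC0, (byte) 0xAF});
        // Validate only after the extension has been unwound.
        System.out.println(isValidUtf8(inflate(good)));
        System.out.println(isValidUtf8(inflate(bad)));
    }
}
```

Running the check on the compressed bytes instead would tell you nothing
useful, since they are not text at that point.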

Mark

>
> --
> Joakim Erdfelt <joakim_at_intalio.com>
> webtide.com <http://www.webtide.com/>
> Developer advice, services and support
> from the Jetty & CometD experts
> eclipse.org/jetty - cometd.org
>
>
> On Tue, Mar 5, 2013 at 7:23 AM, Mark Thomas <mark_at_homeinbox.net> wrote:
>
>> All,
>>
>> Slightly off-topic but in implementing WebSocket and running Autobahn
>> against my implementation I have hit a number of issues with the UTF-8
>> decoder provided by the JRE. Has anyone else hit these issues?
>>
>> I have a test case that highlights the issues[1], which can be summarised
>> as (a) not detecting errors early enough and (b) swallowing too many bytes
>> when an error is detected (possibly as a consequence of (a)).
>>
>> I'm planning on raising a bug with Oracle but wanted to see how
>> wide-spread the problem was before I did so.
>>
>> Mark
>>
>> [1] http://svn.apache.org/viewvc/tomcat/trunk/test/org/apache/tomcat/util/buf/TestUtf8.java?view=markup
>>
>
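
A rough way to probe point (a) from the original message, i.e. how early a
decoder reports an error, is to feed it one byte at a time and note the index
at which it first complains. The sketch below is not taken from the linked
TestUtf8 class; the class and method names are invented for illustration, and
the result for sequences that only become invalid part-way through depends on
the JRE version in use, so no particular index is claimed for those.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical probe; names are illustrative only.
class DecoderProbe {

    // Returns the index of the byte after which the decoder first reports an
    // error, input.length if the error only shows up at end of input, or -1
    // if the input decodes cleanly.
    static int firstErrorIndex(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.allocate(Math.max(1, input.length));
        CharBuffer out = CharBuffer.allocate(input.length + 1);
        for (int i = 0; i < input.length; i++) {
            in.put(input[i]);   // append the next byte
            in.flip();          // switch to read mode for the decoder
            CoderResult result = decoder.decode(in, out, false);
            if (result.isError()) {
                return i;
            }
            in.compact();       // keep any undecoded bytes, back to write mode
        }
        in.flip();
        CoderResult result = decoder.decode(in, out, true); // signal end of input
        return result.isError() ? input.length : -1;
    }

    public static void main(String[] args) {
        // F4 90 80 80 would encode a code point above U+10FFFF. A decoder that
        // fails as early as possible can reject it at index 1.
        byte[] tooLarge = {(byte) 0xF4, (byte) 0x90, (byte) 0x80, (byte) 0x80};
        System.out.println(firstErrorIndex(tooLarge));
        System.out.println(firstErrorIndex("abc".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Comparing the returned index with the earliest position at which the sequence
is already known to be invalid gives a quick measure of how late the error is
detected. Inspecting CoderResult.length() at that point would be the natural
next step for point (b).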