jsr356-experts@websocket-spec.java.net

[jsr356-experts] Re: UTF-8 decoding issues

From: Joakim Erdfelt <joakim_at_intalio.com>
Date: Tue, 5 Mar 2013 08:06:19 -0700

Jetty uses its own UTF8 decoding implementation.

https://github.com/eclipse/jetty.project/blob/master/jetty-util/src/main/java/org/eclipse/jetty/util/Utf8Appendable.java

It was inspired by Bjoern Hoehrmann's UTF-8 decoder
<http://bjoern.hoehrmann.de/utf-8/decoder/dfa/>.
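
As a rough illustration of what byte-at-a-time, fail-fast validation
looks like, here is a minimal sketch. It is not Jetty's Utf8Appendable
and it does not use Hoehrmann's transition table; it assumes plain range
checks on each byte instead, and the class name Utf8Validator is
hypothetical.

```java
/** Hypothetical incremental UTF-8 validator (illustration only). */
final class Utf8Validator {
    private int needed = 0;    // continuation bytes still expected
    private int lower = 0x80;  // allowed range for the next continuation byte
    private int upper = 0xBF;

    /**
     * Feeds one byte. Returns false as soon as the sequence is known to be
     * invalid; the caller should stop feeding bytes after a false result.
     */
    boolean accept(byte b) {
        int v = b & 0xFF;
        if (needed == 0) {
            if (v <= 0x7F) {
                return true;                   // single-byte (ASCII)
            }
            if (v >= 0xC2 && v <= 0xDF) {      // 2-byte lead (C0/C1 are overlong)
                needed = 1;
                return true;
            }
            if (v >= 0xE0 && v <= 0xEF) {      // 3-byte lead
                if (v == 0xE0) lower = 0xA0;   // rejects overlong 3-byte forms
                if (v == 0xED) upper = 0x9F;   // rejects UTF-16 surrogates
                needed = 2;
                return true;
            }
            if (v >= 0xF0 && v <= 0xF4) {      // 4-byte lead
                if (v == 0xF0) lower = 0x90;   // rejects overlong 4-byte forms
                if (v == 0xF4) upper = 0x8F;   // rejects values above U+10FFFF
                needed = 3;
                return true;
            }
            return false;                      // stray continuation or bad lead
        }
        if (v < lower || v > upper) {
            return false;                      // bad continuation byte
        }
        lower = 0x80;
        upper = 0xBF;
        needed--;
        return true;
    }

    /** True when no multi-byte sequence is left unfinished. */
    boolean isComplete() {
        return needed == 0;
    }

    /** Convenience check for a whole message. */
    static boolean isValid(byte[] bytes) {
        Utf8Validator v = new Utf8Validator();
        for (byte b : bytes) {
            if (!v.accept(b)) {
                return false;
            }
        }
        return v.isComplete();
    }
}
```

Because accept() reports an error on the first offending byte, a
WebSocket endpoint could fail the connection without waiting for the
rest of the message.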

We pass the Autobahn tests for both client and server.

There is one gotcha, though.
TEXT frames modified by extensions (such as x-webkit-deflate-frame
<https://tools.ietf.org/id/draft-tyoshino-hybi-websocket-perframe-deflate-05.txt>
and permessage-compress
<http://tools.ietf.org/html/draft-tyoshino-hybi-permessage-compression>)
will arrive as compressed bytes on a TEXT opcode. That means you have to
make sure you validate the UTF-8 only after all extensions have
unravelled the TEXT frames.
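
To make the ordering concrete, here is a minimal sketch that inflates
first and validates second. It assumes the payload is a complete raw
DEFLATE stream and ignores any extension-specific framing details; the
class and method names (TextFrameCheck, inflateRaw, decodeStrict,
readText) are hypothetical, and it uses the JRE decoder in REPORT mode
rather than Jetty's own decoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical helper: undo compression first, then validate UTF-8. */
final class TextFrameCheck {

    /** Inflates a raw DEFLATE stream (no zlib header or checksum). */
    static byte[] inflateRaw(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(true); // true = raw deflate
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // input exhausted before the stream ended
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    /** Decodes strictly: malformed input raises instead of being replaced. */
    static String decodeStrict(byte[] utf8) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8))
                .toString();
    }

    /** Validation only ever sees the bytes left after the extension has run. */
    static String readText(byte[] payload, boolean compressed)
            throws DataFormatException, CharacterCodingException {
        byte[] plain = compressed ? inflateRaw(payload) : payload;
        return decodeStrict(plain);
    }
}
```

Running the UTF-8 check on the payload as received would test the
compressed bytes rather than the text, so a result in either direction
would be meaningless.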

--
Joakim Erdfelt <joakim_at_intalio.com>
webtide.com <http://www.webtide.com/>
Developer advice, services and support
from the Jetty & CometD experts
eclipse.org/jetty - cometd.org
On Tue, Mar 5, 2013 at 7:23 AM, Mark Thomas <mark_at_homeinbox.net> wrote:
> All,
>
> Slightly off-topic but in implementing WebSocket and running Autobahn
> against my implementation I have hit a number of issues with the UTF-8
> decoder provided by the JRE. Has anyone else hit these issues?
>
> I have a test case that highlights the issues[1] that can be summarised
> as a) not detecting errors early enough and b) swallowing too many bytes
> when an error is detected (possibly as a consequence of a)
>
> I'm planning on raising a bug with Oracle but wanted to see how
> wide-spread the problem was before I did so.
>
> Mark
>
> [1]
>
> http://svn.apache.org/viewvc/tomcat/trunk/test/org/apache/tomcat/util/buf/TestUtf8.java?view=markup
>