Re: URI Escaping and Unescaping

From: Mark Hansen
Date: Thu, 21 Jun 2007 13:56:50 -0400

Actually, I *strongly* disagree with the emerging consensus (Dhanji, Phil, David) on this.

I don't think that JSR-311 should make any effort to abstract
away the URL encoding from the developer. Automating the
encoding/decoding will just be (1) a source of confusion for
developers and (2) a source of bugs for implementers.

Regarding (1) - URL encoding is a fact of life for RESTful
development. Let's not try to hide the realities of the Internet from
programmers. Heading down that path just gets confusing: to use our
API, programmers have to remember what algorithm we use to map
Strings to URL-encoded Strings. Or worse, developers who don't know
about URL encoding will wonder why their code does one thing, but
the same query string doesn't work when they copy and paste it into
a browser.
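
A minimal sketch of that ambiguity, using only JDK classes (nothing
here is JSR-311 API; the class name EncodingDemo, the /search path and
the sample strings are made up for illustration). Two standard
encoders disagree on the same space character, so a developer has to
know which rule an API applied on their behalf:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class EncodingDemo {
    // HTML form encoding (application/x-www-form-urlencoded):
    // a space becomes '+'.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Quoting done by the multi-argument java.net.URI constructor:
    // a space in the query becomes "%20", while '=' is left alone.
    static String uriQueryEncode(String query) {
        try {
            return new URI(null, null, "/search", query, null).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("digital camera"));       // digital+camera
        System.out.println(uriQueryEncode("q=digital camera")); // q=digital%20camera
    }
}
```

Both results are legitimate encodings under their own rules, which is
exactly why a hidden choice between them is easy to get wrong.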

Our goal should be to make Java work the way that the Internet
works. My biggest frustration with JAX-WS (and this is also a problem
with is that it tries too hard to abstract the
programmer away from the Internet. Our goal on JSR-311 should be to
provide a natural Java representation of what is really going on on the
Internet. Hiding the messy stuff in the name of "easier programming"
is a bad idea.

Regarding (2) - If we start coming up with rules for how to make URL
encoding "easier", I guarantee you that it will be a source of bugs in
the implementation. Here is an example from JAX-WS:

Today, I have spent several hours trying to find some bug in how the
JAX-WS 2.1 RI implementation handles URL encoding/decoding.

It's a query of the Yahoo Shopping API. It works fine in a browser. But
when I use the encoded query string with a Dispatch<T>, by
setting the MessageContext.QUERY_STRING, it doesn't work (at least not
in build 42 - I'm trying a newer build before filing a bug report).
Like this:

It used to work in one of the older JAX-WS builds. Now it doesn't.
Bugs like this appear, get fixed, disappear, and come back again.
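
As a hypothetical illustration of how easily this class of bug creeps
in (this is not a claim about what the JAX-WS RI actually does; the
class QueryParseDemo and the parameter names "query" and "results" are
invented for the sketch), consider the order in which an
implementation decodes and splits a query string:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParseDemo {
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Wrong order: decode the whole string first, then split.
    // An encoded '&' inside a value turns into a real separator.
    static Map<String, String> decodeThenSplit(String rawQuery) {
        return split(decode(rawQuery), false);
    }

    // Right order: split the raw string, then decode each name and value.
    static Map<String, String> splitThenDecode(String rawQuery) {
        return split(rawQuery, true);
    }

    private static Map<String, String> split(String query, boolean decodeParts) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decodeParts ? decode(name) : name,
                       decodeParts ? decode(value) : value);
        }
        return params;
    }

    public static void main(String[] args) {
        String raw = "query=black%26white&results=2";
        System.out.println(splitThenDecode(raw)); // {query=black&white, results=2}
        System.out.println(decodeThenSplit(raw)); // {query=black, white=, results=2}
    }
}
```

Both methods look plausible in a code review, and they agree on every
input that has no escaped delimiter in it, which is how such bugs
survive testing and then resurface.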

-- Mark

Mark Hansen
Author of "SOA Using Java Web Services"
cell:   914-924-3398
office: 914-595-5407
skype:  khookguy

Dhanji R. Prasanna wrote:
> I'm with Paul on this. The API should abstract away the common URL
> encoding/decoding step.
> It's not that hard to provide an undecoded option:
> @RawUriParam("name") String undecodedName
> Dhanji.
> On 6/22/07, *Paul Sandoz* wrote:
>     Julian Reschke wrote:
>     > Hi,
>     >
>     > I think this kind of proves that the URI template internet draft
>     > needs to be finished, and then, the relation between JSR-311's
>     > templates and the ones described by the IETF spec needs to be
>     > clarified.
>     >
>     Agreed.
>     > Right now,
>     >
>     <
>     <>>
>     > seems to speak about URIs, nothing else. So you can't have an
>     > unescaped blank space, nor non-ASCII characters. It seems to me
>     > that this is what the current RI should implement.
>     >
>     That seems reasonable.
>     > Which also means that unescaping of templated values needs to be
>     > done in a separate step. That may be a bit ugly, but I really
>     > prefer that in comparison to messing around with the template
>     > format.
>     >
>     Just because the template requires escaping (for 'conformance'
>     reasons) does not mean the template values accessed by the
>     developer need be. A similar case can be made for annotations that
>     require a conformant URI: the path can be accessed as a decoded
>     value (URI.getPath).
>     Note that it is possible to support types other than String [1] with
>     @UriParam, thereby requiring decoding:
>        Binds a method parameter, class field or property to a URI
>        template parameter value. The class of the annotated parameter,
>        field or property must have a constructor that accepts a single
>        String argument, or a static method named valueOf that accepts a
>        single String argument (see, for example, Integer.valueOf(String)).
>     If we don't decode then a developer will spend time debugging a
>     problem to find that funny characters are present in the template
>     values. On the other hand, if we do decode, a developer will spend
>     time debugging URI creation exceptions. Both are a source of nasty
>     sleeper bugs :-(
>     However, I would think it highly likely that if a developer uses
>     UriParam or QueryParam that they want to do something useful with it
>     (e.g. as a DB key or SQL query), thus decoding will have to be done,
>     and to tell the developer that they have to do this seems contrary
>     to a developer-friendly API. In either case the decoded value might
>     be used as part of URI creation...
>     IMHO we need to investigate techniques for URI creation and
>     manipulation given the knowledge that the developer is likely to
>     prefer working with decoded values (while not pulling the rug from
>     under the developers that need to work with escaped values). For
>     example, we could expose a URI template class that supports the
>     parsing and creation of URIs; it may be possible to get a URI
>     template from the current resource class (on itself or the ones for
>     the next matches, e.g. to use for URI creation) etc.
>     Paul.
>     [1]
>     >
>     >
>     > Best regards, Julian
>     >
>     > (*) we could use IRI templates, instead of URI templates, but
>     > then we'll still have to take care of characters not allowed in
>     > IRIs, such as SP, "{", "}", "/" and so on...
>     >
>     --
>     | ? + ? = To question
>     ----------------\
>         Paul Sandoz
>              x38109
>     +33-4-76188109