dev@jsr311.java.net

Re: URI Escaping and Unescaping

From: Marc Hadley <Marc.Hadley_at_Sun.COM>
Date: Mon, 25 Jun 2007 13:48:06 -0400

Good discussion; I think there are valid arguments both ways wrt
automatic encoding and decoding. On balance I think that providing
some form of automatic encoding and decoding is desirable and will be
least surprising to folks already familiar with the Servlet API or
CGI. However, I can also see cases where developers would rather the
API get out of the way in that area, so here's what I propose we do:

* @UriParam, @QueryParam, @MatrixParam - add a boolean 'decode'
  property with a default value of true. When decode=true an annotated
  parameter will receive the decoded value of the associated URI
  component; when false it will receive the "raw" encoded value.
* @UriTemplate - add a boolean 'encode' property with a default value
  of true. When encode=true, literals within the value of the template
  will be encoded as necessary; when false, the template won't be
  subject to encoding and must therefore be a valid URI after
  substitution of any template parameters.
* UriInfo - add variants of getURIPath, getURIPathSegments,
  getQueryParameters and getURIParameters that allow access to
  undecoded URI information, e.g. String getURIPath(boolean decode).
  Clarify that the current methods are equivalent to calling the new
  methods with decode=true.
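
To make the decode=true/false distinction concrete, here is a minimal
sketch that uses only java.net.URI from the JDK, not the JSR-311 API.
The class name and the path/query helpers are made up for illustration;
the boolean parameter merely mimics the shape of the proposed
getURIPath(boolean decode).

```java
import java.net.URI;

// Illustration only: RawVersusDecoded and its helpers are hypothetical
// names, not part of JSR-311. They mimic getURIPath(boolean decode).
final class RawVersusDecoded {

    /** Returns the path, percent-decoded when decode is true, raw otherwise. */
    static String path(URI uri, boolean decode) {
        return decode ? uri.getPath() : uri.getRawPath();
    }

    /** Returns the query, percent-decoded when decode is true, raw otherwise. */
    static String query(URI uri, boolean decode) {
        return decode ? uri.getQuery() : uri.getRawQuery();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://localhost/x%20/y%20?q%20p=z%20");

        // Brackets make the trailing spaces of the decoded forms visible.
        System.out.println("[" + path(uri, false) + "]");  // [/x%20/y%20]
        System.out.println("[" + path(uri, true) + "]");   // [/x /y ]
        System.out.println("[" + query(uri, false) + "]"); // [q%20p=z%20]
        System.out.println("[" + query(uri, true) + "]");  // [q p=z ]
    }
}
```

Note that java.net.URI reports the path with its leading slash, so the
values differ slightly from the relative paths used in the examples
quoted below.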

The addition of these annotation properties and interface methods
will serve to highlight encoding/decoding in the API and provide the
developer with a choice. It will mean more work for implementers but
the primary goal of the API is ease of development, not ease of
implementation ;-).

Marc.


On Jun 22, 2007, at 12:29 PM, Paul Sandoz wrote:

> Mark Hansen wrote:
>> Hi Paul,
>> Thanks for your empathy :-). Actually, I like JAX-WS a lot better
>> than JAX-RPC, but there are still some pain points ...
>
> Yes, JAX-WS is way better.
>
>
>> Thanks for the great example. Yes, actually, the code and
>> behavior you describe are what I am proposing. And, I would agree
>> with anyone who looks at that and says "yuck - Can't we make it
>> prettier than that!". So there is the trade-off that this EG has
>> to hash out. Do we want attractive code or do we want code that
>> actually reflects what is going on on the Internet?
>> Personally, I like my code to correspond to the Internet. The
>> JSR-311 annotations let me map bits and pieces of a URL to
>> parameters and methods. That is great. A nice convenience. But,
>> if you throw in some automatic encode/decode - then I start to get
>> confused. To read the code and think about how it maps the URL to
>> the code, I have to now parse and replace the %20, %26, etc. in my
>> head. For me, that is too much. But, others may disagree. I
>> also think that it may be too much for many implementers, and the
>> guys writing JUnits.
>> So, I would vote for uglier (and maybe even a little bit more
>> intimidating code for a beginner), as opposed to cleaner code that
>> is another step away from what is really happening on the
>> Internet. Certainly I can see how others might disagree.
>>
>
> We could do this (which is similar to what Dhanji was suggesting):
>
> @UriTemplate("x%20/{up}")
> class MyResource {
>
>     @HttpContext UriInfo uriInfo;
>
>     @HttpMethod
>     public String get(
>             @UriParam("up") @Decoded String up,
>             @QueryParam("q%20p") @Decoded String qp)
>             throws URISyntaxException {
>         assert up.equals("y ");
>         assert qp.equals("z ");
>         assert uriInfo.getDecodedURIPath().equals("x /y ");
>
>         // Five-argument form (scheme, authority, path, query, fragment):
>         // the constructor encodes the decoded path and query.
>         return new URI(null, null,
>                 uriInfo.getDecodedURIPath(),
>                 "q p=" + qp, null).toString();
>     }
> }
>
> Whichever way we go, I still think we need better URI
> construction support, for example:
>
> URITemplate ut = new URITemplate("x%20/{up}");
> ut.put("up", "y%20");
> ut.putDecoded("up", "y 20");
> ut.putQueryParam("qp", "z%20");
> URI u = ut.getURI();
>
> ut.parseURI(u);
> String up = ut.get("up");
> String decodedUp = ut.getDecoded("up");
> String decodedQp = ut.getDecodedQueryParam("q%20p");
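
For comparison, the JDK's java.net.URI already draws a similar line
between encoded and decoded input. The following is a rough sketch, not
the URITemplate idea above: the class and method names are invented for
illustration. It relies on the documented behaviour that the
multi-argument constructors quote illegal characters and always quote
'%', while URI.create parses its argument as already encoded.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: UriConstruction and its helpers are hypothetical names.
final class UriConstruction {

    /** Builds a relative URI from decoded parts; the constructor encodes them. */
    static String fromDecoded(String path, String query) throws URISyntaxException {
        // Arguments: scheme, authority, path, query, fragment.
        return new URI(null, null, path, query, null).toString();
    }

    /** Parses a string that is already encoded, leaving its escapes untouched. */
    static String fromEncoded(String encoded) {
        return URI.create(encoded).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Decoded input is encoded on construction.
        System.out.println(fromDecoded("x /y ", "q p=z "));      // x%20/y%20?q%20p=z%20

        // Encoded input is kept as is.
        System.out.println(fromEncoded("x%20/y%20?q%20p=z%20")); // x%20/y%20?q%20p=z%20

        // Passing encoded input where decoded input is expected double-encodes,
        // because the multi-argument constructors always quote '%'.
        System.out.println(fromDecoded("x%20/y%20", null));      // x%2520/y%2520
    }
}
```

The last call shows the kind of mistake that is easy to make when an API
does not say clearly which form it expects.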
>
>
>
>> One other thing ... I'm not an expert on this, so maybe others
>> can chime in, but from the limited code I've written in Perl, PHP,
>> JavaScript (i.e., popular languages with the REST crowd), I think
>> that they are used to dealing with the encoding/decoding and not
>> having the language handle it for them.
>
> HttpServletRequest.getPathInfo() is the same as CGI variable
> PATH_INFO:
>
>   The extra path information, as given by the client. In other words,
>   scripts can be accessed by their virtual pathname, followed by extra
>   information at the end of this path. The extra information is sent
>   as PATH_INFO. This information should be decoded by the server if it
>   comes from a URL before it is passed to the CGI script.
>
> HttpServletRequest.getQueryString() is the same as CGI variable
> QUERY_STRING:
>
>   The information which follows the ? in the URL which referenced this
>   script. This is the query information. It should not be decoded in
>   any fashion. This variable should always be set when there is query
>   information, regardless of command line decoding.
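
One reason the raw query string matters: decoding has to happen after
the string is split into parameters, otherwise an escaped '&' becomes a
separator. The sketch below is only an illustration with made-up class
and method names. It uses java.net.URLDecoder (the Charset overload
needs Java 10 or later), which also turns '+' into a space because it
implements form decoding. The input is the Yahoo Shopping query string
quoted elsewhere in this thread.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: QueryParsing is a hypothetical helper, not a JSR-311 API.
final class QueryParsing {

    /** Splits a raw query string on '&' and '=' first, then decodes each part. */
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "appid=soajavabook&query=razr&category=Electronics%20%26%20Camera";

        // Split first, decode second: three parameters, category intact.
        System.out.println(parse(raw));

        // Decode first, split second: the escaped '&' now splits the category
        // value, producing a bogus fourth parameter named " Camera".
        System.out.println(parse(decode(raw)));
    }
}
```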
>
> Paul.
>
>> -- Mark
>> Paul Sandoz wrote:
>>> Hi Mark,
>>>
>>> I feel your JAX-WS pain :-)
>>>
>>> I would like to highlight the following example to get a clearer
>>> understanding of what you and Julian are proposing.
>>>
>>> Given this URI template:
>>>
>>> /x%20/{up}
>>>
>>> We would have:
>>>
>>> @UriTemplate("x%20/{up}")
>>> class MyResource {
>>>
>>> }
>>>
>>> Given this URI:
>>>
>>> http://localhost/x%20/y%20?q%20p=z%20
>>>
>>> We could have:
>>>
>>> @UriTemplate("x%20/{up}")
>>> class MyResource {
>>>
>>>     @HttpContext UriInfo uriInfo;
>>>
>>>     public String get(
>>>             @UriParam("up") String up,
>>>             @QueryParam("q%20p") String qp) {
>>>         assert up.equals("y%20");
>>>         assert qp.equals("z%20");
>>>         assert uriInfo.getURIPath().equals("x%20/y%20");
>>>
>>>         return URI.create(uriInfo.getURIPath() + "?q%20p=" + qp)
>>>                 .toString();
>>>     }
>>> }
>>>
>>>
>>> And when we do a GET on the URI
>>> http://localhost/x%20/y%20?q%20p=z%20
>>> no assertions will fail and the following entity will be returned:
>>>
>>> x%20/y%20?q%20p=z%20
>>>
>>> Is that what you are proposing?
>>>
>>> 1) that URI templates and query parameter names must always be
>>> declared in encoded form;
>>>
>>> 2) the URI path, template values and query parameter values, when
>>> accessed as strings, must be returned in encoded form; and
>>>
>>> 3) the developer must explicitly decode those of 2).
>>>
>>> Paul.
>>>
>>> Mark Hansen wrote:
>>>
>>>> Actually, I *strongly* disagree with the emerging consensus
>>>> (Dhanji, Phil, David) on this.
>>>> I don't think that JSR-311 should make any effort to abstract
>>>> away the URL encoding from the developer. Automating the
>>>> encoding/decoding will just be (1) a source of confusion for
>>>> developers and (2) a source of bugs for implementers.
>>>>
>>>> Regarding (1) - URL encoding is a fact of life for RESTful
>>>> development. Let's not try to hide the realities of the Internet
>>>> from programmers. Heading down that path just starts to get
>>>> confusing. Then, to use our API, programmers have to remember
>>>> what algorithm we use to map Strings to URL encoded Strings. Or
>>>> worse, developers who don't know about URL encoding will wonder
>>>> why their code does one thing, but when they copy and paste that
>>>> query string into a browser, it doesn't work.
>>>>
>>>> Our goal should be to make Java work the way that the Internet
>>>> works. My biggest frustration with JAX-WS (and this is also a
>>>> problem with java.net.URLConnection) is that it tries too hard to
>>>> abstract the programmer away from the Internet. Our goal on
>>>> JSR-311 should be to provide a natural Java representation of
>>>> what is really going on on the Internet. Hiding the messy stuff
>>>> in the name of "easier programming" is a bad idea.
>>>>
>>>>
>>>> Regarding (2) - If we start coming up with rules for how to make
>>>> URL encoding "easier", I guarantee you that it will be a source
>>>> of bugs in the implementation. Here is an example from JAX-WS:
>>>>
>>>>
>>>> Today, I have spent several hours trying to find some bug in how
>>>> the JAX-WS 2.1 RI implementation handles URL encoding/decoding.
>>>>
>>>> http://api.shopping.yahoo.com/ShoppingService/V2/productSearch?appid=soajavabook&query=razr&category=Electronics%20%26%20Camera
>>>>
>>>>
>>>> It's a query of the Yahoo Shopping API. Works fine in a
>>>> browser. But, when I use the encoded query string with a
>>>> Dispatch<T>, by setting the MessageContext.QUERY_STRING, it
>>>> doesn't work (at least not in build 42 - I'm trying a newer build
>>>> before filing a bug report). Like this:
>>>>
>>>> productSearchDispatch.getRequestContext().put(
>>>>     MessageContext.QUERY_STRING,
>>>>     "appid=soajavabook&query=razr&category=Electronics%20%26%20Camera");
>>>>
>>>> It used to work in one of the older JAX-WS builds. Now it
>>>> doesn't. Bugs like this appear, get fixed, disappear, and come
>>>> back again.
>>>>
>>>>
>>>>
>>>> -- Mark
>>>>
>>>>
>>>>
>>>>
>>>
>
> --
> | ? + ? = To question
> ----------------\
> Paul Sandoz
> x38109
> +33-4-76188109
>

---
Marc Hadley <marc.hadley at sun.com>
CTO Office, Sun Microsystems.