dev@jsr311.java.net

Re: URI Escaping and Unescaping

From: Mark Hansen <mark_at_javector.com>
Date: Tue, 26 Jun 2007 00:11:48 -0400

+1. This sounds like a good compromise without too much clutter being
added to the API. -- Mark

Marc Hadley wrote:

> Good discussion, I think there are valid arguments both ways wrt
> automatic encoding and decoding. On balance I think that providing
> some form of automatic encoding and decoding is desirable and will be
> least surprising to folks already familiar with the Servlet API or
> CGI. However I can also see cases where developers would rather the
> API get out of the way in that area so here's what I propose we do:
>
> * @UriParam, @QueryParam, @MatrixParam - add a boolean 'decode'
>   property with a default value of true. When decode=true an annotated
>   parameter will receive the decoded value of the associated URI
>   component, when false it will receive the "raw" encoded value.
> * @UriTemplate - add a boolean 'encode' property with a default value
>   of true. When encode=true literals within the value of the template
>   will be encoded as necessary, when false the template won't be
>   subject to encoding and must therefore be a valid URI after
>   substitution of any template parameters.
> * UriInfo - add variants of getURIPath, getURIPathSegments,
>   getQueryParameters and getURIParameters that allow access to
>   undecoded URI information, e.g. String getURIPath(boolean decode).
>   Clarify that the current methods are equivalent to calling the new
>   methods with decode=true.
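
To make the decode=true / decode=false distinction concrete, here is a minimal sketch using only java.net.URI from the JDK. The class RawVersusDecoded and its helpers path() and query() are hypothetical names invented for illustration; they only mimic the shape of the proposed getURIPath(boolean decode) and are not part of any JSR-311 draft.

```java
import java.net.URI;

class RawVersusDecoded {

    // Hypothetical helper shaped like the proposed getURIPath(boolean decode):
    // decoded path when decode is true, raw (still escaped) path otherwise.
    static String path(URI uri, boolean decode) {
        return decode ? uri.getPath() : uri.getRawPath();
    }

    // Hypothetical helper: the query component, decoded or raw.
    static String query(URI uri, boolean decode) {
        return decode ? uri.getQuery() : uri.getRawQuery();
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://localhost/x%20/y%20?q%20p=z%20");
        // Brackets make the trailing spaces visible.
        System.out.println("[" + path(uri, true) + "]");   // [/x /y ]
        System.out.println("[" + path(uri, false) + "]");  // [/x%20/y%20]
        System.out.println("[" + query(uri, true) + "]");  // [q p=z ]
        System.out.println("[" + query(uri, false) + "]"); // [q%20p=z%20]
    }
}
```

With decode=true as the default, the common case gets the decoded form, and the raw form stays one boolean away for anyone who wants to see exactly what was on the wire.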
>
> The addition of these annotation properties and interface methods
> will serve to highlight encoding/decoding in the API and provide the
> developer with a choice. It will mean more work for implementers but
> the primary goal of the API is ease of development, not ease of
> implementation ;-).
>
> Marc.
>
>
> On Jun 22, 2007, at 12:29 PM, Paul Sandoz wrote:
>
>> Mark Hansen wrote:
>>
>>> Hi Paul,
>>> Thanks for your empathy :-). Actually, I like JAX-WS a lot better
>>> than JAX-RPC, but there are still some pain points ...
>>
>>
>> Yes, JAX-WS is way better.
>>
>>
>>> Thanks for the great example. Yes, actually, the code and behavior
>>> you describe are what I am proposing. And, I would agree with
>>> anyone who looks at that and says "yuck - Can't we make it prettier
>>> than that!". So there is the trade-off that this EG has to hash
>>> out. Do we want attractive code or do we want code that actually
>>> reflects what is going on on the Internet?
>>> Personally, I like my code to correspond to the Internet. The
>>> JSR-311 annotations let me map bits and pieces of a URL to
>>> parameters and methods. That is great. A nice convenience. But,
>>> if you throw in some automatic encode/decode - then I start to get
>>> confused. To read the code and think about how it maps the URL to
>>> the code, I have to now parse and replace the %20, %26, etc. in my
>>> head. For me, that is too much. But, others may disagree. I also
>>> think that it may be too much for many implementers, and the guys
>>> writing JUnits.
>>> So, I would vote for uglier (and maybe even a little bit more
>>> intimidating code for a beginner), as opposed to cleaner code that
>>> is another step away from what is really happening on the
>>> Internet. Certainly I can see how others might disagree.
>>>
>>
>> We could do this (which is similar to what Dhanji was suggesting):
>>
>> @UriTemplate("x%20/{up}")
>> class MyResource {
>>
>> @HttpContext UriInfo uriInfo;
>>
>> @HttpMethod
>> public String get(
>> @UriParam("up") @Decoded String up,
>> @QueryParam("q%20p") @Decoded String qp)
>> throws URISyntaxException {
>> assert up.equals("y ");
>> assert qp.equals("z ");
>> assert uriInfo.getDecodedURIPath().equals("x /y ");
>>
>> return new URI(null, null,
>> uriInfo.getDecodedURIPath(),
>> "q p=" + qp, null).toString();
>> }
>> }
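
As a rough illustration of going from decoded values back to an encoded URI, the sketch below uses the five-argument java.net.URI constructor (scheme, authority, path, query, fragment), which quotes illegal characters such as spaces. The class name BuildFromDecoded and the method build() are made up for this example. Note that the constructor leaves legal characters such as '&' and '=' alone, so it cannot protect a query value that itself contains them.

```java
import java.net.URI;
import java.net.URISyntaxException;

class BuildFromDecoded {

    // Builds a relative URI reference from decoded components. Illegal
    // characters (here: spaces) are percent-quoted by the constructor.
    static String build(String decodedPath, String decodedQuery) {
        try {
            return new URI(null, null, decodedPath, decodedQuery, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("x /y ", "q p=z ")); // x%20/y%20?q%20p=z%20
    }
}
```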
>>
>> Whichever way we go, I still think we need better URI construction
>> support, for example:
>>
>> URITemplate ut = new URITemplate("x%20/{up}");
>> ut.put("up", "y%20");
>> ut.putDecoded("up", "y ");
>> ut.putQueryParam("q%20p", "z%20");
>> URI u = ut.getURI();
>>
>> ut.parseURI(u);
>> String up = ut.get("up");
>> String decodedUp = ut.getDecoded("up");
>> String decodedQp = ut.getDecodedQueryParam("q%20p");
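
One possible shape for the put / putDecoded half of such a helper is sketched below. Everything here is hypothetical: TemplateSketch, expand() and percentEncode() are names invented for illustration, the encoder simply escapes every byte outside the RFC 3986 "unreserved" set, and parsing a URI back against the template is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class TemplateSketch {
    private final String template;
    private final Map<String, String> encodedValues = new LinkedHashMap<>();

    TemplateSketch(String template) {
        this.template = template;
    }

    // Stores a value that the caller has already percent-encoded.
    TemplateSketch put(String name, String encodedValue) {
        encodedValues.put(name, encodedValue);
        return this;
    }

    // Stores a decoded value, percent-encoding it first.
    TemplateSketch putDecoded(String name, String decodedValue) {
        return put(name, percentEncode(decodedValue));
    }

    // Replaces each {name} in the template with its encoded value.
    String expand() {
        String result = template;
        for (Map.Entry<String, String> e : encodedValues.entrySet()) {
            result = result.replace("{" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // Escapes every UTF-8 byte that is not ALPHA, DIGIT, '-', '.', '_' or '~'.
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new TemplateSketch("x%20/{up}").put("up", "y%20").expand());
        System.out.println(new TemplateSketch("x%20/{up}").putDecoded("up", "y ").expand());
        // Both lines print x%20/y%20
    }
}
```

Keeping the stored form encoded and encoding on the way in means put() and putDecoded() cannot disagree about what ends up in the URI.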
>>
>>
>>
>>> One other thing ... I'm not an expert on this, so maybe others can
>>> chime in, but from the limited code I've written in Perl, PHP,
>>> JavaScript (i.e., popular languages with the REST crowd), I think
>>> that they are used to dealing with the encoding/decoding and not
>>> having the language handle it for them.
>>
>>
>> HttpServletRequest.getPathInfo() is the same as CGI variable PATH_INFO:
>>
>> The extra path information, as given by the client. In other words,
>> scripts can be accessed by their virtual pathname, followed by extra
>> information at the end of this path. The extra information is sent as
>> PATH_INFO. This information should be decoded by the server if it
>> comes from a URL before it is passed to the CGI script.
>>
>> HttpServletRequest.getQueryString() is the same as CGI variable
>> QUERY_STRING:
>>
>> The information which follows the ? in the URL which referenced this
>> script. This is the query information. It should not be decoded in
>> any fashion. This variable should always be set when there is query
>> information, regardless of command line decoding.
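
A small sketch of why the query string is usually handed over raw: it has to be split on '&' and '=' before anything is decoded, otherwise an escaped '&' inside a value is indistinguishable from a separator. RawQueryParser, parse() and decode() are names invented for this example; java.net.URLDecoder is the real JDK class, and it follows form-encoding rules, so it also turns '+' into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class RawQueryParser {

    // Splits the raw query first, then decodes each name and value.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // Form-style decoding: %XX escapes are resolved and '+' becomes a space.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Correct order: split the raw string, then decode the pieces.
        System.out.println(parse("q%20p=z%20&c=a%26b")); // {q p=z , c=a&b}
        // Wrong order: decoding first turns %26 into a separator.
        System.out.println(parse(decode("c=a%26b")));    // {c=a, b=}
    }
}
```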
>>
>> Paul.
>>
>>> -- Mark
>>> Paul Sandoz wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> I feel your JAX-WS pain :-)
>>>>
>>>> I would like to highlight the following example to get a clearer
>>>> understanding of what you and Julian are proposing.
>>>>
>>>> Given this URI template:
>>>>
>>>> /x%20/{up}
>>>>
>>>> We would have:
>>>>
>>>> @UriTemplate("x%20/{up}")
>>>> class MyResource {
>>>>
>>>> }
>>>>
>>>> Given this URI:
>>>>
>>>> http://localhost/x%20/y%20?q%20p=z%20
>>>>
>>>> We could have:
>>>>
>>>> @UriTemplate("x%20/{up}")
>>>> class MyResource {
>>>>
>>>> @HttpContext UriInfo uriInfo;
>>>>
>>>> public String get(
>>>> @UriParam("up") String up,
>>>> @QueryParam("q%20p") String qp) {
>>>> assert up.equals("y%20");
>>>> assert qp.equals("z%20");
>>>> assert uriInfo.getURIPath().equals("x%20/y%20");
>>>>
>>>> return URI.create(uriInfo.getURIPath() + "?q%20p=" + qp).
>>>> toString();
>>>> }
>>>> }
>>>>
>>>>
>>>> And when we do a GET on the URI http://localhost/x%20/y%20?q%20p=z%20
>>>> no assertions will fail and the following entity will be returned:
>>>>
>>>> x%20/y%20?q%20p=z%20
>>>>
>>>> Is that what you are proposing?
>>>>
>>>> 1) that URI templates and query parameter names must always be
>>>> declared in encoded form;
>>>>
>>>> 2) the URI path, template values and query parameter values, when
>>>> accessed as strings, must be returned in encoded form; and
>>>>
>>>> 3) the developer must explicitly decode those of 2).
>>>>
>>>> Paul.
>>>>
>>>> Mark Hansen wrote:
>>>>
>>>>> Actually, I *strongly* disagree with the emerging consensus
>>>>> (Dhanji, Phil, David) on this.
>>>>> I don't think that JSR-311 should make any effort to abstract
>>>>> away the URL encoding from the developer. Automating the
>>>>> encoding/decoding will just be (1) a source of confusion for
>>>>> developers and (2) a source of bugs for implementers.
>>>>>
>>>>> Regarding (1) - URL encoding is a fact of life for RESTful
>>>>> development. Let's not try to hide the realities of the Internet
>>>>> from programmers. Heading down that path just starts to get
>>>>> confusing. Then, to use our API, programmers have to remember
>>>>> what algorithm we use to map Strings to URL encoded Strings. Or
>>>>> worse, developers who don't know about URL encoding will wonder
>>>>> why their code does one thing, but when they copy and paste that
>>>>> query string into a browser, it doesn't work.
>>>>>
>>>>> Our goal should be to make Java work the way that the Internet
>>>>> works. My biggest frustration with JAX-WS (and this is also a
>>>>> problem with java.net.URLConnection) is that it tries too hard to
>>>>> abstract the programmer away from the Internet. Our goal on JSR-311
>>>>> should be to provide a natural Java representation of what is
>>>>> really going on on the Internet. Hiding the messy stuff in the name
>>>>> of "easier programming" is a bad idea.
>>>>>
>>>>>
>>>>> Regarding (2) - If we start coming up with rules for how to make URL
>>>>> encoding "easier", I guarantee you that it will be a source of bugs in
>>>>> the implementation. Here is an example from JAX-WS:
>>>>>
>>>>>
>>>>> Today, I have spent several hours trying to find some bug in how the
>>>>> JAX-WS 2.1 RI implementation handles URL encoding/decoding.
>>>>>
>>>>> http://api.shopping.yahoo.com/ShoppingService/V2/productSearch?
>>>>> appid=soajavabook&query=razr&category=Electronics%20%26%20Camera
>>>>>
>>>>>
>>>>> It's a query of the Yahoo Shopping API. Works fine in a browser.
>>>>> But, when I use the encoded query string with a Dispatch<T>, by
>>>>> setting the MessageContext.QUERY_STRING, it doesn't work (at
>>>>> least not in build 42 - I'm trying a newer build before filing a
>>>>> bug report). Like this:
>>>>> productSearchDispatch.getRequestContext().put
>>>>> (MessageContext.QUERY_STRING,
>>>>> "appid=soajavabook&query=razr&category=Electronics%20%26%20Camera");
>>>>>
>>>>> It used to work in one of the older JAX-WS builds. Now it
>>>>> doesn't. Bugs like this appear, get fixed, disappear, and come
>>>>> back again.
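
For reference, the sketch below shows how the category value above can be produced from its decoded form, and what happens if an already-encoded value is encoded a second time. This is only an illustration of one general way encoding bugs of this kind can arise; it is not a diagnosis of the JAX-WS RI behaviour described above. QueryValueEncoding and encodeValue() are names made up for the example; java.net.URLEncoder is the real JDK class, and since it emits '+' for a space, the sketch rewrites that to %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryValueEncoding {

    // Percent-encodes one query value. URLEncoder produces '+' for a space,
    // so it is rewritten to %20; a literal '+' in the input is already %2B.
    static String encodeValue(String decoded) {
        try {
            return URLEncoder.encode(decoded, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String once = encodeValue("Electronics & Camera");
        String twice = encodeValue(once); // encoding an already-encoded value
        System.out.println("category=" + once);  // category=Electronics%20%26%20Camera
        System.out.println("category=" + twice); // category=Electronics%2520%2526%2520Camera
    }
}
```

Whichever layer does the encoding, it has to happen exactly once, which is why an API needs to be explicit about whether it expects raw or decoded strings.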
>>>>>
>>>>>
>>>>>
>>>>> -- Mark
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>
>> --
>> | ? + ? = To question
>> ----------------\
>> Paul Sandoz
>> x38109
>> +33-4-76188109
>>
>>
>
> ---
> Marc Hadley <marc.hadley at sun.com>
> CTO Office, Sun Microsystems.
>
>


-- 
Mark Hansen
Author of "SOA Using Java Web Services"
http://soabook.com
mark_at_javector.com
cell:   914-924-3398
office: 914-595-5407
skype:  khookguy