dev@jsr311.java.net

Re: URI Escaping and Unescaping

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Fri, 22 Jun 2007 18:29:57 +0200

Mark Hansen wrote:
> Hi Paul,
>
> Thanks for your empathy :-). Actually, I like JAX-WS a lot better than
> JAX-RPC, but there are still some pain points ...
>

Yes, JAX-WS is way better.


> Thanks for the great example. Yes, actually, the code and behavior you
> describe are what I am proposing. And, I would agree with anyone who
> looks at that and says "yuck - Can't we make it prettier than that!".
> So there is the trade-off that this EG has to hash out. Do we want
> attractive code or do we want code that actually reflects what is going
> on on the Internet?
>
> Personally, I like my code to correspond to the Internet. The JSR-311
> annotations let me map bits and pieces of a URL to parameters and
> methods. That is great. A nice convenience. But, if you throw in some
> automatic encode/decode - then I start to get confused. To read the
> code and think about how it maps the URL to the code, I have to now
> parse and replace the %20, %26, etc. in my head. For me, that is too
> much. But, others may disagree. I also think that it may be too much
> for many implementers, and the guys writing JUnits.
>
> So, I would vote for uglier (and maybe even a little bit more
> intimidating code for a beginner), as opposed to cleaner code that is
> another step away from what is really happening on the Internet.
> Certainly I can see how others might disagree.
>

We could do this (which is similar to what Dhanji was suggesting):

  @UriTemplate("x%20/{up}")
  class MyResource {

       @HttpContext UriInfo uriInfo;

       @HttpMethod
       public String get(
               @UriParam("up") @Decoded String up,
               @QueryParam("q%20p") @Decoded String qp)
                   throws URISyntaxException {
           assert up.equals("y ");
           assert qp.equals("z ");
           assert uriInfo.getDecodedURIPath().equals("x /y ");

           // Five-argument form: scheme, authority, path, query, fragment.
           return new URI(null, null,
               uriInfo.getDecodedURIPath(),
               "q p=" + qp, null).toString();
       }
  }

Whichever way we go, I still think we need better URI construction
support, for example:

   URITemplate ut = new URITemplate("x%20/{up}");
   ut.put("up", "y%20");            // value given in encoded form
   ut.putDecoded("up", "y ");       // same value given in decoded form
   ut.putQueryParam("q%20p", "z%20");
   URI u = ut.getURI();

   ut.parseURI(u);
   String up = ut.get("up");
   String decodedUp = ut.getDecoded("up");
   String decodedQp = ut.getDecodedQueryParam("q%20p");
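
For comparison, here is a minimal sketch of what plain java.net.URI already offers for the encoded/decoded split. It is not the URITemplate API proposed above; the class name UriEncodingSketch and the helper fromDecoded are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriEncodingSketch {

    // Builds a relative URI from *decoded* components. The multi-argument
    // URI constructors percent-encode illegal characters such as spaces,
    // and they always encode a literal '%' as well.
    static URI fromDecoded(String path, String query) {
        try {
            return new URI(null, null, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI u = fromDecoded("/x /y ", "q p=z ");
        System.out.println(u);               // /x%20/y%20?q%20p=z%20
        System.out.println(u.getRawPath());  // encoded path
        System.out.println(u.getPath());     // decoded path
        System.out.println(u.getRawQuery()); // encoded query
        System.out.println(u.getQuery());    // decoded query

        // Passing an already-encoded value double-encodes it, which is
        // why separate put/putDecoded entry points would be useful.
        System.out.println(fromDecoded("/y%20", null).getRawPath()); // /y%2520
    }
}
```

The getRawXxx/getXxx pairs give both views when reading a URI, but on the construction side there is only the decoded entry point, so an encoded value has to go through URI.create or string concatenation instead.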



> One other thing ... I'm not an expert on this, so maybe others can
> chime in, but from the limited code I've written in Perl, PHP,
> JavaScript (i.e., popular languages with the REST crowd), I think that
> they are used to dealing with the encoding/decoding and not having the
> language handle it for them.
>

HttpServletRequest.getPathInfo() is the same as CGI variable PATH_INFO:

   The extra path information, as given by the client. In other words,
   scripts can be accessed by their virtual pathname, followed by extra
   information at the end of this path. The extra information is sent as
   PATH_INFO. This information should be decoded by the server if it
   comes from a URL before it is passed to the CGI script.

HttpServletRequest.getQueryString() is the same as CGI variable
QUERY_STRING:

   The information which follows the ? in the URL which referenced this
   script. This is the query information. It should not be decoded in any
   fashion. This variable should always be set when there is query
   information, regardless of command line decoding.
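
A small sketch of why the query string is better left encoded until it has been split, using the query from Mark's Yahoo example quoted below. The class QueryDecodingSketch and its methods are hypothetical; note also that URLDecoder applies form-decoding rules (it turns '+' into a space), and that the Charset overload used here assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryDecodingSketch {

    // Decodes a single name or value.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Splits the *raw* query on '&' and '=' first, then decodes each
    // name and value, so an encoded '&' (%26) inside a value survives.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    public static void main(String[] args) {
        String raw =
            "appid=soajavabook&query=razr&category=Electronics%20%26%20Camera";

        // Split, then decode: three parameters, value intact.
        System.out.println(parse(raw).get("category")); // Electronics & Camera

        // Decode, then split: the decoded '&' is mistaken for a separator.
        System.out.println(parse(decode(raw)).size());  // 4
    }
}
```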

Paul.

> -- Mark
>
>
> Paul Sandoz wrote:
>
>> Hi Mark,
>>
>> I feel your JAX-WS pain :-)
>>
>> I would like to highlight the following example to get a clearer
>> understanding of what you and Julian are proposing.
>>
>> Given this URI template:
>>
>> /x%20/{up}
>>
>> We would have:
>>
>> @UriTemplate("x%20/{up}")
>> class MyResource {
>>
>> }
>>
>> Given this URI:
>>
>> http://localhost/x%20/y%20?q%20p=z%20
>>
>> We could have:
>>
>> @UriTemplate("x%20/{up}")
>> class MyResource {
>>
>>     @HttpContext UriInfo uriInfo;
>>
>>     public String get(
>>             @UriParam("up") String up,
>>             @QueryParam("q%20p") String qp) {
>>         assert up.equals("y%20");
>>         assert qp.equals("z%20");
>>         assert uriInfo.getURIPath().equals("x%20/y%20");
>>
>>         return URI.create(uriInfo.getURIPath() + "?q%20p=" + qp).
>>             toString();
>>     }
>> }
>>
>>
>> And when we do a GET on the URI http://localhost/x%20/y%20?q%20p=z%20, no
>> assertions will fail and the following entity will be returned:
>>
>> x%20/y%20?q%20p=z%20
>>
>> Is that what you are proposing?
>>
>> 1) that URI templates and query parameter names must always be
>> declared in encoded form;
>>
>> 2) the URI path, template values and query parameter values, when
>> accessed as strings, must be returned in encoded form; and
>>
>> 3) the developer must explicitly decode those of 2).
>>
>> Paul.
>>
>> Mark Hansen wrote:
>>
>>> Actually, I *strongly* disagree with the emerging consensus (Dhanji,
>>> Phil, David) on this.
>>> I don't think that JSR-311 should make any effort to abstract
>>> away the URL encoding from the developer. Automating the
>>> encoding/decoding will just be (1) a source of confusion for
>>> developers and (2) a source of bugs for implementers. Regarding (1) -
>>> URL encoding is a fact of life for RESTful
>>> development. Let's not try to hide the realities of the Internet from
>>> programmers. Heading down that path just starts to get confusing.
>>> Then, to use our API, programmers have to remember what algorithm we
>>> use to map Strings to URL encoded Strings. Or worse, developers who
>>> don't know about URL encoding will wonder why their code does one
>>> thing, but when you copy and paste that query string into a Browser,
>>> it doesn't work.
>>>
>>> Our goal should be to make Java work the way that the Internet
>>> works. My biggest frustration with JAX-WS (and this is also a problem
>>> with java.net.URLConnection) is that it tries too hard to abstract the
>>> programmer away from the Internet. Our goal on JSR-311 should be to
>>> provide a natural Java representation of what is really going on on the
>>> Internet. Hiding the messy stuff in the name of "easier programming"
>>> is a bad idea.
>>>
>>>
>>> Regarding (2) - If we start coming up with rules for how to make URL
>>> encoding "easier", I guarantee you that it will be a source of bugs in
>>> the implementation. Here is an example from JAX-WS:
>>>
>>>
>>> Today, I have spent several hours trying to find some bug in how the
>>> JAX-WS 2.1 RI implementation handles URL encoding/decoding.
>>>
>>> http://api.shopping.yahoo.com/ShoppingService/V2/productSearch?appid=soajavabook&query=razr&category=Electronics%20%26%20Camera
>>>
>>>
>>>
>>> It's a query of the Yahoo Shopping API. Works fine in a browser. But,
>>> when I use the encoded query string with a Dispatch<T>, by
>>> setting the MessageContext.QUERY_STRING, it doesn't work (at least not
>>> in build 42 - I'm trying a newer build before filing a bug report).
>>> Like this:
>>>
>>> productSearchDispatch.getRequestContext().put(
>>>     MessageContext.QUERY_STRING,
>>>     "appid=soajavabook&query=razr&category=Electronics%20%26%20Camera");
>>>
>>> It used to work in one of the older JAX-WS builds. Now it doesn't.
>>> Bugs like this appear, get fixed, disappear, and come back again.
>>>
>>>
>>>
>>> -- Mark
>>>
>>>
>>>
>>>
>>
>
>

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109