users@jsr311.java.net

RE: JAX-RS: UriBuilder encoding

From: Manger, James H <James.H.Manger_at_team.telstra.com>
Date: Thu, 17 Jul 2008 12:42:25 +1000

Here are use cases for an encoding mode like encode=true, but where percent chars are NOT escaped (nicknamed “true-%”).

Consider http://samplemerchant.info/uri/a%23b/résumé.html

This email, current browsers (Safari, Internet Explorer, Firefox 3…), and the HTML source for that web page all display this web address in the same way – including the non-URI character é and the %23 escape sequence (escaping a ‘#’ so it can appear in the path).

A cut-n-paste of this address (or just its path) from any of these sources should be accepted by JAX-RS. In particular, it should be accepted by UriBuilder path(…) and as @Path values.

This is an example of a string that is NOT “either completely encoded or not encoded at all”. This situation will be increasingly common.

With the current spec:

  UriBuilder.fromPath("/uri/a%23b/résumé.html", false) -> IllegalArgumentException

  UriBuilder.fromPath("/uri/a%23b/résumé.html", true).build() -> "/uri/a%2523b/r%C3%A9sum%C3%A9.html" -> 404 NOT FOUND

In my suggested true-% mode:

  UriBuilder.fromPath("/uri/a%23b/résumé.html").build() -> "/uri/a%23b/r%C3%A9sum%C3%A9.html" -> 200 OK -> @PathParam -> "/uri/a#b/résumé.html"
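
As a rough illustration of the rule (a sketch only, not JAX-RS code), the two encoding behaviours can be written as one loop with a flag. The names TruePercentEncoder, encodePath and escapePercent are hypothetical. The set of characters left unescaped is taken from the RFC 3986 pchar rule plus '/', and escaping a stray '%' that is not followed by two hex digits as %25 is my assumption, not something the proposal above specifies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JAX-RS: contrasts "true+%" and "true-%" path encoding.
class TruePercentEncoder {

    // Characters left as-is in a path: RFC 3986 unreserved, sub-delims, ':', '@', plus '/'.
    private static final String PATH_SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        + "-._~!$&'()*+,;=:@/";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // escapePercent = true  : every '%' becomes %25 (the current encode=true behaviour).
    // escapePercent = false : a '%' followed by two hex digits is kept ("true-%").
    static String encodePath(String path, boolean escapePercent) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < path.length(); ) {
            int cp = path.codePointAt(i);
            if (cp < 128 && PATH_SAFE.indexOf(cp) >= 0) {
                out.append((char) cp);
            } else if (cp == '%' && !escapePercent && i + 2 < path.length()
                    && isHex(path.charAt(i + 1)) && isHex(path.charAt(i + 2))) {
                // Existing escape sequence: keep the '%'; the hex digits are copied
                // unchanged on the next two iterations because they are PATH_SAFE.
                out.append('%');
            } else {
                // Anything else (including a stray '%') is percent-encoded as UTF-8 bytes.
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String path = "/uri/a%23b/r\u00e9sum\u00e9.html";   // \u00e9 is 'é'
        System.out.println(encodePath(path, false));  // /uri/a%23b/r%C3%A9sum%C3%A9.html
        System.out.println(encodePath(path, true));   // /uri/a%2523b/r%C3%A9sum%C3%A9.html
    }
}
```

The only difference between the two modes is the middle branch, which is why a string pasted from a browser address bar survives in true-% mode but is double-escaped when every '%' is encoded.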



Other use cases:

Use case 2: Any use case for false mode is also a use case for true-% mode, as every string that is valid in false mode (i.e. does not trigger an IllegalArgumentException) builds exactly the same URI in true-% mode.

Use case 3: Almost any use case for true+% mode (the current encode=true behaviour, where '%' is escaped to %25) is also a use case for true-% mode, as every string without a percent char builds exactly the same URI in true+% and true-% modes.
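
Use cases 2 and 3 can be checked loosely with java.net.URI from the JDK, which has a similar split. This is an analogy under stated assumptions, not the JAX-RS behaviour itself: the multi-argument URI constructor always quotes '%' (standing in for true+%), while URI.create keeps existing escapes and only encodes non-ASCII characters in toASCIIString() (a rough stand-in for true-%; unlike the proposed mode it still rejects spaces and other illegal ASCII characters). The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: uses java.net.URI as a stand-in for the two modes.
class UriModesDemo {

    // Stand-in for true+% mode: the multi-argument constructor always quotes '%'.
    static String quotePercent(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Rough stand-in for true-% mode: existing escapes are kept, non-ASCII is encoded.
    static String keepPercent(String path) {
        return URI.create(path).toASCIIString();
    }

    public static void main(String[] args) {
        String alreadyEncoded = "/uri/a%23b/r%C3%A9sum%C3%A9.html";
        String noPercent = "/uri/r\u00e9sum\u00e9.html";   // \u00e9 is 'é'

        // Use case 2: a string that is already a valid path is left unchanged.
        System.out.println(keepPercent(alreadyEncoded).equals(alreadyEncoded));     // true

        // Use case 3: without a percent char, both modes build the same URI.
        System.out.println(keepPercent(noPercent).equals(quotePercent(noPercent))); // true

        // The modes differ only once a percent char is present.
        System.out.println(quotePercent("/uri/a%23b"));  // /uri/a%2523b
        System.out.println(keepPercent("/uri/a%23b"));   // /uri/a%23b
    }
}
```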

James Manger

_____________________________________________
From: Marc.Hadley_at_Sun.COM
Sent: Thursday, 17 July 2008 2:27 AM
To: users_at_jsr311.dev.java.net

Yes, with encode=true, the intent was that '%' would be encoded to %25.

I kind of imagined that uncontrolled input would be inserted into a URI as the values of URI template variables rather than directly as URI components. If this is true then the presence of {} is unlikely to cause an issue. The same applies to %, since all three chars would be encoded if encode=true. If you wanted to allow uncontrolled input to include pct-escaped chars as well as other chars that aren't legal, then you would have to do some manual processing, but I don't see that as a common use case. It seems more common that strings are either completely encoded or not encoded at all. Could you suggest some use cases where the change you suggest would improve the developer experience?

Thanks,