Re: JAX-RS: UriBuilder encoding

From: Marc Hadley <Marc.Hadley_at_Sun.COM>
Date: Tue, 22 Jul 2008 11:57:45 -0400

OK, I'm convinced. Here's what I propose we do:

(a) Remove the encode and isEncode methods. All methods that add URI
components will perform contextual encoding of characters that are not
allowed in the relevant URI component, with two exceptions: { and }
are left alone, and a % followed by two hex digits (the RFC 3986
pct-encoded production) is not encoded. Any other % is encoded.

(b) Add a static method that will encode any characters not part of
the RFC 3986 unreserved production.

(c) Similar to (a), the build method will encode characters that are
not allowed in the relevant URI components. I.e. any embedded { or }
will be encoded unlike when adding URI components in (a).
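
To make (a) and (c) concrete, here is a rough sketch of the rule for the
path component only. It is not the UriBuilder implementation; the class
name, the method name and the keepTemplateBraces flag are made up for
illustration, and the set of allowed characters is my reading of the
RFC 3986 pchar production plus '/'.

```java
import java.nio.charset.StandardCharsets;

public class ContextualEncodeSketch {
    // Non-alphanumeric characters assumed legal in a path:
    // RFC 3986 unreserved + sub-delims + ':' '@', plus the '/' separator.
    private static final String PATH_EXTRA = "-._~!$&'()*+,;=:@/";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // keepTemplateBraces = true models (a): { and } pass through.
    // keepTemplateBraces = false models (c): { and } are encoded.
    static String encodePath(String s, boolean keepTemplateBraces) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int len = Character.charCount(cp);
            boolean alnum = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                    || (cp >= '0' && cp <= '9');
            // A '%' followed by two hex digits is treated as an existing escape.
            boolean escape = cp == '%' && i + 2 < s.length()
                    && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2));
            boolean brace = keepTemplateBraces && (cp == '{' || cp == '}');
            if (alnum || PATH_EXTRA.indexOf(cp) >= 0 || escape || brace) {
                out.appendCodePoint(cp);
            } else {
                // Percent-encode each UTF-8 byte of the character.
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // James's example path; \u00e9 is the accented e.
        System.out.println(encodePath("/uri/a%23b/r\u00e9sum\u00e9.html", true));
        System.out.println(encodePath("/users/{id}/x y", true));
        System.out.println(encodePath("/users/{id}/x y", false));
    }
}
```

With this rule the existing %23 in James's example survives and only the
non-URI characters are escaped.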

The above will allow creation of any valid URI. The only case that
developers will have to be careful with is when an input string
contains a literal % character coincidentally followed by two hex
digits. The method added by (b) can be used to fix this although it
won't work if the same string also contains pct-encoded chars. I
don't think this is a big issue, since any string obtained from @*Param
is either encoded or not; you won't get a mixture.
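
As an illustration of the static method in (b) and the caveat above,
here is a minimal sketch. The class and method names are hypothetical
(nothing has been added to the API yet); the only thing taken from the
RFC is the unreserved set: letters, digits, '-', '.', '_' and '~'.

```java
import java.nio.charset.StandardCharsets;

public class UnreservedEncodeSketch {
    // Encodes every character outside the RFC 3986 unreserved production,
    // including '%', so no existing escape survives unchanged.
    static String encodeUnreserved(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A literal "%23" that is NOT meant as an escaped '#':
        // pre-encoding turns the '%' into %25 so it is not mistaken for one.
        System.out.println(encodeUnreserved("a%23b"));
        // The same call on an already-encoded string double-encodes it,
        // which is why a mixed string cannot be fixed this way.
        System.out.println(encodeUnreserved("r%C3%A9sum%C3%A9"));
    }
}
```

The first call prints a%2523b, which a component method following (a)
would then leave untouched because %25 is itself a valid escape.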

Marc.

On Jul 16, 2008, at 10:42 PM, Manger, James H wrote:

> Use cases for an encoding mode like encoding=true, but where percent
> chars are NOT escaped (nicknamed “true-%”).
>
> Consider http://samplemerchant.info/uri/a%23b/résumé.html
>
> This email, current browsers (Safari, Internet Explorer, Firefox
> 3…), and the HTML source for that web page all display this web
> address in the same way – including the non-URI character é and the
> %23 escape sequence (escaping a ‘#’ so it can appear in the path).
>
> A cut-n-paste of this address (or just its path) from any of these
> sources should be accepted by JAX-RS. In particular, it should be
> accepted by UriBuilder path(…) and as @Path values.
>
> This is an example of a string that is NOT “either completely
> encoded or not encoded at all”. This situation will be increasingly
> common.
>
> With the current spec:
>
> UriBuilder.fromPath(“/uri/a%23b/résumé.html”, false) ->
> IllegalArgumentException
>
> UriBuilder.fromPath(“/uri/a%23b/résumé.html”, true).build() ->
> “/uri/a%2523b/r%C3%A9sum%C3%A9.html” -> 404 NOT FOUND
>
> In my suggested true-% mode
>
> UriBuilder.fromPath(“/uri/a%23b/résumé.html”).build() ->
> “/uri/a%23b/r%C3%A9sum%C3%A9.html” -> 200 OK ->
> @PathParam -> “/uri/a#b/résumé.html”
>
>
>
> Other use cases:
>
> Use case 2: Any use case for false mode is also a use case for
> true-% mode as every string that is valid in false mode (ie does not
> trigger an IllegalArgumentException) builds exactly the same URI in
> true-% mode.
>
> Use case 3: Almost any use case for true+% mode is also a use case
> for true-% mode as every string without a percent char builds
> exactly the same URI in true+% and true-% modes.
>
> James Manger
>
> _____________________________________________
> From: Marc.Hadley_at_Sun.COM [mailto:Marc.Hadley_at_Sun.COM]
> Sent: Thursday, 17 July 2008 2:27 AM
> To: users_at_jsr311.dev.java.net
>
> Yes, with encode=true, the intent was that '%' would be encoded to
> %25.
>
> I kind of imagined that uncontrolled input would be inserted into
> URI as the values of URI template variables rather than directly as
> URI components. If this is true then the presence of {} is unlikely
> to cause an issue. The same applies to % since all three chars would
> be encoded if encode=true. If you wanted to allow uncontrolled input
> to include pct-escaped chars as well as other chars that aren't
> legal then you would have to do some manual processing but I don't
> see that as a common use case - it seems more common that strings
> are either completely encoded or not encoded at all. Could you
> suggest some use cases where the change you suggest would improve
> the developer experience?
>
> Thanks,

---
Marc Hadley <marc.hadley at sun.com>
CTO Office, Sun Microsystems.