Re: JAX-RS: UriBuilder encoding

From: Marc Hadley <Marc.Hadley_at_Sun.COM>
Date: Wed, 23 Jul 2008 15:21:18 -0400

On Jul 23, 2008, at 3:06 PM, Stephan Koops wrote:
>
> I think the current approach with the encode attribute is very easy
> to use. Your proposal will produce boilerplate code, because you
> have to call the encode methods manually. This will not improve the
> readability of resource method code, especially if you get data from
> somewhere (e.g. a database) and want to use it for building URIs.
>
I don't think it will introduce additional boilerplate code, since all
methods will always do encoding. The main change is how we deal with
pct-encoded chars in values passed to builder methods. Before, in
encode=true mode, we treated % as a char that always needed to be
encoded; now we will be smarter and only encode it if it's not part of
a pct-encoded value.

If you want to work with pct-encoded data, the builder methods will
leave the escaped octets untouched anyway, and if you work with
decoded data, the methods will automatically encode it for you.
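To make the "smarter" handling of % concrete, here is a minimal Java sketch of a contextual path encoder. It does the following:

- copies any %XX triplet through unchanged;
- keeps characters that are legal in an RFC 3986 path, plus { and } for templates;
- UTF-8 percent-encodes everything else, including a lone %.

The class and method names are hypothetical. This is only an illustration under those assumptions, not the UriBuilder implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of contextual path encoding; not the JAX-RS API.
final class ContextualPathEncoder {

    // Characters allowed literally in an RFC 3986 path:
    // unreserved, sub-delims, ':', '@' and the '/' separator.
    private static final String PATH_SAFE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~!$&'()*+,;=:@/";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // Encode a path, preserving existing %XX triplets and {template} braces.
    static String encodePath(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()
                    && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
                // Already a pct-encoded octet: copy it through untouched.
                out.append(s, i, i + 3);
                i += 3;
            } else if (c == '{' || c == '}' || PATH_SAFE.indexOf(c) >= 0) {
                // Template braces and legal path characters are kept as-is.
                out.append(c);
                i++;
            } else {
                // Anything else (including a lone '%') is encoded as UTF-8 octets.
                int cp = s.codePointAt(i);
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
                i += Character.charCount(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodePath("/uri/a%23b/r\u00e9sum\u00e9.html")); // /uri/a%23b/r%C3%A9sum%C3%A9.html
        System.out.println(encodePath("100% off"));                         // 100%25%20off
        System.out.println(encodePath("/items/{id}"));                      // /items/{id}
    }
}
```

Mixed input such as a pasted address therefore comes out valid without any manual encode call.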

Marc.

>
> Marc Hadley wrote:
>> OK, I'm convinced. Here's what I propose we do:
>>
>> (a) Remove the encode and isEncode methods. All methods that add
>> URI components will perform contextual encoding of characters that
>> are not allowed in the relevant URI component, with the following
>> exceptions: { and } will not be encoded, and % chars followed by two
>> hex digits (the RFC 3986 pct-encoded production) will not be encoded;
>> other % chars will.
>>
>> (b) Add a static method that will encode any characters not part of
>> the RFC 3986 unreserved production.
>>
>> (c) As with (a), the build method will encode characters that are
>> not allowed in the relevant URI components. In particular, any
>> embedded { or } will be encoded, unlike when adding URI components in (a).
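The difference between (a) and (c) can be sketched as follows. The sketch assumes hypothetical helpers addPath and build over a simple contextual path encoder; this is not the UriBuilder API.

- When a component is added, as in (a), { and } survive as template delimiters.
- When values are substituted at build time, as in (c), any braces inside the values are encoded as %7B and %7D.

It also assumes that %XX triplets in values are preserved, as in (a).

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of proposals (a) and (c); not the JAX-RS API.
final class TemplateBuildSketch {

    // Characters allowed literally in an RFC 3986 path: unreserved, sub-delims, ':', '@', '/'.
    private static final String PATH_SAFE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~!$&'()*+,;=:@/";

    // A template variable such as {name}.
    private static final Pattern VAR = Pattern.compile("\\{([^{}]+)\\}");

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // Contextual encoding: keep %XX triplets and legal path chars; keep braces only if asked.
    static String encode(String s, boolean keepBraces) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length() && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
                out.append(s, i, i + 3);
                i += 3;
            } else if (PATH_SAFE.indexOf(c) >= 0 || (keepBraces && (c == '{' || c == '}'))) {
                out.append(c);
                i++;
            } else {
                int cp = s.codePointAt(i);
                for (byte b : new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8)) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
                i += Character.charCount(cp);
            }
        }
        return out.toString();
    }

    // (a) Adding a component: braces survive as template delimiters.
    static String addPath(String path) {
        return encode(path, true);
    }

    // (c) Building: substitute each {var}, encoding any braces the value contains.
    static String build(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for template variable " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(encode(value, false)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = addPath("/users/{name}");
        System.out.println(template);                                    // /users/{name}
        System.out.println(build(template, Map.of("name", "a{b} c")));  // /users/a%7Bb%7D%20c
    }
}
```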
>>
>> The above will allow creation of any valid URI. The only case that
>> developers will have to be careful with is when an input string
>> contains a literal % character coincidentally followed by two hex
>> digits. The method added by (b) can be used to fix this, although it
>> won't work if the same string also contains pct-encoded chars. I
>> don't think this is a big issue, since any string obtained from
>> @*Param is either encoded or not; you won't get a mixture.
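Here is a minimal sketch of the kind of static method described in (b), assuming a hypothetical name, encodeAllButUnreserved. It UTF-8 encodes the string and percent-encodes every octet outside the RFC 3986 unreserved set.

It handles the literal "% plus two hex digits" case, which the contextual encoding in (a) would leave alone as an escaped octet. It also shows the stated limitation: input that already contains pct-encoded chars gets encoded a second time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for proposal (b); the method name is illustrative only.
final class UnreservedEncoder {

    // RFC 3986 "unreserved" production: ALPHA / DIGIT / "-" / "." / "_" / "~".
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    static String encodeAllButUnreserved(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (octet < 0x80 && UNRESERVED.indexOf(octet) >= 0) {
                out.append((char) octet);
            } else {
                // Every other octet, including '%' itself, becomes %XX.
                out.append(String.format("%%%02X", octet));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A literal "%AB" that was NOT meant as an escape sequence.
        System.out.println(encodeAllButUnreserved("50%AB")); // 50%25AB
        // The limitation: an already-encoded string is encoded twice.
        System.out.println(encodeAllButUnreserved("a%23b")); // a%2523b
    }
}
```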
>>
>> Marc.
>>
>> On Jul 16, 2008, at 10:42 PM, Manger, James H wrote:
>>
>>> Use cases for an encoding mode like encode=true, but where
>>> percent chars are NOT escaped (nicknamed “true-%”).
>>>
>>> Consider http://samplemerchant.info/uri/a%23b/résumé.html
>>>
>>> This email, current browsers (Safari, Internet Explorer, Firefox
>>> 3…), and the HTML source for that web page all display this web
>>> address in the same way – including the non-URI character é and
>>> the %23 escape sequence (escaping a ‘#’ so it can appear in the
>>> path).
>>>
>>> A cut-n-paste of this address (or just its path) from any of these
>>> sources should be accepted by JAX-RS. In particular, it should be
>>> accepted by UriBuilder path(…) and as @Path values.
>>>
>>> This is an example of a string that is NOT “either completely
>>> encoded or not encoded at all”. This situation will be
>>> increasingly common.
>>>
>>> With the current spec:
>>>
>>> UriBuilder.fromPath("/uri/a%23b/résumé.html", false)
>>>   -> IllegalArgumentException
>>>
>>> UriBuilder.fromPath("/uri/a%23b/résumé.html", true).build()
>>>   -> "/uri/a%2523b/r%C3%A9sum%C3%A9.html" -> 404 NOT FOUND
>>>
>>> In my suggested true-% mode:
>>>
>>> UriBuilder.fromPath("/uri/a%23b/résumé.html").build()
>>>   -> "/uri/a%23b/r%C3%A9sum%C3%A9.html" -> 200 OK
>>>   -> @PathParam -> "/uri/a#b/résumé.html"
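The two builder results behave differently on the server. The sketch below shows why by decoding each built path with java.net.URI.getPath(), which decodes escaped octets as UTF-8. This roughly approximates what a runtime does before injecting a @PathParam. The class and helper names are hypothetical.

```java
import java.net.URI;

// Hypothetical illustration; decodedPath only approximates server-side decoding.
final class PathParamDecoding {

    // URI.getPath() returns the path with escaped octets decoded (as UTF-8).
    static String decodedPath(String rawPath) {
        return URI.create(rawPath).getPath();
    }

    public static void main(String[] args) {
        String truePlus = "/uri/a%2523b/r%C3%A9sum%C3%A9.html"; // current encode=true result
        String trueMinus = "/uri/a%23b/r%C3%A9sum%C3%A9.html";  // proposed true-% result

        // true+%: the segment decodes to "a%23b", not the intended "a#b".
        System.out.println(decodedPath(truePlus).equals("/uri/a%23b/r\u00e9sum\u00e9.html")); // true
        // true-%: the segment decodes to "a#b", the resource the author meant.
        System.out.println(decodedPath(trueMinus).equals("/uri/a#b/r\u00e9sum\u00e9.html"));  // true
    }
}
```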
>>>
>>>
>>>
>>> Other use cases:
>>>
>>> Use case 2: Any use case for false mode is also a use case for
>>> true-% mode, as every string that is valid in false mode (i.e. does
>>> not trigger an IllegalArgumentException) builds exactly the same
>>> URI in true-% mode.
>>>
>>> Use case 3: Almost any use case for true+% mode (the current
>>> encode=true, which escapes %) is also a use case for true-% mode,
>>> as every string without a percent char builds exactly the same URI
>>> in true+% and true-% modes.
>>>
>>> James Manger
>>>
>>> _____________________________________________
>>> From: Marc.Hadley_at_Sun.COM [mailto:Marc.Hadley_at_Sun.COM]
>>> Sent: Thursday, 17 July 2008 2:27 AM
>>> To: users_at_jsr311.dev.java.net
>>>
>>> Yes, with encode=true, the intent was that '%' would be encoded to
>>> %25.
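The JDK's java.net.URI offers a handy point of comparison for this encode=true ("true+%") behaviour. Its multi-argument constructors quote illegal characters and, per the class Javadoc, always quote %.

The sketch below uses the four-argument constructor as a stand-in to reproduce the a%2523b double encoding from the example earlier in the thread. It is not the JAX-RS implementation, and its quoting rules differ from UriBuilder's in other details. The helper name is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for "true+%" encoding using only the JDK.
final class TruePlusPercentDemo {

    // The multi-argument URI constructors always quote '%', so an existing
    // escape like %23 becomes %2523; toASCIIString() then UTF-8 encodes non-ASCII chars.
    static String encodeLikeTruePlus(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String input = "/uri/a%23b/r\u00e9sum\u00e9.html";
        System.out.println(encodeLikeTruePlus(input)); // /uri/a%2523b/r%C3%A9sum%C3%A9.html
    }
}
```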
>>>
>>> I kind of imagined that uncontrolled input would be inserted into
>>> URI as the values of URI template variables rather than directly
>>> as URI components. If this is true then the presence of {} is
>>> unlikely to cause an issue. The same applies to % since all three
>>> chars would be encoded if encode=true. If you wanted to allow
>>> uncontrolled input to include pct-escaped chars as well as other
>>> chars that aren't legal, then you would have to do some manual
>>> processing, but I don't see that as a common use case; it seems
>>> more common that strings are either completely encoded or not
>>> encoded at all. Could you suggest some use cases where the change
>>> you suggest would improve the developer experience?
>>>
>>> Thanks,
>>
>> ---
>> Marc Hadley <marc.hadley at sun.com>
>> CTO Office, Sun Microsystems.
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe_at_jsr311.dev.java.net
>> For additional commands, e-mail: users-help_at_jsr311.dev.java.net
>>
>

---
Marc Hadley <marc.hadley at sun.com>
CTO Office, Sun Microsystems.