users@jsr311.java.net

Re: JAX-RS: UriBuilder encoding

From: Marc Hadley <Marc.Hadley_at_Sun.COM>
Date: Mon, 14 Jul 2008 14:54:01 -0400

On Jul 14, 2008, at 12:42 AM, Manger, James H wrote:

> I think UriBuilder would be improved if its encoding behaviour was
> changed.
>
> For most of the UriBuilder methods, when encoding is:
>
> ON:
> %-escape all characters other than the 66 unreserved ones.
> Even ‘{’ and ‘}’ characters would be escaped when encoding was on.
> They would not be interpreted as placeholders.
>
> OFF:
> Split the input into {…} placeholders and literals.
> %-escape all characters in the literals that are not valid in a URI
> (i.e. don’t %-escape the 66 unreserved characters, the 18 reserved
> characters, or percent characters).
>
> Encoding OFF would be a sensible default for UriBuilder.
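A minimal Java sketch of the two rules as described above. The class `ProposedEncoding` and its methods are hypothetical illustrations, not part of the JAX-RS API. The sketch assumes UTF-8 for non-ASCII characters and takes the 66 unreserved and 18 reserved characters from RFC 3986. Because it follows the rules exactly as stated, the OFF rule passes a `%` through without checking that two hex digits follow it.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the proposed ON/OFF rules; not part of JAX-RS. */
final class ProposedEncoding {
    // A {...} placeholder: braces with no nested braces inside
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{[^{}]*\\}");

    // RFC 3986 unreserved: ALPHA / DIGIT / "-" / "." / "_" / "~" (66 characters)
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // RFC 3986 reserved: gen-delims ":/?#[]@" plus sub-delims "!$&'()*+,;=" (18 characters)
    static boolean isReserved(int c) {
        return ":/?#[]@!$&'()*+,;=".indexOf(c) >= 0;
    }

    // Percent-encode the UTF-8 bytes of s; non-ASCII bytes (>= 0x80) are never kept
    private static void encode(StringBuilder out, String s, boolean keepReservedAndPercent) {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean keep = isUnreserved(c)
                    || (keepReservedAndPercent && (isReserved(c) || c == '%'));
            if (keep) out.append((char) c);
            else out.append(String.format("%%%02X", c));
        }
    }

    /** Encoding ON: escape everything except the unreserved characters, including '{', '}' and '%'. */
    static String encodeOn(String s) {
        StringBuilder out = new StringBuilder();
        encode(out, s, false);
        return out.toString();
    }

    /** Encoding OFF: keep {...} placeholders; in literals escape only characters not valid in a URI. */
    static String encodeOff(String s) {
        StringBuilder out = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(s);
        int last = 0;
        while (m.find()) {
            encode(out, s.substring(last, m.start()), true);
            out.append(m.group()); // placeholder copied verbatim
            last = m.end();
        }
        encode(out, s.substring(last), true);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeOn("a b/{id}%"));                   // a%20b%2F%7Bid%7D%25
        System.out.println(encodeOff("/caf\u00e9 menu/{id}?q=%41")); // /caf%C3%A9%20menu/{id}?q=%41
    }
}
```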
>
>
> With encoding OFF any valid URI can be built. It is friendlier than
> the current rule, as you don’t have to manually %-escape non-URI
> chars such as non-ASCII chars, spaces, etc. The only characters that
> the caller needs to escape are percent signs, curly braces, and
> reserved characters when you want to bypass their special meaning.
>
> This encoding OFF rule is actually quite close to the current encoding
> ON rule. The only substantial difference is how percent signs are
> handled. UriBuilder could even be simplified by scrapping the
> encode(boolean) method: mandate the above encoding OFF behaviour for
> the scheme/host/path/query/fragment methods; and the above encoding
> ON behaviour for processing placeholder values in the build(…)
> methods. The same could be done for the encode attribute of @Path,
> @PathParam, @QueryParam and @MatrixParam annotations (but this does
> not affect the @Encoded annotation).
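A minimal sketch of the simplified build(…) behaviour proposed above, assuming UTF-8 and the RFC 3986 character sets: literal text in the template follows the OFF rule, while every placeholder value follows the ON rule. `ProposedBuild` and its `build` method are hypothetical and do not reflect the actual UriBuilder API or implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposed build(); not the real UriBuilder API. */
final class ProposedBuild {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]*)\\}");
    private static final String RESERVED = ":/?#[]@!$&'()*+,;=";

    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // on = true: keep only unreserved bytes; on = false: also keep reserved characters and '%'
    private static void append(StringBuilder out, String s, boolean on) {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean keep = isUnreserved(c) || (!on && (RESERVED.indexOf(c) >= 0 || c == '%'));
            out.append(keep ? String.valueOf((char) c) : String.format("%%%02X", c));
        }
    }

    /** Template literals follow the OFF rule; each placeholder value follows the ON rule. */
    static String build(String template, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            append(out, template.substring(last, m.start()), false);
            String value = values.get(m.group(1));
            if (value == null) throw new IllegalArgumentException("no value for {" + m.group(1) + "}");
            append(out, value, true);
            last = m.end();
        }
        append(out, template.substring(last), false);
        return out.toString();
    }

    public static void main(String[] args) {
        String uri = build("http://example.com/files/{name}?lang={lang}",
                Map.of("name", "Q3 report/{draft}.pdf", "lang", "en&fr"));
        System.out.println(uri); // http://example.com/files/Q3%20report%2F%7Bdraft%7D.pdf?lang=en%26fr
    }
}
```

Because values are always ON-encoded, a `/`, `&` or `{` inside a value can never act as a delimiter or start a new placeholder; only the template author controls URI structure.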
>
>
> With encoding ON it would be safe to pass any string to UriBuilder –
> even if it is user input, a file name, or from any other source
> beyond the direct control of the calling code. I suspect such
> “uncontrolled” strings will be very common.
> Even with the current encoding rules I suspect it will be common
> (because it is easy) to pass “uncontrolled” strings to
> UriBuilder.scheme/host/path/query/fragment – but it is unsafe! It
> will work most of the time, then some input (perhaps maliciously)
> will have unanticipated ‘{’ and ‘}’ characters causing weird results.
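A small sketch of the hazard described above, assuming a template-aware method finds placeholders with a simple `{...}` pattern (the real UriBuilder parser may differ). `UncontrolledInput`, `placeholders` and `escapeBraces` are hypothetical names. It shows a user-supplied file name being read as a placeholder, and one way a caller could neutralise the braces first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; PLACEHOLDER only approximates a template-aware parser. */
final class UncontrolledInput {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^{}]*)\\}");

    /** Placeholder names a template-aware method would find in its input. */
    static List<String> placeholders(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(input);
        while (m.find()) names.add(m.group(1));
        return names;
    }

    /** One way a caller can neutralise braces before passing an uncontrolled string. */
    static String escapeBraces(String s) {
        return s.replace("{", "%7B").replace("}", "%7D");
    }

    public static void main(String[] args) {
        String fileName = "budget{final}.xls"; // e.g. a user-supplied file name
        System.out.println(placeholders("/files/" + fileName));               // [final]
        System.out.println(placeholders("/files/" + escapeBraces(fileName))); // []
    }
}
```

Under the proposed OFF rule the caller would also need to escape any `%` in such input, which is the trade-off raised in the reply.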
>
I don't see that this is clearly better than what we have now. You are
trading one set of problems for another: applications would still have
to scan "uncontrolled" strings for % and {} characters, and I don't
know that one approach is more problematic than the other.

>
> Question: Why are “{…}” placeholders not allowed in
> UriBuilder.fromUri(String uri)?
> This means UriBuilder will not be able to process templates from
> external sources (HTTP headers, HTML attributes, …), which will
> almost certainly be templates for full URIs, not just for a
> component of a URI.
>
IIRC, it was an issue of not being able to prevent a path parameter
spanning multiple URI components. Anyone else remember why we added
this restriction?
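One plausible reading of the concern recalled above, offered as an illustration only and not as the documented reason for the restriction: if a placeholder value in a full-URI template were substituted without escaping, it could carry delimiters such as `?` and spill from the path into the query. The hypothetical `SpanningComponents` class uses a naive `expand` helper and `java.net.URI` to show where the value ends up, with and without ON-style escaping (assumed UTF-8, RFC 3986 unreserved set).

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration: a raw placeholder value can spill from the path into the query. */
final class SpanningComponents {
    // Escape every UTF-8 byte except the RFC 3986 unreserved characters (the proposed ON rule)
    static String encodeOn(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) out.append((char) c);
            else out.append(String.format("%%%02X", c));
        }
        return out.toString();
    }

    // Naive substitution of a single {name} placeholder
    static String expand(String template, String name, String value) {
        return template.replace("{" + name + "}", value);
    }

    public static void main(String[] args) {
        String template = "http://example.com/users/{id}";
        String value = "42?admin=true"; // attacker-controlled path parameter
        URI raw = URI.create(expand(template, "id", value));
        URI safe = URI.create(expand(template, "id", encodeOn(value)));
        System.out.println(raw.getPath() + " | query=" + raw.getQuery());      // /users/42 | query=admin=true
        System.out.println(safe.getRawPath() + " | query=" + safe.getQuery()); // /users/42%3Fadmin%3Dtrue | query=null
    }
}
```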

Marc.

---
Marc Hadley <marc.hadley at sun.com>
CTO Office, Sun Microsystems.