users@jsr311.java.net

RE: JAX-RS: UriBuilder encoding

From: Manger, James H <James.H.Manger_at_team.telstra.com>
Date: Tue, 15 Jul 2008 14:24:25 +1000

Marc,

> I don't see that this is clearly better than what we have now.
> You trade applications having to scan "uncontrolled" strings for % and {} characters
> but I don't know that one is more problematic than the other.

UriBuilder’s two current encoding modes are not different enough to both be useful.
1. Neither %-escapes curly braces so neither can be used with “uncontrolled” input.
2. Chars that are never valid in a URI (e.g. ^ § ç) are %-escaped in one mode and cause an IllegalArgumentException in the other. The exception adds almost no value (at best it catches a programmer’s typo at a different point, though still at runtime), so both modes may as well always %-escape these chars.
3. Neither %-escapes unreserved characters (for good reason).
4. Neither %-escapes most reserved characters.

So the only real difference is whether percent signs are %-escaped.
This difference is too small to warrant separate modes.

We can either a) simplify the API by eliminating one mode, or b) make the modes substantially different so that each is useful in its own circumstances.

I now think the best solution is a):
* Have a single encoding mode, eliminating a handful of UriBuilder methods;
* Add a new static UriBuilder method that returns its input after %-escaping every char that is not unreserved, cf. Pattern.quote(String) and Matcher.quoteReplacement(String):
    public static String quote(String component)
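
A minimal sketch of what such a quote method might do, to make the proposal concrete. The class name UriQuote is hypothetical, and the choice of UTF-8 for converting chars to octets is an assumption; neither is part of the current UriBuilder API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JAX-RS: illustrates the proposed quote(String).
final class UriQuote {
    private static final String HEX = "0123456789ABCDEF";

    // RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9')
            || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Returns the input with every octet outside the unreserved set %-escaped.
    // Assumption: the component is converted to octets as UTF-8.
    public static String quote(String component) {
        StringBuilder sb = new StringBuilder(component.length());
        for (byte raw : component.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (isUnreserved(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(HEX.charAt(b >> 4)).append(HEX.charAt(b & 0xF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Curly braces, percent signs and reserved chars all come back escaped.
        System.out.println(quote("a{b}%/c"));
    }
}
```

An application would pass "uncontrolled" input through quote before handing it to UriBuilder, so that any % or {} in that input can no longer be mistaken for an escape or a template variable.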

The single encoding mode needs to allow all valid URIs and templates to be created, so it must not escape curly braces, reserved chars or percent signs. It may as well %-escape chars that would otherwise cause an IllegalArgumentException.
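
The rule for the single mode could be sketched roughly as follows. The class name SingleModeEncoder is hypothetical and UTF-8 is assumed for the octets; this illustrates the rule described here, not how UriBuilder is actually implemented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the proposed single encoding mode.
final class SingleModeEncoder {
    private static final String HEX = "0123456789ABCDEF";

    // Chars left untouched besides letters and digits:
    // unreserved marks, RFC 3986 reserved chars, the percent sign, curly braces.
    private static final String PASS_THROUGH =
        "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%{}";

    private static boolean passesThrough(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9')
            || (b < 0x80 && PASS_THROUGH.indexOf((char) b) >= 0);
    }

    // %-escapes only octets that could never appear literally in a URI or template.
    // Assumption: chars are converted to octets as UTF-8.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (passesThrough(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(HEX.charAt(b >> 4)).append(HEX.charAt(b & 0xF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The template variable, the existing %20 and the reserved chars survive;
        // only the space and ^ are escaped.
        System.out.println(encode("/users/{id}?q=a%20b c^"));
    }
}
```

Because % and {} pass through unchanged, this mode is only safe for strings the application controls.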


I am pretty sure the single encoding mode applies equally well to @Path, @QueryParam, @PathParam, @MatrixParam (and @FormParam) value attributes. Consequently the encode attribute for those annotations can be eliminated – which is a significant simplification. As a bonus it avoids any possible confusion between @Encoded and @*{encode=true/false}.

James Manger