users@jsr311.java.net

Re: JAX-RS: UriBuilder encoding

From: Marc Hadley <Marc.Hadley_at_Sun.COM>
Date: Tue, 15 Jul 2008 10:28:42 -0400

On Jul 15, 2008, at 12:24 AM, Manger, James H wrote:
>
> > I don't see that this is clearly better than what we have now.
> > You trade applications having to scan "uncontrolled" strings for %
> > and {} characters
> > but I don't know that one is more problematic than the other.
>
> UriBuilder’s two current encoding modes are not different enough to
> both be useful.
> 1. Neither %-escapes curly braces so neither can be used with
> “uncontrolled” input.
> 2. Chars that are never valid in a URI (eg ^ § ç) are %-escaped or
> cause an IllegalArgumentException in the two modes. The exception
> adds almost no value (perhaps catching a programmer’s typo at a
> different point, though still at runtime) so both modes may as well
> always %-escape these chars.
> 3. Neither %-escapes unreserved characters (for good reason).
> 4. Neither %-escapes most reserved characters.
>
With encode=false nothing is ever escaped, so let's concentrate on
encode=true (the default). In that mode, UriBuilder takes care of
escaping everything that isn't legal in a particular URI component.
So, e.g. the path component disallows "?", "#", "[" and "]" from the
reserved characters so:

URI u = UriBuilder.fromPath("foo/?[]#@").build(); => foo/%3F%5B%5D%23@

whereas the fragment component disallows "#", "[" and "]" from the
reserved characters so:

UriBuilder.fromPath("foo").fragment("/?[]#@").build(); => foo#/?%5B%5D%23@

I think this kind of context-sensitive encoding is useful and frees
developers from having to worry about the minutiae of URI syntax.
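
To make the two examples above concrete, here is a minimal sketch of
context-sensitive encoding using only the JDK. It is not the UriBuilder
implementation: the class name ComponentEncoder and its methods are
hypothetical, the allowed-character sets are taken from RFC 3986, and
the sketch assumes UTF-8, escapes "%" itself, and ignores {template}
variables.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JAX-RS: escapes whatever is not legal
// in a given URI component, following the RFC 3986 character classes.
final class ComponentEncoder {
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final String SUB_DELIMS = "!$&'()*+,;=";
    // path segments: pchar = unreserved / sub-delims / ":" / "@", plus "/"
    private static final String PATH_ALLOWED = UNRESERVED + SUB_DELIMS + ":@/";
    // fragment = *( pchar / "/" / "?" )
    private static final String FRAGMENT_ALLOWED = PATH_ALLOWED + "?";

    // Percent-encode every UTF-8 byte that is not an allowed ASCII char.
    // Assumption: "%" is not in either set, so it is escaped as %25.
    static String encode(String input, String allowed) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && allowed.indexOf((char) v) >= 0) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    static String encodePath(String path) {
        return encode(path, PATH_ALLOWED);
    }

    static String encodeFragment(String fragment) {
        return encode(fragment, FRAGMENT_ALLOWED);
    }
}
```

With this sketch, encodePath("foo/?[]#@") gives foo/%3F%5B%5D%23@ and
encodeFragment("/?[]#@") gives /?%5B%5D%23@, so the same "?" is escaped
in the path but left alone in the fragment.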

>
> So the only real difference is whether percent signs are %-escaped.
> This difference is too small to warrant separate modes.
>
No, the difference is whether illegal characters are encoded or not:
encode=false encodes nothing, encode=true performs context-sensitive
encoding of any illegal character.

> We can a) simplify the API by eliminating one mode, or b) make the
> modes substantially different so each is useful in its own
> circumstance.
>
We had a long email thread discussing whether to work in encoded or
unencoded space and the status quo is the compromise that worked for
everyone. See the thread starting at:

https://jsr311.dev.java.net/servlets/ReadMsg?list=dev&msgNo=477

> I now think the best solution is a):
> * Have a single encoding mode, eliminating a handful of UriBuilder
> methods;
> * Add a new UriBuilder static method that returns its input after
> %-escaping all non-unreserved chars. Cf. Pattern.quote(String) and
> Matcher.quoteReplacement(String).
> public static String quote(String component)
>
The static pct-encoding method seems like a useful addition.
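
A rough sketch of what such a method might look like, outside of
UriBuilder and using only the JDK. The class name UriQuote is
hypothetical, and the sketch assumes UTF-8 and the RFC 3986 unreserved
set (ALPHA, DIGIT, "-", ".", "_", "~"); nothing here is an agreed API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the proposed quote(String) method.
final class UriQuote {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Returns the input with every non-unreserved character %-escaped,
    // so "%", "{", "}" and all reserved characters become literal data.
    static String quote(String component) {
        StringBuilder out = new StringBuilder();
        for (byte b : component.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved =
                (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                || (v >= '0' && v <= '9')
                || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }
}
```

For example, quote("a/b {x}%") would give a%2Fb%20%7Bx%7D%25, which is
safe to pass as "uncontrolled" input because no "%" or "{}" survives
unescaped.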

> The single encoding mode needs to allow all valid URIs and templates
> to be created so it must not escape curly braces, reserved chars or
> percent chars.
Then users would need to manually escape reserved characters that are
illegal in certain URI components; that would be a step backwards in
my opinion.

> It may as well escape chars that would otherwise cause an
> IllegalArgumentException.
>
Then you are almost back to the status quo with the default
encode=true...

Marc.

---
Marc Hadley <marc.hadley at sun.com>
CTO Office, Sun Microsystems.