users@jsr311.java.net

RE: JAX-RS: UriBuilder encoding

From: Manger, James H <James.H.Manger_at_team.telstra.com>
Date: Wed, 16 Jul 2008 13:33:04 +1000

Marc,

> encode=true ...
> I think this kind of context-sensitive encoding is useful and frees
> developers from having to worry about the minutiae of URI syntax.

I absolutely agree.
 
Why not give developers as much help as possible when encode=false,
as long as it doesn't prevent ANY valid URI being built?


First, I had better clarify a major assumption.
I have been assuming that a percent character will be escaped when encode=true.
  UriBuilder.fromPath("abc%2Fdef", true) -> "abc%252Fdef"
Is this correct? It seems to be what Jersey does, but the spec is not totally clear. The spec (javadoc) talks about "automatic encoding of illegal characters". Percent characters are not illegal in URIs -- as long as they are followed by a pair of hex digits.
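
To make the question concrete, here is a minimal sketch in plain Java of the two readings of "illegal characters" for '%'. It makes no JAX-RS calls; the class and method names are hypothetical and are only meant to illustrate the difference, not to describe what any implementation actually does.

```java
// Hypothetical illustration only: not part of JAX-RS or Jersey.
class PercentEscapeSketch {

    // Reading 1: every '%' is treated as data and escaped to "%25".
    static String escapeEveryPercent(String s) {
        return s.replace("%", "%25");
    }

    // Reading 2: a '%' followed by two hex digits is already a legal
    // escape and is left alone; only a stray '%' is escaped.
    static String escapeStrayPercent(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean validEscape = c == '%'
                    && i + 2 < s.length()
                    && isHex(s.charAt(i + 1))
                    && isHex(s.charAt(i + 2));
            if (c == '%' && !validEscape) {
                out.append("%25");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // ASCII hex digits only.
    static boolean isHex(char c) {
        return (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'f')
                || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        System.out.println(escapeEveryPercent("abc%2Fdef")); // abc%252Fdef
        System.out.println(escapeStrayPercent("abc%2Fdef")); // abc%2Fdef
        System.out.println(escapeStrayPercent("100%"));      // 100%25
    }
}
```

Under the first reading the caller can never produce "%2F" in the output; under the second, an existing escape passes through untouched.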


As an alternative way of explaining my point, consider what would happen if encode=true mode did NOT escape '%' characters.
* 99.99% of encode=true usage is unchanged as raw '%' chars are rarely used in URIs.
* For the remaining usage, the developer has to write "%25" instead of "%".
* For handling "uncontrolled" input, the chars that can cause problems go from 2 {} to 3 {}% -- which cannot make it materially harder. In fact, handling "uncontrolled" input becomes much easier. The caller can escape these 3 chars then keep using encode=true mode (to take care of the context-sensitive & i18n encoding). With the current modes the caller has to switch to encode=false mode. Consequently, they have to do all the context-sensitive and i18n encoding themselves.
* There is no valid URI that cannot be built in this encode=true mode so encode=false can be eliminated.
* A developer can %-escape more than they need to (if they like the strings in their code to look more like the final URI) as the extra %'s will not be re-escaped by UriBuilder.
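
The points above can be sketched in plain Java. This is only an illustration under my own assumptions: it handles a single path segment, the class and method names (SingleModeSketch, literal, encodeSegment) are hypothetical, and the set of characters left unescaped is my reading of the RFC 3986 "pchar" rule. It is not how UriBuilder is specified or implemented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a single encoding mode for one path segment.
class SingleModeSketch {

    // Characters allowed unescaped in a path segment (RFC 3986 pchar,
    // minus pct-encoded): unreserved, sub-delims, ':' and '@'.
    private static final String SEGMENT_SAFE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~!$&'()*+,;=:@";

    // For "uncontrolled" input: escape the 3 chars the builder treats
    // specially. '%' must be replaced first so the new escapes survive.
    static String literal(String raw) {
        return raw.replace("%", "%25").replace("{", "%7B").replace("}", "%7D");
    }

    // Context-sensitive and i18n encoding that leaves template braces
    // and existing %HH escapes alone.
    static String encodeSegment(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            char c = (char) b;
            boolean validEscape = c == '%'
                    && i + 2 < bytes.length
                    && isHex(bytes[i + 1])
                    && isHex(bytes[i + 2]);
            boolean keep = b < 0x80
                    && (SEGMENT_SAFE.indexOf(c) >= 0 || c == '{' || c == '}' || validEscape);
            if (keep) {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", b)); // e.g. 0x20 -> "%20"
            }
        }
        return out.toString();
    }

    // ASCII hex digits only; bytes >= 0x80 arrive negative and fail.
    static boolean isHex(int b) {
        return (b >= '0' && b <= '9') || (b >= 'a' && b <= 'f') || (b >= 'A' && b <= 'F');
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("abc%2Fdef"));        // abc%2Fdef
        System.out.println(encodeSegment("a b/{id}"));         // a%20b%2F{id}
        System.out.println(encodeSegment(literal("50%{x}")));  // 50%25%7Bx%7D
    }
}
```

The caller with uncontrolled input passes it through literal() and still gets the context-sensitive and i18n encoding from the one mode, which is the simplification I am arguing for.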



> We had a long email thread ...
> https://jsr311.dev.java.net/servlets/ReadMsg?list=dev&msgNo=477

The @Encoded annotation is a great solution for the parsing side.

I don't really want to rehash past design choices on the building side, but I feel this one is important as it can substantially simplify the API.


>> The single encoding mode needs to allow all valid URIs and templates
>> to be created so it must not escape curly braces, reserved chars or
>> percent chars.
> Then users would need to manually escape reserved characters that are
> illegal in certain URI components, that would be a step backwards in
> my opinion.

"Never escape any reserved char" is easier to document and implement than
"escape reserved chars that are illegal in the context", but the latter is more user-friendly. I am very happy to keep the context-aware escaping, as long as the javadoc is explicit about how '%' is handled.

James Manger