dev@jersey.java.net

Re: reducing double slash to a single slash (//->/) in http servlet request path info ? bug or feature ?

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Mon, 30 Jul 2007 14:41:17 +0200

Jakub Podlesak wrote:
>> Jakub Podlesak wrote:
>>> The same bug is in [java.net.URI#normalize] as well
>>> (maybe the root cause since [getPathInfo] just returns a previously set
>>> variable value).
>>>
>> Feature or bug, I think it might be a bug...
>>
>> My understanding was that URI.normalize() would remove contiguous '/'
>> from the path; at least that is what I was using it for. So perhaps I
>> was using a bug as a feature :-)
>
> You wanted contiguous '/' to be removed from the path? I do not understand why.
> To correct possible typos in URIs?
>

Sort of. It is for two reasons:

1) to redirect to a canonical 'cool' URI; and

2) so that the application does not return 404s.

For example, go to these URIs:

http://news.bbc.co.uk/2/hi/middle_east/6921617.stm
http://news.bbc.co.uk/2/hi/middle_east/////6921617.stm

The latter returns a 404, but it would be nice if it redirected to the former.

Another example:

http://www.theregister.co.uk/2007/07/30/floods_fires_space/
http://www.theregister.co.uk/2007/07/30/////floods_fires_space/

Both return the same thing.

The two previous examples show inconsistencies in coolness, whereas the
following is IMHO better:

http://www.w3.org/Provider/Style/////URI
http://www.w3.org/Provider/Style/URI


>> From [1] it is clear that './././' should be replaced by '/' but as
>> Marc says there is nothing about contiguous '/'.
>>
>>
>> Jersey by default normalizes the URI. If you use an HTTP sniffer you
>> should notice a redirect. See the handleRequest method of
>> com.sun.ws.rest.impl.application.WebApplicationImpl:
>>
>> public void handleRequest(ContainerRequest request,
>>                           ContainerResponse response) {
>>     final WebApplicationContext localContext =
>>         new WebApplicationContext(this, request, response);
>>     context.set(localContext);
>>
>>     if (resourceConfig.isRedirectToNormalizedURI()) {
>>         final URI uri = request.getURI();
>>         final URI normalizedUri = uri.normalize();
>>
>>         // normalize() returns the same instance when the path is
>>         // already in normal form, so a reference comparison suffices.
>>         if (uri != normalizedUri) {
>>             response.setResponse(
>>                 ResponseBuilderImpl.temporaryRedirect(normalizedUri).build());
>>             return;
>>         }
>>     }
>>
>> Thus we can work around this by switching off redirection. But this
>> requires that we add an option to the APT code that generates the
>> ResourceConfig class, so that
>> resourceConfig.isRedirectToNormalizedURI() returns false.
>
> I think normalization of URIs is ok as long as it affects only dot segments.
> (I do not think it is reasonable to use URI templates like "dir1/dir2/../res1"
> instead of "dir1/res1")
>

Right, URI.normalize() should do what the specification says.
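
To make "what the specification says" concrete, here is a minimal sketch of
the remove_dot_segments procedure from RFC 3986, section 5.2.4. The class and
method names are hypothetical, and this is not how java.net.URI is
implemented; it only illustrates that dot segments are removed while empty
segments (contiguous '/') are left untouched.

```java
// Hypothetical helper, not part of Jersey or the JDK.
class DotSegments {

    /** Removes "." and ".." segments from a path, following RFC 3986, 5.2.4. */
    static String removeDotSegments(String path) {
        StringBuilder in = new StringBuilder(path);
        StringBuilder out = new StringBuilder();
        while (in.length() > 0) {
            String s = in.toString();
            if (s.startsWith("../")) {                 // rule A
                in.delete(0, 3);
            } else if (s.startsWith("./")) {           // rule A
                in.delete(0, 2);
            } else if (s.startsWith("/./")) {          // rule B
                in.replace(0, 3, "/");
            } else if (s.equals("/.")) {               // rule B
                in.replace(0, 2, "/");
            } else if (s.startsWith("/../")) {         // rule C
                in.replace(0, 4, "/");
                removeLastSegment(out);
            } else if (s.equals("/..")) {              // rule C
                in.replace(0, 3, "/");
                removeLastSegment(out);
            } else if (s.equals(".") || s.equals("..")) {  // rule D
                in.setLength(0);
            } else {                                   // rule E
                // Move the first segment, including its leading '/', to the output.
                int next = s.indexOf('/', 1);
                if (next < 0) {
                    next = s.length();
                }
                out.append(s, 0, next);
                in.delete(0, next);
            }
        }
        return out.toString();
    }

    /** Drops the last segment and its preceding '/' from the output buffer. */
    private static void removeLastSegment(StringBuilder out) {
        int i = out.lastIndexOf("/");
        out.setLength(i < 0 ? 0 : i);
    }

    public static void main(String[] args) {
        System.out.println(removeDotSegments("/dir1/dir2/../res1")); // /dir1/res1
        System.out.println(removeDotSegments("/path1//path2"));      // /path1//path2
    }
}
```

With this reading of the RFC, "/path1//path2" comes back unchanged, which is
the behaviour Marc and Jakub expect.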


> Attaching a patch for the "URI in URI case", but I think
> it is not the right way to go.
>

Hmm... not sure. We need to make this independent of the container, and
my gut feeling is that ensureStringIsPartOfURI is a bit of a hack.

Instead, how about we:

1) Change isRedirectToNormalizedURI to be isRedirectToCanonicalURI
    and clearly specify this as being URI normalization + changing '/+'
    to '/'; and

2) Specify an APT processing option to switch redirection off.

It might make sense to split 1) into two options: normalization alone, and
normalization plus '/+' -> '/'.
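
As a rough sketch of what a check like isRedirectToCanonicalURI could compare
against, assuming "canonical" means URI.normalize() followed by collapsing
'/+' to '/' in the path: CanonicalUri, collapseSlashes and canonicalize are
hypothetical names, not existing Jersey API.

```java
import java.net.URI;

// Hypothetical helper, not part of Jersey.
class CanonicalUri {

    /** Collapses each run of '/' in the path to a single '/'. */
    static URI collapseSlashes(URI uri) {
        String rawPath = uri.getRawPath();
        if (uri.isOpaque() || rawPath == null) {
            return uri;
        }
        String collapsed = rawPath.replaceAll("/+", "/");
        if (collapsed.equals(rawPath)) {
            return uri; // nothing to do
        }
        // Rebuild from the raw components so percent-encoding is preserved.
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) {
            sb.append(uri.getScheme()).append(':');
        }
        if (uri.getRawAuthority() != null) {
            sb.append("//").append(uri.getRawAuthority());
        }
        sb.append(collapsed);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return URI.create(sb.toString());
    }

    /** Normalization plus '/+' -> '/', done explicitly rather than relying on normalize(). */
    static URI canonicalize(URI uri) {
        return collapseSlashes(uri.normalize());
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://www.w3.org/Provider/Style/////URI");
        URI canonical = canonicalize(requested);
        // A container would redirect only when the two differ.
        System.out.println(requested.equals(canonical) ? "serve" : "redirect to " + canonical);
    }
}
```

Note that this also turns an embedded "ftp://any.net" into "ftp:/any.net",
which is exactly the "URI in URI" case, so the option to switch it off is
still needed.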

Paul.

> ~Jakub
>
>> Paul.
>>
>> [1] http://gbiv.com/protocols/uri/rfc/rfc3986.html#relative-dot-segments
>>
>>> The following code:
>>> --cuthere--
>>> URI baseURI = new URI("http://host/path1//path2");
>>> System.out.println("baseURI.toString() = " + baseURI.toString());
>>> System.out.println("baseURI.normalize().toString() = " +
>>> baseURI.normalize().toString());
>>> --cuthere--
>>>
>>> Generates:
>>> --cuthere--
>>> baseURI.toString() = http://host/path1//path2
>>> baseURI.normalize().toString() = http://host/path1/path2
>>> --cuthere--
>>>
>>> ~Jakub
>>>
>>>
>>> On Thu, Jul 26, 2007 at 01:19:43PM -0400, Marc Hadley wrote:
>>>> According to RFC 3986 [1], section 3.3, the ':' is allowed in a path
>>>> segment, so there shouldn't be any reason to encode it. In addition, I
>>>> don't see anything about removing double '/' characters, so I think
>>>> that getPathInfo has a bug.
>>>>
>>>> Marc.
>>>>
>>>> [1] http://ietf.org/rfc/rfc3986.txt
>>>>
>>>> On Jul 25, 2007, at 9:03 AM, Jakub Podlesak wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I tried the following URL:
>>>>>
>>>>> http://localhost:8080/Bookmark/resources/users/japod/bookmarks/ftp://any.net/file.txt
>>>>>
>>>>> and noticed that the corresponding HttpServletRequest (tested on
>>>>> GlassFish) provides:
>>>>>
>>>>> getRequestURI() -> "/Bookmark/resources/users/japod/bookmarks/ftp://any.net/file.txt"
>>>>> getPathInfo()   -> "/users/japod/bookmarks/ftp:/any.net/file.txt"
>>>>>                                            ^^^^^
>>>>>
>>>>> Please note the missing slash in the latter (ftp:/any.net instead
>>>>> of ftp://any.net).
>>>>>
>>>>> Is this a bug or a feature?
>>>>>
>>>>> ~Jakub
>>>>>
>>>>>
>>>>> P.S. It causes an exception in the [setURIs] method of
>>>>> [com.sun.ws.rest.impl.container.servlet.HttpRequestAdaptor]
>>>> ---
>>>> Marc Hadley <marc.hadley at sun.com>
>>>> CTO Office, Sun Microsystems.
>>>>
>>>>
>> --
>> | ? + ? = To question
>> ----------------\
>> Paul Sandoz
>> x38109
>> +33-4-76188109
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe_at_jersey.dev.java.net
>> For additional commands, e-mail: dev-help_at_jersey.dev.java.net

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109