users@jersey.java.net

[Jersey] Re: default regex for path variables

From: Waclaw Kusnierczyk <waclaw.kusnierczyk_at_gmail.com>
Date: Thu, 29 Jan 2015 00:25:54 +0100

Please consider these examples:

https://regex101.com/r/fE9sF5/1
https://regex101.com/r/fE9sF5/2
https://regex101.com/r/fE9sF5/3
https://regex101.com/r/fE9sF5/4

I don't believe Java is any different in this respect.
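
For what it's worth, here is a minimal sketch of the same checks with java.util.regex. The class name ReluctantDemo and the helper findAll are made up for illustration, and the inputs "abc" and "abc/def" are stand-ins for a path fragment; none of this is taken from the Jersey sources. The helper simply calls Matcher.find() in a loop, which is my assumed equivalent of matching with the g modifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantDemo {
    // Hypothetical helper: collects every successive non-overlapping match,
    // i.e. what a global (/g) match loop would report.
    public static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Reluctant: one match per character.
        System.out.println(findAll("[^/]+?", "abc"));            // [a, b, c]
        // Greedy: a single match covering the whole fragment.
        System.out.println(findAll("[^/]+", "abc"));             // [abc]
        // No quantifier: same result as the reluctant pattern in a loop.
        System.out.println(findAll("[^/]", "abc"));              // [a, b, c]
        // The example from the regex tutorial: two separate matches.
        System.out.println(findAll(".*?foo", "xfooxxxxxxfoo"));  // [xfoo, xxxxxxfoo]
        // With a trailing slash both variants arrive at the same match.
        System.out.println(findAll("[^/]+?/", "abc/def"));       // [abc/]
        System.out.println(findAll("[^/]+/", "abc/def"));        // [abc/]
    }
}
```

Note this only exercises find() in a loop; how Jersey actually anchors or applies the pattern is not something the sketch tries to reproduce.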

Wacek


On Thu, Jan 29, 2015 at 12:19 AM, Waclaw Kusnierczyk <
waclaw.kusnierczyk_at_gmail.com> wrote:

> Just a minor correction:
>
> "Clearly, [^/]+? applied globally (as in Perl with the modifier g) can
> with a dose of imprecision be said to effectively match the whole string --
> but in fact it does not match the string, it matches each of the string's
> characters one by one. So it matches as many times as the string is long,
> each time just one character."
>
> is of course meant to say 'the whole string up to and exclusive of the
> first slash' and 'as many times as there are characters before the first
> slash'.
>
> Wacek
>
> On Thu, Jan 29, 2015 at 12:17 AM, Waclaw Kusnierczyk <
> waclaw.kusnierczyk_at_gmail.com> wrote:
>
>> Marek,
>>
>> Thanks for the explanation. Clearly, [^/]+? applied globally (as in Perl
>> with the modifier g) can with a dose of imprecision be said to effectively
>> match the whole string -- but in fact it does not match the string, it
>> matches each of the string's characters one by one. So it matches as many
>> times as the string is long, each time just one character.
>>
>> Consider the example given in the doc you refer to:
>>
>> >>
>>
>> Enter your regex: .*?foo // reluctant quantifier
>> Enter input string to search: xfooxxxxxxfoo
>> I found the text "xfoo" starting at index 0 and ending at index 4.
>> I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.
>>
>> The second example [the one quoted above], however, is reluctant, so it
>> starts by first consuming "nothing". Because "foo" doesn't appear at the
>> beginning of the string, it's forced to swallow the first letter (an "x"),
>> which triggers the first match at 0 and 4. Our test harness continues the
>> process until the input string is exhausted. It finds another match at 4
>> and 13.
>>
>> <<
>>
>> Very clearly, the pattern .*?foo matches two separate substrings. It
>> never reports matching xfooxxxxxxfoo. Neither does the description claim
>> that. It _exhausts_ the string, true -- in a loop, it finds multiple
>> subsequent non-overlapping matches that concatenate to the whole string.
>>
>> With [^/]+? it so happens that it will match the same characters in a
>> path fragment as [^/]+. However, the latter matches just one string (the
>> whole fragment), while the former matches all of the fragment's characters
>> individually but not the whole fragment (except for degenerate cases).
>>
>> Note, the situation is different if the regex is terminated with a
>> slash. Then [^/]+?/ will gradually extend the string consumed until it
>> finds a slash, while [^/]+/ will consume the whole string (as per the doc
>> you cite) and then backtrack. They effectively will match the same string,
>> but arrive at it in different ways.
>>
>> The original regex in the doc I referred to does not have a trailing
>> slash, so I still believe the explanation given there is not appropriate.
>> I can see that the sources do use [^/]+?, but this pattern must then be
>> used in a loop to match the characters individually. It still will not
>> match the whole string in the usual sense. Once you use global matching
>> (in a loop), you can just use [^/] with the same effect: it will
>> successively consume all characters one by one until the first slash.
>>
>> Let me know if this seems wrong to you.
>>
>> Best,
>> Wacek
>>
>
>