users@jersey.java.net

[Jersey] Re: default regex for path variables

From: Waclaw Kusnierczyk <waclaw.kusnierczyk_at_gmail.com>
Date: Thu, 29 Jan 2015 00:19:48 +0100

Just a minor correction:

"Clearly, [^/]+? applied globally (as in Perl with the modifier g) can,
with a dose of imprecision, be said to effectively match the whole string --
but in fact it does not match the string; it matches each of the string's
characters one by one. So it matches as many times as the string is long,
each time just one character."

is of course meant to say 'the whole string up to and exclusive of the
first slash' and 'as many times as there are characters before the first
slash'.
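
A minimal sketch of the point above, assuming java.util.regex semantics and a
find() loop standing in for Perl's g modifier. The class name, the helper
allMatches, and the fragment "abc" are made up for illustration; none of this
is taken from the Jersey sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentMatches {
    // Collects every successive non-overlapping match of regex in input,
    // roughly what a global match in list context returns in Perl.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String fragment = "abc"; // hypothetical path fragment containing no slash

        // Greedy: a single match covering the whole fragment.
        System.out.println(allMatches("[^/]+", fragment));  // [abc]
        // Reluctant: one match per character.
        System.out.println(allMatches("[^/]+?", fragment)); // [a, b, c]
        // No quantifier at all: same result as the reluctant form.
        System.out.println(allMatches("[^/]", fragment));   // [a, b, c]
    }
}
```

In this unanchored setting the reluctant quantifier never has a reason to grow
beyond one character, which is why it behaves like the bare class [^/].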

Wacek

On Thu, Jan 29, 2015 at 12:17 AM, Waclaw Kusnierczyk <
waclaw.kusnierczyk_at_gmail.com> wrote:

> Marek,
>
> Thanks for the explanation. Clearly, [^/]+? applied globally (as in Perl
> with the modifier g) can, with a dose of imprecision, be said to effectively
> match the whole string -- but in fact it does not match the string; it
> matches each of the string's characters one by one. So it matches as many
> times as the string is long, each time just one character.
>
> Consider the example given in the doc you refer to:
>
> >>
>
> Enter your regex: .*?foo // reluctant quantifier
> Enter input string to search: xfooxxxxxxfoo
> I found the text "xfoo" starting at index 0 and ending at index 4.
> I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.
>
> The second example [the one quoted above], however, is reluctant, so it
> starts by first consuming "nothing". Because "foo" doesn't appear at the
> beginning of the string, it's forced to swallow the first letter (an "x"),
> which triggers the first match at 0 and 4. Our test harness continues the
> process until the input string is exhausted. It finds another match at 4
> and 13.
>
> <<
>
> Very clearly, the pattern .*?foo matches two separate substrings. It
> never reports matching xfooxxxxxxfoo. Neither does the description claim
> that. It _exhausts_ the string, true -- in a loop, it finds multiple
> successive non-overlapping matches that concatenate to the whole string.
>
> With [^/]+? it so happens that it will match the same characters in a path
> fragment as [^/]+. However, the latter matches just one string (the whole
> fragment), while the former matches each of the fragment's characters
> individually but not the whole fragment (except for degenerate cases).
>
> Note, the situation is different if the regex is terminated with a slash.
> Then [^/]+?/ will gradually extend the string consumed until it finds a
> slash, while [^/]+/ will consume the whole string (as per the doc you cite)
> and then backtrack. They effectively will match the same string, but
> arrive at it in different ways.
>
> The original regex in the doc I referred to does not have a trailing
> slash. I still believe this is not an appropriate explanation. I can see
> that the sources do use [^/]+?, but this pattern must then be used in a
> loop to match the characters individually. It still will not match the
> whole string in the usual sense. Once you use global matching (in a loop),
> you can just use [^/] with the same effect: it will successively consume
> all characters one by one until the first slash.
>
> Let me know if this seems wrong to you.
>
> Best,
> Wacek
>
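
The quoted reasoning can be checked with a short sketch, again assuming
java.util.regex. It replays the .*?foo example from the cited doc and then
compares the patterns with and without a trailing slash. The class name, the
helpers firstMatch and spans, and the input "abc/def" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingSlashDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Lists each successive match as "text@start-end" (end index exclusive).
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two separate matches, never the whole input in one go.
        System.out.println(spans(".*?foo", "xfooxxxxxxfoo")); // [xfoo@0-4, xxxxxxfoo@4-13]

        String path = "abc/def"; // hypothetical input
        // With a trailing slash, reluctant and greedy arrive at the same string.
        System.out.println(firstMatch("[^/]+?/", path)); // abc/
        System.out.println(firstMatch("[^/]+/", path));  // abc/
        // Without it, the reluctant form stops after a single character.
        System.out.println(firstMatch("[^/]+?", path));  // a
        System.out.println(firstMatch("[^/]+", path));   // abc
    }
}
```

The trailing slash is what forces the reluctant quantifier to keep extending;
without something after it in the pattern, one character already satisfies it.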