Matching Repetitions

The previous examples matched only expressions made up of a few generic characters and literal words. To write more expressive patterns, you can use quantifier metacharacters. The quantifiers are as follows:

? * + { }

Quantifiers let you specify how many times a portion of a regular expression must repeat to count as a match. A quantifier is placed immediately after the character, character class, or grouping that it applies to. The following table defines each quantifier and its meaning.

Quantifier   Definition
a?           Match ‘a’ zero or one time.
a*           Match ‘a’ zero or more times (any number of times).
a+           Match ‘a’ one or more times (at least once).
a{n,m}       Match ‘a’ at least n times, but not more than m times.
a{n,}        Match ‘a’ at least n times.
a{n}         Match ‘a’ exactly n times.

Note: ‘a’ in the previous table can be any character, character class, or grouping. You will learn more about groupings later; in short, you can group part of a regular expression using parentheses.
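As a quick way to check the table, the following sketch tries each quantifier against a few strings. It uses Python's re module rather than this tutorial's language, which is an assumption made only for illustration; re.fullmatch requires the whole string to match, and the names cases and check_quantifiers are hypothetical.

```python
import re

# Each entry: (pattern, strings that should match entirely, strings that should not).
cases = [
    (r"a?",     ["", "a"],         ["aa"]),
    (r"a*",     ["", "a", "aaaa"], ["b"]),
    (r"a+",     ["a", "aaa"],      [""]),
    (r"a{2,3}", ["aa", "aaa"],     ["a", "aaaa"]),
    (r"a{2,}",  ["aa", "aaaaa"],   ["a"]),
    (r"a{3}",   ["aaa"],           ["aa", "aaaa"]),
    (r"(ab)+",  ["ab", "abab"],    ["a", "aba"]),  # quantifier applied to a grouping
]

def check_quantifiers(cases):
    """Return True if every pattern accepts and rejects the listed strings."""
    for pattern, should_match, should_not_match in cases:
        for text in should_match:
            if re.fullmatch(pattern, text) is None:
                return False
        for text in should_not_match:
            if re.fullmatch(pattern, text) is not None:
                return False
    return True

print(check_quantifiers(cases))  # True
```

The last case shows a quantifier applied to a grouping: (ab)+ repeats the pair "ab" as a unit, so "aba" is rejected.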

The following are some examples using matching repetitions:

Matching repetition     Description
'/\w+/'                 Any alphanumeric word (one or more alphanumeric characters together).
'/-?\d+/'               A number (one or more digits), optionally prefixed by a hyphen.
'/[a-z]+\t\d{1,5}/'     Any lowercase word, followed by a tab, followed by 1 to 5 digits.
'/The.*dog/'            The word "The", followed by any characters (possibly none), and then "dog". Examples: "The nice dog", "The quick brown fox jumped over the lazy dog", "The WhateverHeredog", "Thedog".
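The example patterns can be tried out with a short script. This is a sketch in Python's re module, not in this tutorial's language; the /.../ delimiters are dropped because Python does not use them, and the helper first_match is a hypothetical name introduced here.

```python
import re

WORD = re.compile(r"\w+")
NUMBER = re.compile(r"-?\d+")
WORD_TAB_DIGITS = re.compile(r"[a-z]+\t\d{1,5}")
THE_DOG = re.compile(r"The.*dog")

def first_match(regex, text):
    """Return the first substring of text matched by regex, or None."""
    m = regex.search(text)
    return m.group(0) if m else None

print(first_match(WORD, "hello, world"))         # hello
print(first_match(NUMBER, "balance: -42 USD"))   # -42
print(first_match(WORD_TAB_DIGITS, "item\t12345") is not None)

# Every example line from the table is matched in full by The.*dog.
examples = [
    "The nice dog",
    "The quick brown fox jumped over the lazy dog",
    "The WhateverHeredog",
    "Thedog",
]
for line in examples:
    print(first_match(THE_DOG, line) == line)
```

Note that "Thedog" matches because .* is allowed to match zero characters.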

Now you have enough tools to create some useful regular expressions. For example, let's build a simple regular expression to match 10-digit telephone numbers. You might start with:
	'/\d{10}/' // 10 digits (no more, no less)
	
This regular expression matches any 10-digit number, but it has some weaknesses. For instance, what if you want to accept numbers that contain dashes (-), such as 321-123-1234? In that case, you can do the following:
	'/\d{3}-\d{3}-\d{4}/'
	
This is fine, but what if you want the dashes to be optional? Try this:
	'/\d{3}-?\d{3}-?\d{4}/'
	
This is better, but there is still room for improvement. You might not want to allow a zero (0) as the first digit of the number. The first digit must therefore be in the class [1-9] instead of \d, as follows:
	'/[1-9]\d{2}-?\d{3}-?\d{4}/' 
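To see what each refinement changes, the four telephone patterns (ten digits, required dashes, optional dashes, and no leading zero) can be run against the same sample inputs. The sketch below uses Python's re module as an assumption for illustration; re.fullmatch is used so that the entire input must match, and STEPS, SAMPLES, and accepted_by are hypothetical names.

```python
import re

# The four patterns in the order they were introduced.
STEPS = [
    r"\d{10}",                     # ten digits, no dashes
    r"\d{3}-\d{3}-\d{4}",          # dashes required
    r"\d{3}-?\d{3}-?\d{4}",        # dashes optional
    r"[1-9]\d{2}-?\d{3}-?\d{4}",   # first digit may not be 0
]

SAMPLES = ["3211231234", "321-123-1234", "321123-1234", "021-123-1234"]

def accepted_by(pattern):
    """Return the sample inputs that the pattern matches in full."""
    return [s for s in SAMPLES if re.fullmatch(pattern, s)]

for pattern in STEPS:
    print(pattern, "accepts", accepted_by(pattern))
```

Because each dash is optional on its own, the third and fourth patterns also accept mixed forms such as 321123-1234.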
	

Did you understand it? Let's study it in parts:


  1. First, a digit between 1 and 9: [1-9]
  2. Next, two digits (from 0 to 9): \d{2}
  3. Then, an optional dash: -?
  4. Three digits: \d{3}
  5. Another optional dash: -?
  6. Finally, four digits: \d{4}
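The six parts listed above can also be assembled programmatically, which makes the structure of the pattern explicit. This is a minimal sketch in Python's re module, assumed here only for illustration; PARTS, PHONE, and is_valid_phone are hypothetical names, and re.fullmatch is used so that extra characters before or after the number are rejected.

```python
import re

# Each part of the pattern paired with its description from the list above.
PARTS = [
    (r"[1-9]", "a digit between 1 and 9"),
    (r"\d{2}", "two digits (from 0 to 9)"),
    (r"-?",    "an optional dash"),
    (r"\d{3}", "three digits"),
    (r"-?",    "another optional dash"),
    (r"\d{4}", "four digits"),
]

# Joining the parts reproduces the full pattern: [1-9]\d{2}-?\d{3}-?\d{4}
PHONE = "".join(part for part, _ in PARTS)

def is_valid_phone(text):
    """Return True if text as a whole is a phone number in the accepted format."""
    return re.fullmatch(PHONE, text) is not None

print(PHONE)
print(is_valid_phone("321-123-1234"))   # True
print(is_valid_phone("021-123-1234"))   # False: leading zero
```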
Try it in the debugger:
	phone as String
	input "Enter your phone number:" phone 
	
	if phone.isMatch('/[1-9]\d{2}-?\d{3}-?\d{4}/') then 
	    display "OK, a valid phone number" 
	else 
	    display "ERROR, invalid phone number" 
	end