Metacharacters and Character Sets

The power of regular expressions comes from special characters that perform a function within the pattern instead of being matched literally. These characters are known as metacharacters.

Some of the metacharacters that make pattern matching more generic are as follows:

[ ] . ( ) * ^ $ ? \

The [ and ] characters are used to specify a set of characters that you wish to match. Characters can be listed individually or a range of characters can be indicated by listing two characters separated by a dash (-). For example, [aeiou] matches any of the vowels. The set [a-d] is equivalent to [abcd].

If you specify the ^ character right after the opening bracket, the set matches its complement, that is, any character that is not in the set. For example, [^0-9] matches any non-digit character.
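The behavior of sets, ranges, and negated sets can be checked with a short script. This is a minimal sketch that assumes Python's re module as a stand-in for the regex engine used in this text; Python takes the bare pattern, so the / delimiters are omitted, and the sample strings are made up for illustration.

```python
import re

# [aeiou] picks out each vowel in the (made-up) sample string.
vowels = re.findall(r"[aeiou]", "regular expression")

# A range and the equivalent explicit list select the same characters.
sample = "abcdefg"
range_hits = re.findall(r"[a-d]", sample)
list_hits = re.findall(r"[abcd]", sample)

# A leading ^ inside the brackets negates the set: everything except digits.
non_digits = re.findall(r"[^0-9]", "a1b2c3")

print(vowels)
print(range_hits == list_hits)
print(non_digits)
```

Note that ^ only negates when it is the first character inside the brackets.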

The following table lists some examples of regular expressions using sets:

Regular Expression    Strings that Match
'/[cr]at/'            cat, rat
'/[0-9]/'             Any digit
'/[0123456789]/'      Any digit
'/[A-Z]/'             Any uppercase letter
'/[0-9][0-9]/'        Any two digits together (like 01, 42, 27…)
'/[aeiou]/'           Any of the vowels

Fortunately, there are some shortcuts for some of the common character sets. For example, \d denotes "any digit" the same way that [0-9] does. \D means "any non-digit" like [^0-9]. The following table lists some other common shortcuts.

Shortcut Sequence    Equivalent to     Meaning
\d                   [0-9]             Any digit.
\D                   [^0-9]            Any non-digit.
\w                   [a-zA-Z0-9_]      Any alphanumeric character, including the underscore.
\W                   [^a-zA-Z0-9_]     Any character that is not alphanumeric or an underscore.
\s                                     Any whitespace character.
\S                                     Any non-whitespace character.

These shortcuts can be included inside a character class (set). For example, [\da-fA-F] is a character class that will match one hexadecimal digit. The tab, newline, and carriage return characters are specified with \t, \n, and \r, respectively.
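The shortcuts and the hexadecimal class can be exercised with a small script. This sketch assumes Python's re module, which is not necessarily the engine this text targets. In Python, \d, \w, and \s also match non-ASCII digits, letters, and whitespace by default, so the re.ASCII flag is passed to get the ASCII ranges described above. The input strings are invented for illustration.

```python
import re

# re.ASCII restricts \d, \w, and \s to their ASCII definitions.
digits = re.findall(r"\d", "call 555-0199", flags=re.ASCII)
word_chars = re.findall(r"\w", "a_b-c!", flags=re.ASCII)
non_word = re.findall(r"\W", "a_b-c!", flags=re.ASCII)

# A shortcut used inside a character class: one hexadecimal digit.
hex_digits = re.findall(r"[\da-fA-F]", "0x1F, g9", flags=re.ASCII)

# \s covers the tab (\t), newline (\n), and space characters used here.
fields = re.split(r"\s", "one\ttwo\nthree four", flags=re.ASCII)

print(digits, word_chars, non_word, hex_digits, fields)
```

The underscore is reported by \w and not by \W, which is why the negated equivalent must also list the underscore.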

Another special metacharacter is the dot (.). A dot within a regular expression matches any single character except the newline character (\n), unless the s modifier is used.
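The effect of the dot, with and without newline matching, can be sketched as follows. Python's re module is assumed here as a stand-in; its re.DOTALL flag plays the role of the s modifier mentioned above. The sample strings are hypothetical.

```python
import re

# The dot stands for any one character, but not for a newline by default.
hits = re.findall(r"c.t", "cat cot c\nt c-t")

# Without the flag, the newline between a and b blocks the match.
dot_default = re.search(r"a.b", "a\nb")

# With re.DOTALL, the dot matches the newline as well.
dot_all = re.search(r"a.b", "a\nb", flags=re.DOTALL)

print(hits)
print(dot_default is None)
print(dot_all is not None)
```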

Special Cases

What happens if you want to search for one of the metacharacters themselves, such as [ or a dot (.)? You can escape the character by placing a backslash (\) just before it. For example, \., \\, and \[ match a literal dot, backslash, and opening bracket, respectively.
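Escaping can be tried out with the sketch below, again assuming Python's re module in place of the engine used in this text. Raw strings (r"...") keep Python itself from interpreting the backslashes. re.escape is Python's standard helper for escaping a whole string; other languages offer their own equivalents. The inputs are made up for illustration.

```python
import re

# Escaped metacharacters match only the literal character.
literal_dots = re.findall(r"\.", "a.b.c")
literal_backslashes = re.findall(r"\\", r"C:\temp\x")
literal_brackets = re.findall(r"\[", "list[0]")

# Unescaped, the dot matches every character in the string.
unescaped_dot = re.findall(r".", "a.b")

# re.escape builds a pattern that matches the given text literally.
escaped = re.escape("[1.5]")

print(literal_dots, len(literal_backslashes), literal_brackets)
print(unescaped_dot)
print(re.fullmatch(escaped, "[1.5]") is not None)
```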

Go back to the Method example and try these metacharacters with different regular expressions. Experiment with character classes and the shortcuts available. The only way to learn to use regular expressions is to build some.