In regular expressions, extraction means storing the strings matched by parts of the expression so that they can be used elsewhere. It is especially useful for parsing and for general text processing.
An extraction group is delimited by parentheses. For each group, the part of the string that matches inside the parentheses is stored at a particular position in an array of matched groups. In PBL, extraction is done with the match function, which returns this array of substrings, or null if the string does not match.
time as String
matches as String[]
input "Enter a time (hh:mm:ss):" time
matches = time.match('/(\d\d):(\d\d):(\d\d)/')
if matches is not null then
    display "Hours: " + matches[1] + "\n" +
            "Minutes: " + matches[2] + "\n" +
            "Seconds: " + matches[3]
else
    display "Invalid time!"
end
For the previous example, if you enter "12:40:23", the array will contain the following:
| Position | Value |
|---|---|
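| 0 | 12:40:23 |
| 1 | 12 |
| 2 | 40 |
| 3 | 23 |
Position 0 holds the text matched by the whole expression. Positions 1 and up are assigned to the groups from left to right, which is why the example reads the hours from matches[1].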
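For comparison, the same extraction can be sketched in Python rather than PBL. This is only an illustration: it assumes Python's standard `re` module, where `group(0)` is the whole match and `group(1)` onward are the parenthesized groups, and the helper name `split_time` is hypothetical.

```python
import re

# Same pattern as the PBL example, without the surrounding slashes.
TIME_PATTERN = re.compile(r"(\d\d):(\d\d):(\d\d)")

def split_time(text):
    """Return [whole match, hours, minutes, seconds], or None if no match."""
    m = TIME_PATTERN.search(text)
    if m is None:
        return None
    # group(0) is the whole match; groups 1..3 follow from left to right.
    return [m.group(i) for i in range(4)]

print(split_time("12:40:23"))
print(split_time("noon"))
```

The list returned for "12:40:23" has the same layout as the table above: the full match first, then one entry per group.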
The following is a real-world example of extraction. Suppose that you need to interpret a text file whose lines have the following format:
property = value
The file can also have comment lines, which begin with the pound sign (#). A sample of the file follows:
# Configuration parameters
adminEmail=admin@yoursite.com
serverHost=server.yoursite.com
serverPort=12345
# some preferences
soundEnabled=false
fontSize=12
# colors
background = white
foreground = blue
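The goal is to load the file into an associative array so that any property can be retrieved by name, for example:
port = properties["serverPort"]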
First, you need to define the regular expression to interpret a valid line in the file. As mentioned before, lines can be in property = value format or they may start with a pound (#) sign. In the latter case, the line must be ignored.
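The assignment lines can be matched with /\w+=\w+/. This looks for a word (\w+), an equals sign (=), and another word (\w+).
To allow an optional whitespace character on either side of the equals sign, add \s? before and after it:
/\w+\s?=\s?\w+/
To extract the property name and the value, enclose each word in parentheses:
/(\w+)\s?=\s?(\w+)/
Finally, anchor the expression with ^ and $ so that it must match the whole line:
/^(\w+)\s?=\s?(\w+)$/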
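The effect of each refinement can be checked with a short Python sketch (Python rather than PBL, using the standard `re` module). The dictionary `STEPS` and the helper `which_match` are hypothetical names introduced only for this illustration.

```python
import re

# The four refinement steps as Python patterns (no surrounding slashes).
STEPS = {
    "basic": re.compile(r"\w+=\w+"),
    "spaced": re.compile(r"\w+\s?=\s?\w+"),
    "grouped": re.compile(r"(\w+)\s?=\s?(\w+)"),
    "anchored": re.compile(r"^(\w+)\s?=\s?(\w+)$"),
}

def which_match(line):
    """Return the names of the steps whose pattern is found in the line."""
    return [name for name, pattern in STEPS.items() if pattern.search(line)]

print(which_match("fontSize=12"))
print(which_match("background = white"))
print(which_match("set background = white now"))
```

In this sketch, a line with spaces around the equals sign is only found once \s? is added, and a line with extra words around the assignment is rejected only by the anchored version.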
The following code tests this expression against a line entered by the user:
input "Enter a line:" line
m = line.match('/^(\w+)\s?=\s?(\w+)$/')
if m is not null then
    display "Property: " + m[1] + "\nValue: " + m[2]
else
    display "ERROR, invalid line!"
end
Comment lines begin with a pound sign and can be matched with:
/^#.*/
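Both expressions can be combined with the alternation operator (|) so that a single match handles comments and assignments. Group 1 captures a comment line, group 2 the whole assignment, and groups 3 and 4 the property name and the value:
input "Enter a line:" line
m = line.match('/(^#.*$)|(^(\w+)\s?=\s?(\w+)$)/')
if m is not null then
    if m[1] = "" then
        display "Property: " + m[3] + "\nValue: " + m[4]
    else
        display "Comment line found: " + m[0]
    end
else
    display "ERROR, invalid line!"
end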
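The branch test can also be sketched in Python (not PBL). One difference to keep in mind: in Python's `re` module a group that did not take part in the match is `None`, not an empty string, so the check is written accordingly. The helper name `classify` is hypothetical.

```python
import re

# Group 1: comment line. Group 2: whole assignment.
# Groups 3 and 4: property name and value.
LINE_PATTERN = re.compile(r"(^#.*$)|(^(\w+)\s?=\s?(\w+)$)")

def classify(line):
    """Return a (kind, first, second) tuple describing the line."""
    m = LINE_PATTERN.search(line)
    if m is None:
        return ("error", None, None)
    if m.group(1) is not None:
        # The comment branch of the alternation matched.
        return ("comment", m.group(1), None)
    return ("property", m.group(3), m.group(4))

print(classify("# colors"))
print(classify("fontSize=12"))
print(classify("not a valid line"))
```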
The complete program reads each line of the file and stores the properties in an associative array:
for each line in TextFile("/tmp/test.txt").lines
    m = line.match('/(^#.*$)|(^(\w+)\s?=\s?(\w+)$)/')
    if m is not null then
        // if the line is not a comment
        if m[1] = "" then
            props[m[3]] = m[4]
        end
    else
        // erroneous line - ignore it
    end
end
display props
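A rough Python equivalent of the whole program is sketched below. To stay self-contained it reads the sample from a string instead of /tmp/test.txt; the names `SAMPLE` and `load_properties` are hypothetical, and the sample text assumes one entry per line.

```python
import re

LINE_PATTERN = re.compile(r"(^#.*$)|(^(\w+)\s?=\s?(\w+)$)")

SAMPLE = """\
# Configuration parameters
adminEmail=admin@yoursite.com
serverHost=server.yoursite.com
serverPort=12345
# some preferences
soundEnabled=false
fontSize=12
# colors
background = white
foreground = blue
"""

def load_properties(text):
    """Collect property/value pairs, skipping comments and invalid lines."""
    props = {}
    for line in text.splitlines():
        m = LINE_PATTERN.search(line)
        # Keep only lines that matched the assignment branch.
        if m is not None and m.group(1) is None:
            props[m.group(3)] = m.group(4)
    return props

props = load_properties(SAMPLE)
print(props["serverPort"])
print(sorted(props))
```

Note that with this expression a value must consist only of word characters, so in this sketch lines such as adminEmail=admin@yoursite.com fall into the "erroneous line" case and are skipped. Accepting such values would require a broader character class for the second group.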
The following examples show regular expression solutions to common problems.
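To split a file path into its directory and file name:
/(.*)\/([^\/]*)$/
For an input such as usr/utilities/reader/readme.txt, position [1] will contain the path (usr/utilities/reader), and position [2] will contain the name of the file (readme.txt).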
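As a quick check, the path expression can be tried in Python (not PBL). Python does not need the forward slashes escaped, so the pattern is written without the backslashes; `split_path` is a hypothetical helper name and the input string is only a sample.

```python
import re

# (.*) is greedy, so it runs up to the last slash in the string.
PATH_PATTERN = re.compile(r"(.*)/([^/]*)$")

def split_path(path):
    """Return (directory, file name), or None if the path has no slash."""
    m = PATH_PATTERN.search(path)
    return None if m is None else (m.group(1), m.group(2))

print(split_path("usr/utilities/reader/readme.txt"))
print(split_path("readme.txt"))
```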
To split an e-mail address into its user ID and host name:
/([\w\.]+)@([\w\.]+)/
For an address such as support@bea.com, position [1] will contain the user ID (support), and position [2] will contain the host name (bea.com).
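The same e-mail expression, sketched in Python (not PBL) with a hypothetical helper `split_email`. It only separates the two parts around the @ sign and is not meant as a full address validator.

```python
import re

EMAIL_PATTERN = re.compile(r"([\w\.]+)@([\w\.]+)")

def split_email(address):
    """Return (user ID, host name), or None if no address is found."""
    m = EMAIL_PATTERN.search(address)
    return None if m is None else (m.group(1), m.group(2))

print(split_email("support@bea.com"))
print(split_email("no address here"))
```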
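To split a URL into its protocol, host, port, and path:
/(\w+):\/\/([^:\/]+)(:(\d+))?(\/.*)?/
For a URL such as http://www.bea.com:80/index.html, the array will contain the following: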
| Position | Value |
|---|---|
| 1 | http |
| 2 | www.bea.com |
| 3 | :80 |
| 4 | 80 |
| 5 | /index.html |
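The URL expression can be exercised the same way in Python (not PBL). The helper `split_url` is hypothetical. Because the port and path groups are optional, the sketch also shows a URL without a port, where Python reports the unmatched groups as `None`.

```python
import re

# Groups: 1 protocol, 2 host, 3 ":port", 4 port digits, 5 path.
URL_PATTERN = re.compile(r"(\w+)://([^:/]+)(:(\d+))?(/.*)?")

def split_url(url):
    """Return the five groups as a tuple, or None if the URL does not match."""
    m = URL_PATTERN.search(url)
    return None if m is None else m.groups()

print(split_url("http://www.bea.com:80/index.html"))
print(split_url("http://www.bea.com/index.html"))
```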