Regular Expressions

Parentheses as memory

Parentheses in a regular expression will "remember" text matched by the subexpression they enclose, and enable later use of that text in the program.

How does it work?

We have seen that a regular expression, or motif, usually matches a whole group of strings, because several variations are allowed.

Sometimes we wish to know the exact sequence of characters in the string that matched the regular expression.

If we enclose the regular expression in parentheses, the text matched by the motif in parentheses is automatically stored in a variable named $1.

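If we enclose several motifs of the regular expression in parentheses, their matched substrings are assigned, in left-to-right order of the opening parentheses, to $1, $2, $3, etc. These variables are set only when the match succeeds; after a failed match they still hold the values from the previous successful one.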
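
The numbering of the variables can be seen in a minimal sketch with two pairs of parentheses. The helper name split_pair and the "name=value" input are made up for illustration and are not part of the course material.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits a "name=value" string into its two parts.
# Returns an empty list if the string does not match.
sub split_pair {
    my ($text) = @_;
    if ($text =~ /(\w+)=(\w+)/) {
        # First pair of parentheses -> $1, second pair -> $2
        return ($1, $2);
    }
    return;
}

my ($name, $value) = split_pair("color=blue");
print "Name: $name, value: $value\n";
```

This should print "Name: color, value: blue".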

Example 1

Recall the example in which we asked the user to enter date and time in a given format.

Let us now extract the actual day, month, year, hour, and minutes from the user's entry.

#!/usr/bin/perl

print "Please enter date and time, as in \"08-OCT-2012  16:30\"\n";
my $entry = <STDIN>;
chomp($entry);    # remove the trailing newline (chop would remove any last character)

# Use $1 .. $5 only if the match succeeded
if ($entry =~ /(\d\d)-(\w\w\w)-(\d\d\d\d)  (\d\d):(\d\d)/) {

    # $1 now contains the day
    # $2     contains the month
    # $3     contains the year
    # $4     contains the hour
    # $5     contains the minutes

    # for example, to print the month we would write:

    print "Month: $2\n";
}
else {
    print "The entry does not match the expected format.\n";
}
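
To try the same pattern without typing input, here is a sketch that runs it against a fixed string and copies the captured fields into named variables for later use. The helper name parse_entry and the hash keys are assumptions chosen for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the five fields of an entry such as
# "08-OCT-2012  16:30", or an empty list if the entry does not match.
sub parse_entry {
    my ($entry) = @_;
    if ($entry =~ /(\d\d)-(\w\w\w)-(\d\d\d\d)  (\d\d):(\d\d)/) {
        # Copy the captures right away: the next successful match
        # would overwrite $1 .. $5.
        return (day => $1, month => $2, year => $3,
                hour => $4, minutes => $5);
    }
    return;
}

my %date = parse_entry("08-OCT-2012  16:30");
print "Month: $date{month}\n";
print "Year: $date{year}\n";
```

Copying the values into a hash keeps them available even after later matches in the program have replaced $1 to $5.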

Example 2

Given an HTML text containing a link tag, extract the URL.

The link:
Assignment 6

The HTML source:
<A HREF="http://tarshish.md.biu.ac.il/assignment6.html"> Assignment 6 </A>

The Perl program:

#!/usr/bin/perl

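my $html = "<A HREF=\"http://tarshish.md.biu.ac.il/assignment6.html\"> Assignment 6 </A>.";

# [^"]* matches everything up to the closing quote, so the capture
# cannot run past the end of the URL as a greedy .* could.
if ($html =~ /<A HREF="([^"]*)">/) {
    print "URL: $1\n";
}

Result: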
URL: http://tarshish.md.biu.ac.il/assignment6.html
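
One caveat when capturing a URL with (.*): the star is greedy, so the capture extends to the last "> in the string, not the first. The sketch below compares it with the more restrictive ([^"]*) on a made-up input holding two link tags; the input string and variable names are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one string holding two link tags.
my $html = '<A HREF="a.html"> A </A> and <A HREF="b.html"> B </A>';

# Greedy: .* extends as far as the last "> in the string.
my $greedy = ($html =~ /<A HREF="(.*)">/) ? $1 : '';

# Restricted: [^"]* cannot cross the closing quote of the first URL.
my $first = ($html =~ /<A HREF="([^"]*)">/) ? $1 : '';

print "Greedy: $greedy\n";
print "First: $first\n";
```

With this input the greedy capture runs from a.html through to b.html, tags included, while the restricted capture contains only a.html.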

