Regular Expressions

Matching a sequence of characters

Example 1

From a list of muscarinic acetylcholine receptors taken from Swiss-Prot, let's extract all human proteins (i.e., the lines containing the word HUMAN).
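#!/usr/bin/perl
use strict;
use warnings;

my $sp_list_file = "sources/sp_list";

# Open the list read-only; stop with an error message if that fails.
open(my $sp, '<', $sp_list_file) or die "cannot open $sp_list_file: $!";

while (my $line = <$sp>) {
    if ($line =~ /HUMAN/) {    # match line against "HUMAN"
        print $line;           # print line if it matches
    }
}

close($sp);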
Output:
ACM1_HUMAN (P11229)   MUSCARINIC ACETYLCHOLINE M1 [CHRM1] - HUMAN
ACM2_HUMAN (P08172)   MUSCARINIC ACETYLCHOLINE M2 [CHRM2] - HUMAN
ACM3_HUMAN (P20309)   MUSCARINIC ACETYLCHOLINE M3 [CHRM3] - HUMAN
ACM4_HUMAN (P08173)   MUSCARINIC ACETYLCHOLINE M4 [CHRM4] - HUMAN
ACM5_HUMAN (P08912)   MUSCARINIC ACETYLCHOLINE M5 [CHRM5] - HUMAN
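The same filter can be sketched without a file by keeping the lines in an array and using grep. The sketch below assumes the five human lines from the output above plus two made-up lines (hypothetical, not real Swiss-Prot entries) that show two points: the pattern matches the character sequence anywhere in the line, and matching is case-sensitive unless the /i modifier is added.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input: the five human entries shown above, plus two
# hypothetical lines made up for this illustration.
my @lines = (
    "ACM1_HUMAN (P11229)   MUSCARINIC ACETYLCHOLINE M1 [CHRM1] - HUMAN\n",
    "ACM2_HUMAN (P08172)   MUSCARINIC ACETYLCHOLINE M2 [CHRM2] - HUMAN\n",
    "ACM3_HUMAN (P20309)   MUSCARINIC ACETYLCHOLINE M3 [CHRM3] - HUMAN\n",
    "ACM4_HUMAN (P08173)   MUSCARINIC ACETYLCHOLINE M4 [CHRM4] - HUMAN\n",
    "ACM5_HUMAN (P08912)   MUSCARINIC ACETYLCHOLINE M5 [CHRM5] - HUMAN\n",
    "HYPOTHETICAL ENTRY - NO MATCH HERE\n",
    "hypothetical entry - human (lowercase)\n",
);

# grep keeps every element for which the match succeeds;
# /HUMAN/ looks for the exact sequence H-U-M-A-N anywhere in $_.
my @human = grep { /HUMAN/ } @lines;
print @human;

# Matching is case-sensitive by default; /i ignores case,
# so the lowercase hypothetical line is also selected.
my @any_case = grep { /HUMAN/i } @lines;
printf "%d lines match /HUMAN/, %d match /HUMAN/i\n",
    scalar @human, scalar @any_case;
```

With this input the last line printed is "5 lines match /HUMAN/, 6 match /HUMAN/i".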


Example 2

The EcoRI restriction enzyme cuts at the consensus sequence GAATTC.
To find out whether a sequence contains an EcoRI restriction site, write:


if ($sequence =~ /GAATTC/) {
    print "Sequence contains an EcoRI site\n";   # runs only if the pattern matches
}
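The test above only answers yes or no. As a minimal sketch, assuming a short made-up DNA string (hypothetical, not from any real sequence record), the /g modifier in a while loop visits every occurrence of GAATTC, and Perl's built-in @- array gives the offset where each match began.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical example sequence, made up for illustration.
my $sequence = "ACGGAATTCTTAGAATTCGG";

# In a while condition, //g resumes searching where the previous
# match ended, so the loop visits each GAATTC in turn.
my @sites;
while ($sequence =~ /GAATTC/g) {
    push @sites, $-[0];    # $-[0] is the 0-based offset where this match began
}

if (@sites) {
    print "EcoRI sites found at offsets: @sites\n";
} else {
    print "No EcoRI site found\n";
}
```

For this string the script prints "EcoRI sites found at offsets: 3 12". Because GAATTC cannot overlap another copy of itself, the non-overlapping search of //g finds every site.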

