File input/output


1. Read a file containing a sequence in FASTA format, and store the entire sequence as one string.

In the FASTA format, the first line (which begins with ">") holds the sequence name and description, and the subsequent lines hold the sequence itself.

In this program, the user is asked to enter the file name containing the sequence.

For example, use the CEACAM3.fasta file as input.


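#declare all variables

my ($seq_file, $seq_name, $sequence, $line, $seq_length);

#ask the user for the name of the file containing the sequence

print "Enter the name of the file containing the sequence: ";
$seq_file = <STDIN>;
chomp ($seq_file);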

#open file for reading

open (SEQ, $seq_file) || die "cannot open \"$seq_file\": $!";

#read sequence name and description from first line

$seq_name = <SEQ>;
chomp ($seq_name);

#initialize a variable to contain the entire sequence

$sequence = "";

#read sequence lines from file

while ($line = <SEQ>) {
  chomp ($line);
  $sequence .= $line;     #add line to $sequence
}

#close file

close (SEQ);

#print sequence length on the screen, for validation

$seq_length = length ($sequence);
print "Sequence length: $seq_length\n";
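
The same steps can also be wrapped in a subroutine that uses a lexical filehandle and the three-argument form of open. The following is only a minimal sketch: the name read_fasta is a hypothetical helper, not part of the program above, and it assumes the file holds a single sequence. Unlike the program above, it also removes the leading ">" from the header line.

```perl
use strict;
use warnings;

# read_fasta is a hypothetical helper name used only for this sketch.
# It assumes the file contains a single FASTA record and returns
# the header line (without the leading ">") and the whole sequence.
sub read_fasta {
    my ($path) = @_;

    # three-argument open with a lexical filehandle
    open(my $fh, '<', $path) or die "cannot open \"$path\": $!";

    # the first line holds the sequence name and description
    my $name = <$fh>;
    die "\"$path\" is empty\n" unless defined $name;
    chomp($name);
    $name =~ s/^>//;    # drop the leading ">" of the header line

    # append the remaining lines to one string
    my $sequence = "";
    while (my $line = <$fh>) {
        chomp($line);
        $sequence .= $line;
    }
    close($fh);

    return ($name, $sequence);
}
```

It could be called as my ($name, $sequence) = read_fasta("CEACAM3.fasta"); after which length($sequence) gives the sequence length, as before.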

2. Copy file1 to file2, so that lines in file2 will be numbered.

For example, use the file the_ostrich.txt as input.

#store file names in variables (good habit)

$file1 = "the_ostrich.txt";
$file2 = "the_ostrich_numbered.txt";

#open the files

open (SOURCE, $file1) || die "cannot open \"$file1\": $!";
open (RESULT, ">$file2") || die "cannot open \"$file2\": $!";

#initialize a line counter

$count = 0;     

#read lines from file1. No need to chomp them, since you
#will need the line breaks when printing to file2

while ($line = <SOURCE>) {
   $count++;     #advance the counter, so numbering starts at 1
   print RESULT "[$count] $line";
}

#close files. Perl closes any open filehandles when the program
#exits, so this is not strictly necessary in this case

close (SOURCE);
close (RESULT);
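
Perl also keeps its own line counter: the special variable $. holds the number of the line most recently read from a filehandle, so a separate $count is optional. A minimal sketch of the same copy using $. follows; the subroutine name number_lines is a hypothetical helper, not part of the program above.

```perl
use strict;
use warnings;

# number_lines is a hypothetical helper name used only for this sketch.
# It copies $source to $result, prefixing each line with its number.
sub number_lines {
    my ($source, $result) = @_;

    open(my $in,  '<', $source) or die "cannot open \"$source\": $!";
    open(my $out, '>', $result) or die "cannot open \"$result\": $!";

    while (my $line = <$in>) {
        # $. is the number of the line just read from $in (starting at 1)
        print {$out} "[$.] $line";
    }

    close($in);
    close($out) or die "cannot close \"$result\": $!";
    return;
}
```

It could be called as number_lines("the_ostrich.txt", "the_ostrich_numbered.txt");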
