File Input/Output and Basic Text Processing
The data files you need for this assignment are located in the directory
/home/guest/source/ (from your home directory, this can also be accessed as ../guest/source/).
These files may also be obtained directly from here.
If you copy them to your own directory, make sure the .txt files are left unchanged (do not edit them).
The ../guest/source subdirectory contains four text files
from the UniGene database. Each file describes one gene.
Write a program that receives one of the following gene names: 'ADH2', 'CEACAM4', 'TGM1', 'GLDC'; opens the file for that
gene (e.g. TGM1.txt); extracts the list of tissues in which the gene is expressed (e.g. esophagus;germ cell;larynx;pancreas;uterus;colon;head_neck;uterus);
and prints this list to the screen (or writes it to a file with the extension ".express").
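The first two steps of the program (accepting a gene name and opening the matching file) can be sketched as follows. This is a minimal sketch, not a reference solution: the assignment does not specify a language, so Python is assumed, and the names `SOURCE_DIR`, `gene_file_path`, `read_gene_file` and `express_path` are hypothetical. `SOURCE_DIR` assumes the files are read from /home/guest/source/; change it if you copied them elsewhere.

```python
import os

# The four gene names the program must accept.
VALID_GENES = ("ADH2", "CEACAM4", "TGM1", "GLDC")

# Assumed location of the UniGene files; adjust if you copied them elsewhere.
SOURCE_DIR = "/home/guest/source"


def gene_file_path(gene, source_dir=SOURCE_DIR):
    """Validate the gene name and return the path of its file, e.g. .../TGM1.txt."""
    if gene not in VALID_GENES:
        raise ValueError(f"unknown gene {gene!r}; expected one of {VALID_GENES}")
    return os.path.join(source_dir, gene + ".txt")


def read_gene_file(gene, source_dir=SOURCE_DIR):
    """Return the full text of the gene's file as a single string."""
    with open(gene_file_path(gene, source_dir)) as f:
        return f.read()


def express_path(gene, out_dir="."):
    """Return the name of the optional output file, e.g. ./TGM1.express."""
    return os.path.join(out_dir, gene + ".express")
```

The gene name itself could come from the command line (`sys.argv[1]`) or from `input()`; validating it before opening the file gives a clear error message instead of a missing-file exception.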
The output should have the following format (example for TGM1):
1. Esophagus
2. Germ Cell
3. Larynx
4. Pancreas
5. Uterus
6. Colon
7. Head Neck
8. Uterus
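The numbering and capitalization in the example output can be produced by a small formatting function. This is a sketch under assumptions: `format_tissues` is a hypothetical name, and the rule "replace underscores with spaces, then capitalize each word" is inferred from the example lines (`germ cell` becomes `Germ Cell`, `head_neck` becomes `Head Neck`).

```python
def format_tissues(tissues):
    """Number tissues starting from 1 and capitalize each word."""
    lines = []
    # enumerate(..., start=1) makes the numbering start at 1, not 0.
    for number, tissue in enumerate(tissues, start=1):
        # Replace underscores with spaces, then capitalize every word.
        name = tissue.replace("_", " ").title()
        lines.append(f"{number}. {name}")
    return "\n".join(lines)
```

Printing the returned string sends the list to the screen; writing the same string to a file such as TGM1.express produces the file version.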
- The list of tissues in the UniGene files appears after the keyword 'EXPRESS' and before the keyword 'CHROMOSOME'.
- Before submitting the assignment, check all resulting files against the source files.
Make sure that the tissues are numbered starting from 1, and that there are no empty values in the list.
- Sometimes the same tissue may appear twice in the list, once capitalized and
another time in all-lowercase (e.g. Uterus and uterus in TGM1).
This is due to a "bug" in UniGene. You may disregard it.
- Use the substring and split functions. Don't use regular expressions.
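The notes above can be sketched as two helpers: one that cuts out the text between 'EXPRESS' and 'CHROMOSOME' and splits it into tissue names, and one that checks a finished output for numbering from 1 and empty values. This is a minimal sketch, assuming Python (where slicing plays the role of a substring function and `str.find` locates the keywords, with no regular expressions), a ';' separator as in the TGM1 example, and that the list may wrap across several lines. `extract_tissues` and `check_express_lines` are hypothetical names. Duplicates such as Uterus/uterus are deliberately kept, as the notes allow.

```python
def extract_tissues(text):
    """Return tissue names found between 'EXPRESS' and 'CHROMOSOME'."""
    start = text.find("EXPRESS")
    if start == -1:
        return []
    start += len("EXPRESS")  # skip past the keyword itself
    end = text.find("CHROMOSOME", start)
    if end == -1:
        return []
    tissues = []
    # text[start:end] is the substring between the two keywords.
    for item in text[start:end].split(";"):
        # Collapse line breaks and repeated spaces inside an entry.
        name = " ".join(item.split())
        if name:  # skip empty values, e.g. after a trailing ';'
            tissues.append(name)
    return tissues


def check_express_lines(lines):
    """Return True if lines are numbered 1, 2, 3, ... and no name is empty."""
    for expected, line in enumerate(lines, start=1):
        number, sep, name = line.partition(". ")
        if number != str(expected) or not sep or not name.strip():
            return False
    return True
```

Running `check_express_lines` on the lines of each .express file is one way to carry out the "check all resulting files" step; comparing the tissue count with the source file still needs a manual look.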