The data files you need for this assignment are located in a directory named
(from your home directory, this can also be accessed as /home/guest/ass5).
These files may also be obtained from here.
If you copy them to your own directory, make sure that their lines are not wrapped.
We recommend creating a special directory for result files,
so that they will not mix with your programs.
Source files information
- The chr21_genes.txt file lists all genes from human chromosome
21, in their order along the chromosome, as described in Hattori et al. (2000) Nature 405, 311-319.
For each gene, the file gives the gene symbol, description and category.
The fields are separated by the TAB (\t) character.
You can find the meaning of each category in the original paper, under the "Gene categories" section.
- The HUGO_genes.txt file lists all human genes that have an
official symbol approved by the HUGO Gene Nomenclature Committee.
For each gene, the file gives its symbol and description, separated by a
You must solve these exercises by using hashes.
- Based on data from the chr21_genes.txt file,
write a program that asks the user to enter a gene symbol and
then prints the description for that gene. The program should
give an error message if the entered symbol is not found in the table
(the user should enter the symbol in the correct case, i.e. upper case).
HINT: First read the entire text file into a hash
that maps each gene symbol to its description.
- Write a program that counts how many genes are in each category (1.1, 1.2, 2.1 etc.).
You should assume no prior knowledge about which categories exist in the file.
The program should print the results with the categories arranged in ascending order.
Note: you will notice that one gene has no category information. That's due
to missing data in the file.
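The two exercises above can be sketched roughly as follows. This is only an illustration, written in Python, where a dict plays the role of the hash; it is not the expected solution. It assumes that each line holds symbol, description and category in that order, separated by TABs, and that the file has no header line. The sample lines are made up for the illustration and are not taken from chr21_genes.txt, and all function names are hypothetical.

```python
from collections import Counter

# Made-up sample lines (NOT real data from chr21_genes.txt).
# The last line has no category field, mimicking the missing-data case.
SAMPLE_LINES = [
    "GENE_A\thypothetical description A\t1.1\n",
    "GENE_B\thypothetical description B\t2.1\n",
    "GENE_C\thypothetical description C\t1.1\n",
    "GENE_D\thypothetical description D\t1.2\n",
    "GENE_E\thypothetical description E\n",
]


def parse_gene_table(lines):
    """Turn TAB-separated lines into (symbol, description, category) tuples.

    Assumes the column order symbol, description, category and no header.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Pad short lines so a missing category becomes an empty string.
        fields += [""] * (3 - len(fields))
        symbol, description, category = (f.strip() for f in fields[:3])
        records.append((symbol, description, category))
    return records


def build_description_table(records):
    """Hash (dict) that maps each gene symbol to its description."""
    return {symbol: description for symbol, description, _ in records}


def lookup_description(table, symbol):
    """Return the description, or None if the symbol is not in the table."""
    return table.get(symbol)


def count_categories(records):
    """Count genes per category without knowing the categories in advance."""
    return Counter(category for _, _, category in records)


def category_key(category):
    """Sort key: numeric categories such as 1.2 first, anything else last."""
    parts = category.split(".")
    if all(p.isdigit() for p in parts):
        return (0, [int(p) for p in parts], category)
    return (1, [], category)


records = parse_gene_table(SAMPLE_LINES)
table = build_description_table(records)

# Exercise 1: look up a symbol; the match is case-sensitive.
for symbol in ("GENE_A", "gene_a"):
    description = lookup_description(table, symbol)
    if description is None:
        print(f"Error: symbol {symbol} was not found in the table")
    else:
        print(f"{symbol}: {description}")

# Exercise 2: print the counts with categories in ascending order.
counts = count_categories(records)
for category in sorted(counts, key=category_key):
    label = category if category else "(no category)"
    print(f"{label}\t{counts[category]}")
```

To run the sketch on the real data, the sample list can be replaced by an open file handle, since parse_gene_table accepts any iterable of lines. Comparing the category parts as integers rather than as plain strings is a design choice that keeps a category such as 10.1, if one existed, after 2.1.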