References and Data Structures

Non-homogeneous data structures

Perl data structures need not be homogeneous: a single structure can mix scalars, arrays, and hashes at any level. The following example defines an anonymous data structure that holds some gene information. $gene is a reference to the entire data structure.

my $gene = {
      'ID' => 'Hs.22',
      'TITLE' => 'transglutaminase 1 (K polypeptide epidermal ...',
      'GENE'  => 'TGM1',
      'CYTOBAND' => '14q11.2',
      'LOCUSLINK' => 7051,
      'EXPRESS' => ['Esophagus',
                    'Germ Cell',
                    # ... further tissues omitted here
                   ],
      'CHROMOSOME' => 14,
      'STS' => [ { 'ACC' => 'G07152',
                   'NAME' => 'D14S1225',
                   'UNISTS' => 31136 }    ],
      'PROTSIM' => [ { 'ORG' => 'Homo sapiens',
                       'PROTGI' => 1070465,
                       'PROTID' => 'PIR:TGHUM1',
                       'PCT' => 100,
                       'ALN' => 816},
                      { 'ORG' => 'Mus musculus',
                        'PROTGI' => 730933,
                        'PROTID' => 'SP:Q08189',
                        'PCT' => 39,
                        'ALN' => 662},
                      { 'ORG' => 'Rattus norvegicus',
                        'PROTGI' => 135697,
                        'PROTID' => 'SP:P23606',
                        'PCT' => 91,
                        'ALN' => 815} ],
      'SCOUNT' => 24,
      'SEQUENCE' => [ # { ... }, ...  one hash per SEQUENCE line, omitted here
                    ],
};
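
Individual pieces of such a structure are reached by following the reference with the arrow operator. The snippet below is only a minimal sketch: it uses a trimmed-down copy of the structure (a few fields taken from the example above) so that it runs on its own, and the variable names are chosen for illustration.

```perl
use strict;
use warnings;

# A trimmed-down copy of the gene structure, so the snippet is self-contained.
my $gene = {
    'GENE'    => 'TGM1',
    'EXPRESS' => [ 'Esophagus', 'Germ Cell' ],
    'PROTSIM' => [
        { 'ORG' => 'Homo sapiens',      'PCT' => 100 },
        { 'ORG' => 'Mus musculus',      'PCT' => 39 },
        { 'ORG' => 'Rattus norvegicus', 'PCT' => 91 },
    ],
};

# A scalar stored directly in the top-level hash.
my $symbol = $gene->{'GENE'};

# An element of a nested array; the arrow between subscripts is optional.
my $first_tissue = $gene->{'EXPRESS'}[0];

# A field of a hash that sits inside an array inside the top-level hash.
my $second_org = $gene->{'PROTSIM'}[1]{'ORG'};

# Dereference a whole nested array with @{ ... } to loop over it.
my @orgs = map { $_->{'ORG'} } @{ $gene->{'PROTSIM'} };

print "$symbol: first tissue $first_tissue, second organism $second_org\n";
print "PROTSIM organisms: ", join( ', ', @orgs ), "\n";
```

Note that the elements of 'EXPRESS' are plain strings while the elements of 'PROTSIM' are hash references, which is exactly what makes the structure non-homogeneous.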


You are welcome to practice working with this data structure. Try to retrieve data from the various gene files. Store the information for each gene in turn in a data structure, and pass it to a subroutine. There, try to access individual pieces of data. Try to print the data in a format different from that of the source file (e.g. as HTML), and so on.
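
One possible starting point for the subroutine and HTML part of the exercise is sketched below. The subroutine name gene_to_html and the HTML layout are assumptions made for illustration, and the record is a shortened copy of the TGM1 example. The sketch does not escape characters such as < or & in the data, which real output would need.

```perl
use strict;
use warnings;

# Hypothetical helper: render one gene record as an HTML fragment.
sub gene_to_html {
    my ($gene) = @_;    # only the reference is passed, not a copy of the data

    my $html = "<h2>$gene->{'GENE'} ($gene->{'ID'})</h2>\n";
    $html .= "<p>Cytoband: $gene->{'CYTOBAND'}</p>\n";

    # The EXPRESS entries are plain strings.
    $html .= "<ul>\n";
    for my $tissue ( @{ $gene->{'EXPRESS'} } ) {
        $html .= "  <li>$tissue</li>\n";
    }
    $html .= "</ul>\n";

    # The PROTSIM entries are hash references, one table row each.
    $html .= "<table>\n  <tr><th>ORG</th><th>PCT</th></tr>\n";
    for my $hit ( @{ $gene->{'PROTSIM'} } ) {
        $html .= "  <tr><td>$hit->{'ORG'}</td><td>$hit->{'PCT'}</td></tr>\n";
    }
    $html .= "</table>\n";

    return $html;
}

# Shortened copy of the TGM1 record, for demonstration only.
my $gene = {
    'ID'       => 'Hs.22',
    'GENE'     => 'TGM1',
    'CYTOBAND' => '14q11.2',
    'EXPRESS'  => [ 'Esophagus', 'Germ Cell' ],
    'PROTSIM'  => [
        { 'ORG' => 'Homo sapiens', 'PCT' => 100 },
        { 'ORG' => 'Mus musculus', 'PCT' => 39 },
    ],
};

print gene_to_html($gene);
```

Because the subroutine receives a reference, any change it made through $gene would be visible to the caller; here it only reads the data.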

Original Data for TGM1

ID          Hs.22
TITLE       transglutaminase 1 (K polypeptide epidermal type I, protein-glutamine-gamma-glutamyltransferase)
GENE        TGM1
CYTOBAND    14q11.2
EXPRESS     ;Esophagus;Germ Cell;Larynx;Pancreas;Uterus;colon;head_neck;uterus
STS         ACC=G07152 NAME=D14S1225 UNISTS=31136
PROTSIM     ORG=Homo sapiens; PROTGI=1070465; PROTID=PIR:TGHUM1; PCT=100; ALN=816
PROTSIM     ORG=Mus musculus; PROTGI=730933; PROTID=SP:Q08189; PCT=39; ALN=662
PROTSIM     ORG=Rattus norvegicus; PROTGI=135697; PROTID=SP:P23606; PCT=91; ALN=815
SCOUNT      24
SEQUENCE    ACC=M62925; NID=g339603; PID=g339604
SEQUENCE    ACC=M98447; NID=g186734; PID=g1256959
SEQUENCE    ACC=D90287; NID=g219631; PID=g219632
SEQUENCE    ACC=M55183; NID=g186789; PID=g186790
SEQUENCE    ACC=X57974; NID=g510524; PID=g510525
SEQUENCE    ACC=BF155997; NID=g11051180; LID=4808
SEQUENCE    ACC=AW083702; NID=g6038854; CLONE=IMAGE:2587766; END=3'; LID=728
SEQUENCE    ACC=AI652954; NID=g4736933; CLONE=IMAGE:2306445; END=3'; LID=698
SEQUENCE    ACC=BF155987; NID=g11051170; LID=4808
SEQUENCE    ACC=AI239574; NID=g3834971; CLONE=IMAGE:1846343; END=3'; LID=600
SEQUENCE    ACC=AW265414; NID=g6642230; CLONE=IMAGE:2754263; END=3'; LID=1370
SEQUENCE    ACC=BE182598; LID=3549
SEQUENCE    ACC=AI269864; NID=g3889031; CLONE=IMAGE:2005287; END=3'; LID=705
SEQUENCE    ACC=AW085789; NID=g6040941; CLONE=IMAGE:2588217; END=3'; LID=728
SEQUENCE    ACC=BF156003; NID=g11051186; LID=4808
SEQUENCE    ACC=AW194040; NID=g6472771; CLONE=IMAGE:2683894; END=3'; LID=760
SEQUENCE    ACC=AL039214; NID=g5408290; CLONE=DKFZp727C171; END=5'; LID=860
SEQUENCE    ACC=BE934356; NID=g10460432; LID=4595
SEQUENCE    ACC=AW796622; NID=g7848492; LID=2510
SEQUENCE    ACC=BE293065; NID=g9175931; CLONE=IMAGE:3349385; END=5'; LID=3594
SEQUENCE    ACC=AA583940; NID=g2368549; CLONE=IMAGE:1088673; END=3'; LID=567
SEQUENCE    ACC=NM_000359; NID=g4507474; PID=g4507475
SEQUENCE    ACC=BF155992; NID=g11051175; LID=4808
SEQUENCE    ACC=BF089798; NID=g10895508; LID=4808
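
Lines like the ones above could be turned into the nested structure roughly as follows. This is a sketch under assumptions drawn only from the TGM1 sample: each line is a key followed by whitespace and a value, EXPRESS is a semicolon-separated list, and STS, PROTSIM and SEQUENCE lines carry NAME=value fields and may repeat. The subroutine names parse_fields and parse_record are hypothetical, and only a few of the lines are included.

```perl
use strict;
use warnings;

# Keys whose lines hold NAME=value fields and may occur more than once
# (an assumption based on the sample record).
my %is_record = map { $_ => 1 } qw(STS PROTSIM SEQUENCE);

# Split "A=1; B=two words; C=3" or "A=1 B=2" into a hash reference.
sub parse_fields {
    my ($text) = @_;
    my %field;
    # Separators: a semicolon, or whitespace directly before the next NAME=.
    for my $pair ( split /;\s*|\s+(?=\w+=)/, $text ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        $field{$name} = $value if defined $value;
    }
    return \%field;
}

# Build one gene structure from the lines of one record.
sub parse_record {
    my (@lines) = @_;
    my %gene;
    for my $line (@lines) {
        my ( $key, $value ) = $line =~ /^(\S+)\s+(.*?)\s*$/ or next;
        if ( $key eq 'EXPRESS' ) {
            # Drop the empty field produced by the leading semicolon.
            $gene{$key} = [ grep { length } split /;/, $value ];
        }
        elsif ( $is_record{$key} ) {
            push @{ $gene{$key} }, parse_fields($value);
        }
        else {
            $gene{$key} = $value;
        }
    }
    return \%gene;
}

my @lines = split /\n/, <<'END_RECORD';
ID          Hs.22
GENE        TGM1
EXPRESS     ;Esophagus;Germ Cell;Larynx;Pancreas;Uterus;colon;head_neck;uterus
STS         ACC=G07152 NAME=D14S1225 UNISTS=31136
PROTSIM     ORG=Homo sapiens; PROTGI=1070465; PROTID=PIR:TGHUM1; PCT=100; ALN=816
PROTSIM     ORG=Mus musculus; PROTGI=730933; PROTID=SP:Q08189; PCT=39; ALN=662
SEQUENCE    ACC=AW083702; NID=g6038854; CLONE=IMAGE:2587766; END=3'; LID=728
END_RECORD

my $gene = parse_record(@lines);
print "$gene->{'GENE'} has ", scalar @{ $gene->{'PROTSIM'} }, " PROTSIM entries\n";
```

All values parsed this way are stored as strings; Perl converts them to numbers automatically when they are used in numeric context.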

