COMPUTATIONAL GENE PREDICTION

CSE/BIMM/BENG 181 MAY 24, 2011

SERGEI L KOSAKOVSKY POND [SPOND@UCSD.EDU]

DEFINITIONS

A gene: a nucleotide sequence that codes for a protein

Gene prediction: given a genome, locate the beginning and ending position of every gene.

aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcg gctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgg gatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttgga atatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagc tgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcg gctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgct aagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcgg ctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggct atgcaagctgggatccgatgactatgcttaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgct aagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaag ctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtct tgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttacctt ggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcg gctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgc taagctcatgcgg


CENTRAL DOGMA OF MOLECULAR BIOLOGY

DNA: CCTGAGCCAACTATTGATGAA

RNA: CCUGAGCCAACUAUUGAUGAA

Protein: PEPTIDE
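The DNA to RNA to protein example on this slide can be reproduced in a few lines. This is a minimal sketch, assuming the DNA shown is the coding (sense) strand, so that transcription amounts to replacing T with U, and assuming the standard genetic code. The names `transcribe` and `translate` are illustrative and do not come from the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; X marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate an mRNA codon by codon, ignoring an incomplete trailing codon."""
    dna = rna.upper().replace("U", "T")  # the lookup table is keyed by DNA codons
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


dna = "CCTGAGCCAACTATTGATGAA"
rna = transcribe(dna)
print(rna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(rna))  # PEPTIDE
```

The seven codons CCT GAG CCA ACT ATT GAT GAA spell out P, E, P, T, I, D, E in one-letter amino acid notation.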


BRIEF HISTORY

"The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred from protein to either protein or nucleic acid." Francis Crick, Nature, 1970

Originally stated in 1958, but questioned in the 1960s due to evidence of viral RNA to DNA transfer (shown by H. Temin and others)


CODONS

In 1961 Sydney Brenner and Francis Crick discovered frameshifting mutations

Systematically deleted nucleotides from DNA

Single and double deletions dramatically altered protein product

Effects of triple deletions were minor

Conclusion: every triplet of nucleotides (a codon) maps to exactly one amino acid in a protein
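The logic of the deletion experiment can be illustrated computationally. The sketch below is a toy example, not a reconstruction of the original experiment: it assumes the standard genetic code, uses a short made-up coding sequence, and the helper names `translate` and `delete` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; X marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA codon by codon, ignoring an incomplete trailing codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def delete(dna, start, count):
    """Remove `count` nucleotides beginning at 0-based index `start`."""
    return dna[:start] + dna[start + count:]


dna = "CCTGAGCCAACTATTGATGAA"
print(translate(dna))                # PEPTIDE (unmodified)
print(translate(delete(dna, 3, 1)))  # PSQLLM  (single deletion: frame shifted)
print(translate(delete(dna, 3, 2)))  # PANYXX  (double deletion: frame shifted)
print(translate(delete(dna, 3, 3)))  # PPTIDE  (triple deletion: one residue lost)
```

Deleting one or two nucleotides changes every codon downstream of the deletion, whereas deleting three removes a single amino acid and leaves the rest of the protein intact.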


GENETIC CODE

64 codons are mapped to 20 amino acids (plus a stop signal) via a genetic code

Genetic codes may differ slightly between organisms and genomes (e.g. nuclear vs mitochondrial)

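The genetic code is redundant, and the degree of redundancy differs between amino acids (from 1 to 6 codons each)

Synonymous substitutions (which leave the encoded amino acid unchanged) and non-synonymous substitutions (which change it) are fundamentally different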

Amino acid       Codons (* = any nucleotide)   Redundancy
Alanine          GC*                           4
Cysteine         TGC,TGT                       2
Aspartic Acid    GAC,GAT                       2
Glutamic Acid    GAA,GAG                       2
Phenylalanine    TTC,TTT                       2
Glycine          GG*                           4
Histidine        CAC,CAT                       2
Isoleucine       ATA,ATC,ATT                   3
Lysine           AAA,AAG                       2
Leucine          CT*,TTA,TTG                   6
Methionine       ATG                           1
Asparagine       AAC,AAT                       2
Proline          CC*                           4
Glutamine        CAA,CAG                       2
Arginine         AGA,AGG,CG*                   6
Serine           AGC,AGT,TC*                   6
Threonine        AC*                           4
Valine           GT*                           4
Tryptophan       TGG                           1
Tyrosine         TAC,TAT                       2
Stop             TAA,TAG,TGA                   3
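As a rough check of the table, the compact codon notation can be expanded into an explicit codon lookup. This is a sketch under the assumption that `*` means any of A, C, G, T in that position. The names `COMPACT_CODE`, `expand` and `is_synonymous` are illustrative only.

```python
from collections import Counter
from itertools import product

# The table above, transcribed as-is; '*' is assumed to mean any nucleotide.
COMPACT_CODE = {
    "Alanine": "GC*", "Cysteine": "TGC,TGT", "Aspartic Acid": "GAC,GAT",
    "Glutamic Acid": "GAA,GAG", "Phenylalanine": "TTC,TTT", "Glycine": "GG*",
    "Histidine": "CAC,CAT", "Isoleucine": "ATA,ATC,ATT", "Lysine": "AAA,AAG",
    "Leucine": "CT*,TTA,TTG", "Methionine": "ATG", "Asparagine": "AAC,AAT",
    "Proline": "CC*", "Glutamine": "CAA,CAG", "Arginine": "AGA,AGG,CG*",
    "Serine": "AGC,AGT,TC*", "Threonine": "AC*", "Valine": "GT*",
    "Tryptophan": "TGG", "Tyrosine": "TAC,TAT", "Stop": "TAA,TAG,TGA",
}


def expand(pattern):
    """Expand a pattern such as 'GC*' into the explicit codons it stands for."""
    choices = ["ACGT" if ch == "*" else ch for ch in pattern]
    return ["".join(p) for p in product(*choices)]


# Map every explicit codon to the amino acid (or Stop) it encodes.
CODON_TO_AA = {
    codon: aa
    for aa, patterns in COMPACT_CODE.items()
    for pattern in patterns.split(",")
    for codon in expand(pattern)
}

# Redundancy = number of codons that encode each amino acid.
REDUNDANCY = Counter(CODON_TO_AA.values())


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid."""
    return CODON_TO_AA[codon_a] == CODON_TO_AA[codon_b]


print(len(CODON_TO_AA))             # 64
print(REDUNDANCY["Leucine"])        # 6
print(is_synonymous("GCA", "GCT"))  # True  (both Alanine)
print(is_synonymous("GAA", "GAC"))  # False (Glutamic Acid vs Aspartic Acid)
```

All 64 codons are covered exactly once, and the counts agree with the Redundancy column.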


SIX READING FRAMES

HIV-1 protease

DNA: CCAATAAGTC CTATTGAAAC TGTACCAGTA ACAAAGCCAG GAATGGATGG
     CCCAAAGGTT AAACAATGGC CATTAACAGA AGAGAAAAAA GC

Protein translation:

In frame: PISPIETVPVTKPGMDGPKVKQWPLTEEKK
+1:       QXVLLKLYQXQSQEWMAQRLNNGHXQKRKK
+2:       NKSYXNCTSNKARNGWPKGXTMAINRREKS

X marks a stop codon, which signals the ribosome to stop protein synthesis.
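The three forward translations can be reproduced by starting the codon walk at offsets 0, 1 and 2. This is a minimal sketch, assuming the standard genetic code and using X for stop codons as on the slide. The function name `translate` and its `offset` parameter are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; X marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, offset=0):
    """Translate DNA starting at `offset`, ignoring an incomplete trailing codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


# The sequence from the slide, with the spacing between 10-base blocks removed.
dna = (
    "CCAATAAGTC" "CTATTGAAAC" "TGTACCAGTA" "ACAAAGCCAG" "GAATGGATGG"
    "CCCAAAGGTT" "AAACAATGGC" "CATTAACAGA" "AGAGAAAAAA" "GC"
)

print(translate(dna, 0))  # PISPIETVPVTKPGMDGPKVKQWPLTEEKK
print(translate(dna, 1))  # QXVLLKLYQXQSQEWMAQRLNNGHXQKRKK
print(translate(dna, 2))  # NKSYXNCTSNKARNGWPKGXTMAINRREKS
```

Only the in-frame translation is free of stop codons. Frames +1 and +2 each hit several stops, which is one of the signals a gene finder can use to tell them apart.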

The reverse complement is the complementary DNA strand (read in the opposite direction, with complementary bases)

It defines three more reading frames, for six in total
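A sketch of how all six frames might be enumerated: three offsets on the given strand and three on its reverse complement. It assumes the standard genetic code and an alphabet restricted to A, C, G, T. The names `reverse_complement` and `six_frames`, and the `(strand, offset)` keys, are hypothetical conventions chosen for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; X marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYYXXCCXWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Character-level mapping of each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate DNA starting at `offset`, ignoring an incomplete trailing codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations keyed by (strand, offset) for both strands."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq, offset)
    return frames


print(reverse_complement("CCAATAAGTC"))  # GACTTATTGG

# A gene encoded on the opposite strand shows up in one of the "-" frames.
frames = six_frames("TTCATCAATAGTTGGCTCAGG")
print(frames[("-", 0)])  # PEPTIDE
```

Because a gene can lie on either strand, a gene finder that scans only the three forward frames would miss roughly half of the candidates.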


CONTIGUOUS VS SPLICED GENES

Experiments in bacteria showed that the sequences of DNA, RNA and protein are collinear; the evidence at the time suggested that eukaryotes followed the same pattern.

In 1977, Phillip Sharp and Richard Roberts experimented with mRNA of hexon, a viral protein.

They mapped adenovirus hexon mRNA onto the viral genome by hybridizing it to adenovirus DNA and examining the hybrids by electron microscopy

The mRNA-DNA hybrids formed three curious loop structures instead of a contiguous duplex segment
