Determining Convergent Evolution with
Amino Acid Sequences of Hemoglobin and Prestin Genes
Instructor’s edition
Lauren Kennedy*, Jackie Doucette*, Trisha Buchanan*, and David Marcey, CLU Biology Department, 2012
* CLU Students in Biology 426, Molecular Biology
Objectives:
1. To understand the basic principles of convergent evolution.
2. To construct a cladogram based on hemoglobin protein sequences.
3. To construct a cladogram based on prestin protein amino acid sequences and to identify signatures of convergent evolution at the molecular level.
Overview:
Convergent evolution (CE) is the independent evolution of similar biological traits in different evolutionary lineages (clades). CE often results from similar selective pressures that drive the evolution of the same trait in distantly related species.
One of the more interesting cases of convergent evolution at the molecular level is the evolution of echolocation in bats and in dolphins and other cetaceans. The motor protein prestin confers sensitive and selective hearing in mammals. The prestin amino acid sequences of echolocating dolphins have converged to resemble those of echolocating bats, even though bats and cetaceans belong to distantly related mammalian lineages. Using ClustalW2 alignments of hemoglobin sequences, students will construct a phylogenetic cladogram that places the hemoglobins of bats, dolphins, and other mammals in relative positions consistent with cladograms built on other derived traits. However, when the same analysis is done with prestin amino acid sequences, the resulting cladogram may lead students to conclude that the prestin genes of bats and cetaceans shared a “common ancestor” more recently than either did with those of other mammals, a result that contradicts their previous conclusion.
Dolphins and porpoises share at least 14 derived amino acid sites in prestin with echolocating bats. This is one of the best examples of convergent molecular evolution discovered to date, because the convergence is adaptive and its rapidity indicates positive selection. One likely selective pressure is the need to hear very high frequencies. These prestin studies identify probable selective pressures driving molecular convergence and underscore the need for caution when using single-character traits to infer evolutionary histories.
I. Retrieving HEMOGLOBIN Sequences from a Bioinformatics Database:
Begin with the species listed in Table 1. Each protein sequence in the database has a unique identifier known as an “Accession Number”. You will obtain the Accession Number and protein sequence of each species in the following table:
Table 1: Accession Numbers
|Organism |Scientific Name |Accession Number |
|Horseshoe Bat |Rhinolophus ferrumequinum |ACC62118.1 |
|Sperm Whale |Physeter catodon | P09904.1 |
|Bottlenose Dolphin |Tursiops truncatus |P18990.1 |
|Cows |Bos taurus |NP_001070890.2 |
|Mole-rat |Heterocephalus glaber |EHB14142.1 |
|Humans |Homo sapiens |NP_000549.1 |
|Dogs |Canis lupus familiaris |NP_001257814 |
1. Go to the NCBI website. NCBI stands for the National Center for Biotechnology Information.
2. You will be looking for the hemoglobin sequence of all 7 species. Carry out the following steps for one species at a time, repeating them 7 times.
3. Enter your query (search terms) in the following format, using the scientific name. For the horseshoe bat, for example:
hemoglobin AND Rhinolophus ferrumequinum
AND is a Boolean operator: it restricts the search to entries that contain all of the search terms. In your search results, note that each entry has an Accession Number, a unique identifier that specifies the databank and the number of the entry in that databank.
4. For each species, once you locate a search result with a suitable Accession Number, enter that number in Table 1. Then click the FASTA link below the Accession Number. This opens a page that shows the amino acid sequence of the alpha hemoglobin protein from that species in a standard format called FASTA. In bioinformatics, FASTA (“fast-all”) is a text-based format for representing nucleotide or protein sequences, in which nucleotides or amino acids are represented by single-letter codes. The format also allows a sequence name and comments to precede the sequence on a header line that begins with “>”. For each species, copy the FASTA sequence and paste it into NOTEPAD (not WORD!) or a similar plain-text editor, e.g.:
>gi|183396443|gb|ACC62118.1| hemoglobin subunit alpha (predicted) [Rhinolophus ferrumequinum]
MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQIKGHGKKVGDALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHNPGEFTPAIHASLDKFLASVSTVLTSKYR
In the example above, the line below the header is the actual amino acid sequence; it must be on a separate line from the header. Save your file after pasting each sequence. Enter the horseshoe bat hemoglobin sequence first. Our goal in this experiment is to determine which species’ hemoglobin amino acid sequences are most closely related to that of the horseshoe bat, and to make a cladogram based on this information.
To make your analysis easier, replace the information in the header line with the common name of each species.
Using the example above:
>Horseshoe bat
MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQIKGHGKKVGDALTKA
VGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHNPGEFTPAIHASLDKFLASVSTVLTSK
YR
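Steps 1-4 can also be scripted instead of done by hand. The sketch below is an assumption, not part of the original exercise: it builds request URLs for NCBI's public E-utilities service, using `esearch.fcgi` for a query like the one in step 3 and `efetch.fcgi` with `rettype=fasta` to download a record in FASTA format. The function names are hypothetical; downloading requires network access, and NCBI limits how many requests per second a user may send.

```python
"""Sketch (assumed workflow): NCBI E-utilities URLs for the searches in Section I."""
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "protein") -> str:
    """URL for an Entrez search such as 'hemoglobin AND Rhinolophus ferrumequinum'."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})


def efetch_fasta_url(accession: str, db: str = "protein") -> str:
    """URL that returns a single record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def fetch_fasta(accession: str) -> str:
    """Download one FASTA record (needs network access; NCBI rate-limits requests)."""
    with urlopen(efetch_fasta_url(accession)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Two accession numbers from Table 1; only the URLs are printed, nothing is downloaded.
    for name, accession in {"Horseshoe bat": "ACC62118.1", "Sperm whale": "P09904.1"}.items():
        print(name, efetch_fasta_url(accession))
```

Calling `fetch_fasta("ACC62118.1")` should return text in the same FASTA layout as the page reached through the FASTA link in step 4, ready to be pasted into the text file and renamed as described above.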
II. Using ClustalW2 to compare HEMOGLOBIN protein sequences
1. Go to the EBI website, which provides many bioinformatics tools for protein and DNA sequence analysis. EBI stands for the European Bioinformatics Institute.
2. Click on Tools in the black bar.
3. In the drop-down menu, click on Sequence Analysis.
4. On the side menu, click on ClustalW2, a multiple sequence alignment tool. In the drop-down menu above the field where you will enter sequences, choose the “Protein” option.
5. Use one or both of the following methods of data entry for the subsequent analysis:
1) Select all 7 protein sequences at once and paste them into the ClustalW2 sequence field.
2) Select the horseshoe bat hemoglobin sequence and one other species’ sequence and paste them into the ClustalW2 sequence field.
6. Click Run. The alignment can take a few minutes.
7. Scroll down to the protein sequence alignments.
8. Your alignment will look like this (except that you will have either 7 or 2 sequences aligned, depending on the method you chose in step 5):
Horseshoe MVLSPSDKSNVKAAWDKVGGNAGEYGAEALERMFLSFPTTKTYFPHF-DLSHGSAQIKG- 58
Sperm     VHLTGEEKSGLTALWAKV--NVEEIGGEALGRLLVVYPWTQRFFEHFGDLSTADAVMKNP 58
          : *: .:**.:.* * **  *. * *.*** *::: :* *: :* ** *** ..* :*.
Horseshoe ----HGKKVGDALTKAVGSIDDLAGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLACHN 114
Sperm     KVKKHGQKVLASFGEGLKHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVVVLARHF 118
              **:**  :: :.:  :*:* *::::**:**..**:*** **:**.: *:*.** *
Horseshoe PGEFTPAIHASLDKFLASVSTVLTSKYR 142
Sperm     GKEFTPELQTAYQKVVAGVANALAHKYH 146
            **** :::: :*.:*.*:..*: **:
9. Click “Show Colors”. ClustalW2 then colors each residue in the alignment according to the table below.
From the ClustalW2 Help section, this is what the colors mean:
|Residues |Color |Property |
|AVFPMILW |RED |Small (small+ hydrophobic (incl.aromatic -Y)) |
|DE |BLUE |Acidic |
|RK |MAGENTA |Basic |
|STYHCNGQ |GREEN |Hydroxyl + Amine + Basic - Q |
|Others |Gray | |
CONSENSUS SYMBOLS:
By default, an alignment displays the following symbols, which denote the degree of conservation observed in each column:
An "*" (asterisk) means that the residues or nucleotides in that column are identical in all sequences in the alignment.
A ":" (colon) means that conserved substitutions have been observed, according to the COLOUR table above. Conserved means that the amino acid has been replaced by a chemically similar one from the same class.
A "." (period) means that semi-conserved substitutions are observed.
A " " (blank space) means that the column is not conserved.
A "-" (dash) in a sequence indicates a gap, i.e., a missing amino acid.
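To see how the consensus symbols are assigned, the sketch below recomputes a consensus line for aligned sequences. It assumes Clustal's built-in "strong" and "weak" residue groups (listed in the code) and treats any column that contains a gap as unconserved; the function names are hypothetical, and the web tool's exact rules may differ in edge cases.

```python
"""Sketch: recompute a Clustal-style consensus line (assumed Clustal residue groups)."""

# Residue groups used by Clustal for ':' (strongly similar) and '.' (weakly similar).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(residues: str) -> str:
    """Return '*', ':', '.', or ' ' for the residues in one alignment column."""
    if "-" in residues:
        return " "  # a gap anywhere in the column: no symbol
    if len(set(residues)) == 1:
        return "*"  # identical in all sequences
    if any(all(r in group for r in residues) for group in STRONG_GROUPS):
        return ":"
    if any(all(r in group for r in residues) for group in WEAK_GROUPS):
        return "."
    return " "


def consensus_line(*aligned: str) -> str:
    """Build the consensus line for equal-length rows of one alignment."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned rows must have the same length")
    return "".join(column_symbol("".join(column)) for column in zip(*aligned))


if __name__ == "__main__":
    bat = "PGEFTPAIHASLDKFLASVSTVLTSKYR"    # last block, horseshoe bat hemoglobin
    whale = "GKEFTPELQTAYQKVVAGVANALAHKYH"  # last block, sperm whale hemoglobin
    print(bat)
    print(whale)
    print(consensus_line(bat, whale))
```

Applied to the last block of the horseshoe bat vs. sperm whale alignment in step 8, this sketch reproduces the consensus line shown there.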
10. You may save your results by printing your alignments for your lab notebook. For example, use a screen capture tool, or press the “Print Screen” key and paste the image into a word processing document.
11. We are interested in the columns that have a “.”, “:” or “ ” (space) in the consensus line below the sequences, or a “-” in one of the sequences. These mark the positions where differences in amino acid identity have evolved, as mutations in the DNA sequence accumulated in each species. For convenience, we will record changes in protein sequence relative to the HORSESHOE BAT hemoglobin sequence. Using your alignments, fill in Table 2 by comparing each organism’s sequence with the horseshoe bat sequence at the top, and determine the number of amino acids that do and do not exactly match the horseshoe bat sequence.
Table 2: Similarities and differences in the amino acid sequences of hemoglobin and percentage of conserved amino acids of select organisms compared to the Horseshoe bat.
| |Comparative Organism |A. Number of amino acids matching Horseshoe Bat |B. Number of amino acids not matching Horseshoe Bat |Percentage of Identical Amino Acids (Column A / total amino acids × 100%) |
|Horseshoe Bat vs. |Sperm Whale | 54 |88 |38.03 |
| |Bottlenose Dolphin | 55 |87 |38.73 |
| |Cow |121 |21 | 85.21 |
| |Mole-rat | 14 |128 | 9.86 |
| |Human | 124 |18 | 87.32 |
| |Dog | 116 |26 | 81.69 |
12. Complete Table 2 using the numbers gathered above, then calculate the percent identity: the percentage of amino acids that are identical and occupy the same position in each organism’s sequence and the horseshoe bat sequence. The total is the length of the horseshoe bat hemoglobin sequence (142 amino acids).
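The counting in steps 11 and 12 can be checked with a short script. This sketch assumes you have copied two rows of the same ClustalW2 alignment (gaps included) and, as in Table 2, divides by the number of residues in the horseshoe bat sequence; gap positions never count as matches. The function names are hypothetical.

```python
"""Sketch: count identical positions and percent identity for Tables 2 and 4."""


def identity_counts(reference: str, other: str) -> tuple[int, int]:
    """Return (matching, not_matching) relative to the gap-free reference length.

    Both strings are rows of the same alignment, so they must have equal length.
    A position matches only when both rows hold the same residue (gaps never match).
    """
    if len(reference) != len(other):
        raise ValueError("aligned rows must have the same length")
    matching = sum(1 for a, b in zip(reference, other) if a == b and a != "-")
    total = len(reference.replace("-", ""))  # residues in the horseshoe bat sequence
    return matching, total - matching


def percent_identity(matching: int, total: int) -> float:
    """Column A / total amino acids * 100%."""
    return matching / total * 100


if __name__ == "__main__":
    # Counts taken from Table 2 (horseshoe bat hemoglobin has 142 residues).
    for organism, matching in {"Sperm whale": 54, "Human": 124}.items():
        print(f"{organism}: {percent_identity(matching, 142):.2f}%")
```

For example, `percent_identity(54, 142)` gives about 38.03%, the sperm whale entry in Table 2.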
Examine the table above. From the data gathered in this alignment, and before proceeding, can you determine which species are most closely related? Which organisms are not? Explain your reasoning. MAKE A CLADOGRAM.
MAKING A CLADOGRAM FOR PHYLOGENY OF COMPARED SPECIES
From the ClustalW2 Help Section:
“A Phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are proportional to the amount of inferred evolutionary change.
A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa.”
1. Under CLUSTALW Results, click “Guide Tree” and then “Show as Cladogram Tree”.
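The guide tree that ClustalW2 draws is computed from pairwise distances between the sequences (ClustalW builds it by neighbor-joining by default). As an illustration only, the sketch below substitutes the simpler UPGMA method: it repeatedly merges the two closest clusters and averages their distances to everything else. One way to turn the table values into distances is 1 - (percent identity / 100), but a tree needs distances between every pair of species, not only to the horseshoe bat, so the example uses hypothetical taxa and distances.

```python
"""Sketch: UPGMA clustering (a stand-in for ClustalW2's neighbor-joining guide tree)."""
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    names: taxon labels.
    dist: dict mapping frozenset({x, y}) to the distance between taxa x and y,
          for every pair of taxa.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of taxa it contains
    d = dict(dist)  # copy so the caller's dictionary is not modified
    while len(sizes) > 1:
        # Merge the two clusters that are currently closest to each other.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster is the size-weighted average of the old ones.
        for c in sizes:
            d[frozenset((merged, c))] = (
                d[frozenset((a, c))] * size_a + d[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


def newick(tree) -> str:
    """Format a nested-tuple tree as Newick (topology only, no branch lengths)."""
    def inner(node):
        return node if isinstance(node, str) else "(" + ",".join(inner(n) for n in node) + ")"
    return inner(tree) + ";"


if __name__ == "__main__":
    # Hypothetical taxa and distances, chosen only to demonstrate the method.
    pairs = {("A", "B"): 0.4, ("A", "C"): 0.4, ("A", "D"): 0.4,
             ("B", "C"): 0.2, ("B", "D"): 0.2, ("C", "D"): 0.1}
    dist = {frozenset(p): v for p, v in pairs.items()}
    print(newick(upgma(["A", "B", "C", "D"], dist)))
```

Like a cladogram, the printed Newick string shows only the branching order; UPGMA also assumes a constant rate of change across lineages, which neighbor-joining does not.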
III. Retrieving PRESTIN Sequences from a Bioinformatics Database:
Begin with the species listed in Table 3. Each protein sequence in the database has a unique identifier known as an “Accession Number”. You will obtain the Accession Number and protein sequence of each species in the following table:
Table 3: Accession Numbers
|Organism |Scientific Name |Accession Number |
|Horseshoe bat |Rhinolophus ferrumequinum |ACI02071.1 |
|Sperm Whale |Physeter catodon |ADE75013.1 |
|Bottlenose Dolphin |Tursiops truncatus |ADI59756.1 |
|Humpback Whales |Megaptera novaeangliae |ADE75011.1 |
|Cows |Bos taurus |NP_001179807.1 |
|Mole-rat |Heterocephalus glaber |EHB08960.1 |
|Humans |Homo sapiens |AAP31417.1 |
|Dogs |Canis lupus familiaris |XP_540393.2 |
1. Go to the NCBI website. NCBI stands for the National Center for Biotechnology Information.
2. You will be looking for the prestin sequence of all 8 species. Carry out the following steps for one species at a time, repeating them 8 times.
3. Enter your query (search terms) in the following format, using the scientific name. For the horseshoe bat, for example:
prestin AND Rhinolophus ferrumequinum
AND is a Boolean operator: it restricts the search to entries that contain all of the search terms. In your search results, note that each entry has an Accession Number, a unique identifier that specifies the databank and the number of the entry in that databank. Look for entries 741-744 amino acids (aa) long.
4. For each species, once you locate a search result with a suitable Accession Number, enter that number in Table 3. Then click the FASTA link below the Accession Number. This opens a page that shows the amino acid sequence of the prestin protein from that species in a standard format called FASTA. In bioinformatics, FASTA (“fast-all”) is a text-based format for representing nucleotide or protein sequences, in which nucleotides or amino acids are represented by single-letter codes. The format also allows a sequence name and comments to precede the sequence on a header line that begins with “>”. For each species, copy the FASTA sequence and paste it into NOTEPAD (not WORD!) or a similar plain-text editor, e.g.:
>gi|205277608|gb|ACI02071.1| prestin [Rhinolophus ferrumequinum]
MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNIIYMFLPITKW
LPAYNFKEYVLGDLVSGISTGVLQLPQGLAFAMLAAVPPVFGLYSSFYPVIMYCFFGTSRHISIGPFAVI
SLMIGGVAVRLVPDDIAVPGGVNATNGTEFRDALRVKVAMSVTLLAGIIQFCLGVCRFGFVAIYLTEPLV
RGFTTAAAVHVFTSMLKYLFGVKTKRYSGIFSVVYSTVAVLQNVKNLNVCSLGVGLMVFGLLLGGKEFNE
RFKEKLPAPIPLEFFAVVMGTGISAGFGLHESYNVDVVGTLPLGLLPPANPDTSLFHLVYVDAIAIAIVG
FSVTISMAKTLANKHGYQVDGNQELIALGLCNSTGSLFQTFAISCSLSRSLVQEGIGGKTQLAGCLASLM
ILMVILATGFLFESLPQAVLSAIVIVNLKGMFMQFSDLPFFWRTSKIELVIWLSTFVSSLFLGLDYGLIT
AVIIALMTVIYRTQSPTYTVLGQLPDTDVYIDIDAYEEVKEIPGIKIFQINAPIYYANSDLYSNALKRKT
GVNPSFILGARRKAMKKYAKEGGNINIANATDVKADAEVDAEDGTKPEEEEDEVKYPPVVIKSTFPEELQ
RFMPPLENVHTIILDFTQVNFIDSVGVKTLQGIVKEYGDVGIYVYLAGCSAQVVSDLTRNRFFENPALLD
LLFHSIHDAVLGSLVREALEEKEAAAATPQEDSEPNATPDV
In the example above, the lines below the header are the actual amino acid sequence; the sequence must start on a separate line from the header. Save your file after pasting each sequence. Enter the horseshoe bat prestin sequence first. Our goal in this experiment is to determine which species’ prestin amino acid sequences are most closely related to that of the horseshoe bat.
5. To make your analysis easier, replace the information in the header line with the common name of each species. Using the example above:
>Horseshoe Bat
MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNIIYMFLPITKW
LPAYNFKEYVLGDLVSGISTGVLQLPQGLAFAMLAAVPPVFGLYSSFYPVIMYCFFGTSRHISIGPFAVI
SLMIGGVAVRLVPDDIAVPGGVNATNGTEFRDALRVKVAMSVTLLAGIIQFCLGVCRFGFVAIYLTEPLV
RGFTTAAAVHVFTSMLKYLFGVKTKRYSGIFSVVYSTVAVLQNVKNLNVCSLGVGLMVFGLLLGGKEFNE
RFKEKLPAPIPLEFFAVVMGTGISAGFGLHESYNVDVVGTLPLGLLPPANPDTSLFHLVYVDAIAIAIVG
FSVTISMAKTLANKHGYQVDGNQELIALGLCNSTGSLFQTFAISCSLSRSLVQEGIGGKTQLAGCLASLM
ILMVILATGFLFESLPQAVLSAIVIVNLKGMFMQFSDLPFFWRTSKIELVIWLSTFVSSLFLGLDYGLIT
AVIIALMTVIYRTQSPTYTVLGQLPDTDVYIDIDAYEEVKEIPGIKIFQINAPIYYANSDLYSNALKRKT
GVNPSFILGARRKAMKKYAKEGGNINIANATDVKADAEVDAEDGTKPEEEEDEVKYPPVVIKSTFPEELQ
RFMPPLENVHTIILDFTQVNFIDSVGVKTLQGIVKEYGDVGIYVYLAGCSAQVVSDLTRNRFFENPALLD
LLFHSIHDAVLGSLVREALEEKEAAAATPQEDSEPNATPDV
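Renaming headers by hand is error-prone with eight sequences. The sketch below parses FASTA text, replaces any header that contains a listed accession number with a common name, and writes the sequences back in lines of 70 residues, matching the layout above. The function names and the accession-to-name mapping are assumptions for illustration.

```python
"""Sketch: parse FASTA text, rename headers to common names, and rewrap sequences."""


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; the leading '>' is removed from each header."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def rename_by_accession(records, names_by_accession):
    """Replace a header with a common name if the header contains a listed accession."""
    renamed = []
    for header, sequence in records:
        name = next(
            (common for acc, common in names_by_accession.items() if acc in header),
            header,  # keep the original header when no accession matches
        )
        renamed.append((name, sequence))
    return renamed


def to_fasta(records, width: int = 70) -> str:
    """Write records as FASTA, with at most `width` residues per sequence line."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Header from the example above, with only the first 20 residues of the sequence.
    raw = ">gi|205277608|gb|ACI02071.1| prestin [Rhinolophus ferrumequinum]\nMDHAEETEILAATERYYVER\n"
    # Hypothetical mapping built from Table 3.
    names = {"ACI02071.1": "Horseshoe bat", "ADI59756.1": "Bottlenose dolphin"}
    print(to_fasta(rename_by_accession(parse_fasta(raw), names)), end="")
```

The sample alignments show “Horseshoe” and “Bottlenose” rather than the full common names, which suggests ClustalW2 keeps a name only up to its first space; single-word names avoid ambiguity.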
IV. Using ClustalW2 to compare PRESTIN protein sequences
1. Go to the EBI website, which provides many bioinformatics tools for protein and DNA sequence analysis. EBI stands for the European Bioinformatics Institute.
2. Click on Tools in the black bar.
3. In the drop-down menu, click on Sequence Analysis.
4. On the side menu, click on ClustalW2, a multiple sequence alignment tool. In the drop-down menu above the field where you will enter sequences, choose the “Protein” option.
5. Use one or both of the following methods of data entry for the subsequent analysis:
1) Select all 8 protein sequences at once and paste them into the ClustalW2 sequence field.
2) Select the horseshoe bat prestin sequence and one other species’ sequence and paste them into the ClustalW2 sequence field.
6. Click Run. The alignment can take a few minutes, depending on how many people around the world are running alignments at the time.
7. Scroll down to the protein sequence alignments.
8. Your alignment will look like this (except that you will have either 8 or 2 sequences aligned, depending on the method you chose in step 5):
Horseshoe  MDHAEETEILAATERYYVERPIFSHLVLQERLHKKDKISDSIGDKLKQAFTCTPKKIRNI 60
Bottlenose MDHVEETEILAATQRYYVERPIFSHPVLQERLHKKDKISESIGDKLKQAFTCTPKKIRNI 60
           ***.*********:*********** *************:********************
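ClustalW2 builds its alignments with dynamic programming. The sketch below shows the core idea using the classic Needleman-Wunsch global alignment of two sequences; it is a simplification, not ClustalW2's own algorithm. It assumes a toy scoring scheme (+1 for a match, -1 for a mismatch, -1 per gap position), whereas ClustalW2 uses substitution matrices and separate gap-opening and gap-extension penalties, so its alignments can differ.

```python
"""Sketch: Needleman-Wunsch global alignment with a toy scoring scheme (assumption)."""


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


if __name__ == "__main__":
    # First 20 residues of the horseshoe bat and bottlenose dolphin prestin sequences.
    bat, dolphin, s = needleman_wunsch("MDHAEETEILAATERYYVER", "MDHVEETEILAATQRYYVER")
    print(bat)
    print(dolphin)
    print("score:", s)
```

When several alignments tie for the best score, the traceback above returns just one of them, preferring a residue pair over a gap.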
9. Click “Show Colors”. ClustalW2 then colors each residue in the alignment according to the table below.
From the ClustalW2 Help section, this is what the colors mean:
|Residues |Color |Property |
|AVFPMILW |RED |Small (small+ hydrophobic (incl.aromatic -Y)) |
|DE |BLUE |Acidic |
|RK |MAGENTA |Basic |
|STYHCNGQ |GREEN |Hydroxyl + Amine + Basic - Q |
|Others |Gray | |
CONSENSUS SYMBOLS:
By default, an alignment displays the following symbols, which denote the degree of conservation observed in each column:
An "*" (asterisk) means that the residues or nucleotides in that column are identical in all sequences in the alignment.
A ":" (colon) means that conserved substitutions have been observed, according to the COLOUR table above. Conserved means that the amino acid has been replaced by a chemically similar one from the same class.
A "." (period) means that semi-conserved substitutions are observed.
A " " (blank space) means that the column is not conserved.
A "-" (dash) in a sequence indicates a gap, i.e., a missing amino acid.
10. You may save your results by printing your alignments for your lab notebook. For example, use a screen capture tool, or press the “Print Screen” key and paste the image into a word processing document.
11. We are interested in the columns that have a “.”, “:” or “ ” (space) in the consensus line below the sequences, or a “-” in one of the sequences. These mark the positions where differences in amino acid identity have evolved, as mutations in the DNA sequence accumulated in each species. For convenience, we will record changes in protein sequence relative to the HORSESHOE BAT prestin sequence. Using your alignments, fill in Table 4 by comparing each organism’s sequence with the horseshoe bat sequence at the top, and determine the number of amino acids that do and do not exactly match the horseshoe bat sequence.
Table 4: Similarities and differences in the amino acid sequences of prestin and percentage of conserved amino acids of select organisms compared to the Horseshoe bat.
| |Comparative Organism |A. Number of amino acids matching Horseshoe bat |B. Number of amino acids not matching Horseshoe bat |Percentage of Identical Amino Acids (Column A / total amino acids × 100%) |
|Horseshoe bat vs. |Sperm Whale | 695 |46 | 93.8% |
| |Bottlenose Dolphin | 699 | 42 | 94.3% |
| |Humpback whale | 694 | 47 | 93.7% |
| |Cow | 691 | 50 | 93.3% |
| |Mole-rat | 665 | 76 | 89.7% |
| |Human | 680 | 61 | 91.8% |
| |Dog | 685 | 56 | 92.4% |
12. Complete Table 4 using the numbers gathered above, then calculate the percent identity: the percentage of amino acids that are identical and occupy the same position in each organism’s sequence and the horseshoe bat sequence. The total is the length of the horseshoe bat prestin sequence (741 amino acids).
Examine the table above. From the data gathered in this alignment, and before proceeding, can you determine which species’ prestin most likely evolved convergently with that of the horseshoe bat? Which organism’s prestin clearly did not? Explain your reasoning.
MAKING A CLADOGRAM FOR CONVERGENT EVOLUTION OF SPECIES
From the ClustalW2 Help Section:
“A Phylogram is a branching diagram (tree) assumed to be an estimate of a phylogeny, branch lengths are proportional to the amount of inferred evolutionary change.
A Cladogram is a branching diagram (tree) assumed to be an estimate of a phylogeny where the branches are of equal length, thus cladograms show common ancestry, but do not indicate the amount of evolutionary "time" separating taxa.”
Under CLUSTALW Results, click “Guide Tree” and then “Show as Cladogram Tree”.
Conclusion:
Convergent evolution is clearly illustrated by the evolution of prestin in both bats and cetaceans. Comparison of hemoglobin sequences, which are present in all mammals, yields a cladogram consistent with known evolutionary phylogenies. However, comparison of prestin sequences yields a cladogram inconsistent with a broad empirical foundation, revealing molecular homoplasy. See: