Bioinformatics Lesson.docx



Biotechnology I G/T          Name _________________          Kelly
Bioinformatics Day 2

Understanding GenBank Entries

The GenBank entry contains the code for a gene, information about the source of the sequenced material, and what is known about the gene. Once a researcher has identified a useful GenBank entry, it is helpful to note the accession number. An accession number is like a call number on a library book: it can be used to reference the entry and relate it to other entries in the database.

Search GenBank (in the Nucleotide database) to find an entry for dog (Canis lupus familiaris) MITF mRNA. Adjust the “Display Settings” at the top of the results to show 50 entries per page.

1. Why are we searching for mRNA and not DNA? _______________________________________________
_________________________________________________________________________________________
2. How many entries in GenBank contain domestic dog MITF mRNA? _________________________________
3. What differences do you see in the entries that ‘match’ your search criteria? __________________________
________________________________________________________________________________________________________________________________________________________________________________

The MITF protein that is coded for by the MITF gene has several different forms, mainly due to alternative splicing. The various forms of this protein are referred to as isoforms, and the variations in mRNA that result from alternative splicing are called variants. The isoforms of MITF range in size from approximately 350 to 500 amino acids.

4. What would the corresponding nucleotide range be? ___________________________________
How is this range determined? (Biology Review) ___________________________________________________________
_________________________________________________________________________________________
5. Much of the sequence information in GenBank is the result of PCR amplification, in which the amplicons are often slightly larger than the gene of interest. Based on the length of the sequences, which entry from the nucleotide search results would be most appropriate for our investigation? ______________________________________________________________________________________
6. What is the accession number for the entry you have selected? _____________________________________
Select this entry by checking the box in front of the entry and moving it to the clipboard as instructed. Open the entry by clicking on its title.
7. When was this entry accepted into GenBank? __________________________________________________
How could this information be used by a researcher? ______________________________________________
________________________________________________________________________________________
8. Identify 3 classification categories for the organism found in this entry. (KPCOFGS) _________________________________________________________________________________________
_________________________________________________________________________________________
9. From which breed of dog was this sample obtained? ____________________________________________
10. On which chromosome is this gene located? ___________________________________________________
11. What type of nucleic acid sequence is given in this entry? ________________________________________
12. How many (nucleotide) bases long is the sequence in this entry? __________________________________

Notice: Just below the entry title is the word FASTA. Clicking on this link will convert the nucleotide sequence into a format that is used to compare and align sequences for similarity. The format begins with a > (greater-than symbol), then a one-line title (in this lesson, with no spaces), followed by the sequence of bases or amino acids on the following line(s). Click FASTA.
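The FASTA layout described above (a > title line, then the sequence on the following lines) and the re-titling step that comes next can be sketched in Python. This is a minimal illustration, not an NCBI tool: the function names `parse_fasta`, `retitle`, and `to_fasta`, the 60-base default line width, and the example record are all hypothetical.

```python
"""Minimal FASTA sketch for this lesson: read records, re-title them, and write them back out."""


def parse_fasta(text):
    """Return (title, sequence) pairs; each record starts with a '>' line and may span many lines."""
    records = []
    title, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is None:
            raise ValueError("sequence text found before the first '>' title line")
        else:
            chunks.append(line.upper())
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def retitle(common_name):
    """Build a title line from a common name, using underscores instead of spaces."""
    return ">" + "_".join(common_name.split())


def to_fasta(title_line, sequence, width=60):
    """Format one record: the title line, then the sequence wrapped at `width` bases per line."""
    lines = [title_line]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record: the accession, title, and bases are made up for illustration.
    example = ">XX_000000.1 hypothetical example transcript, mRNA\natgctggaaatgctaga\nataaattat"
    title, seq = parse_fasta(example)[0]
    print(to_fasta(retitle("Domestic Dog"), seq, width=20))
```

Running the file prints the toy record re-titled as >Domestic_Dog, with its bases upper-cased and wrapped 20 per line.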
Highlight and select the FASTA text from the > through the last base in the sequence. Then copy and paste this text into a Word document. Remove the text in the title (but keep the > symbol). Type the common name of the animal (for example: >DomesticDog or >Dog) with NO spaces (you can use an underscore). This formatting must be perfect or the searches we are about to use will not work. Save this document in your home directory or GAFE as “MITF FASTA Files”.

Comparing the Gene of Interest Across Species

Using the menu on the right side of the screen, select “Run BLAST”. Once the BLAST page appears, go to the Choose Search Set section and select “Reference RNA Sequences (Refseq_RNA)” for the database. At the bottom of the page, check the box to show the results in a new window. Click on the blue BLAST button at the bottom of the page. The results could take a few minutes to appear, so be patient.

Once the results of the BLAST are compiled, study the screen and make a mental note of the items it contains. At the top of the screen, in the light blue rectangle, is a summary of the BLAST parameters. It shows the sequence (called the query) that you are comparing to the database. Just below this region is a large box with red lines in it. This graphic (Graphic Summary) compares the sequence you are BLASTing to the other sequences in the database. Your query is at the top, and the lines underneath are ordered from most similar to least similar. If you hover over each line, you will see the title of the entry to which it corresponds. The next section on the page is the Descriptions section. This section lists the entries that align with your query sequence and indicates how similar they are.

Define the terms to help interpret the alignments.
Query Coverage shows how long the region of similarity is (the percentage of the query's length that is covered by the alignment).
Ident shows how close the similarity is (the percentage of identical bases within the aligned region).

Which three animals that are not dog are most similar to your query?
_________________________________________________________________________________________________________________________________

Select 9 different animals plus the dog from the alignments list to investigate further. For each animal you will find the scientific name, common name, Ident, and taxonomy. Click on the accession number (on the right in the table) to link to the GenBank entry for each animal. The entry should contain the common name and taxonomy. If not, open another window and copy/paste the scientific name into a Google search or Wikipedia to get the common name. Also add the entry's FASTA sequence to your FASTA document. Check the box in front of the entry title. Complete the chart for the 10 entries you selected, including the ones you used for the question above. An example is done for you.

#   | Scientific Name | Common Name | Ident | Taxonomy
1   |                 |             |       |
2   |                 |             |       |
3   |                 |             |       |
4   |                 |             |       |
5   |                 |             |       |
6   |                 |             |       |
7   |                 |             |       |
8   |                 |             |       |
9   |                 |             |       |
10  |                 |             |       |

Repeat the steps (as with the dog entry) to obtain the FASTA file, add it to your MITF FASTA Files document, and re-title the sequence with the animal’s common name.

Alignment of Multiple Sequences to Analyze Consensus

Define alignment ________________________________________________________________________________________________________________________________________________________
What does it mean if we say a sequence or gene is conserved? ____________________________________________________________________________________________________________________
What is a FASTA file? ______________________________________________________________

To align and compare multiple sequences go to:
This website is part of the European Bioinformatics Institute. The Clustal Omega program aligns multiple sequences at once and can be used to analyze the conservation of nucleotide sequences across species.
Select “DNA” in STEP 1 to indicate the type of sequence you will be aligning. Copy and paste the 10 FASTA-formatted sequences from your MITF FASTA Files document into the space provided.
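Because a single stray space or missing > can break the alignment, it may help to check the combined document before pasting it into Clustal Omega. The Python sketch below checks only this lesson's own formatting rules (a > title with no spaces, unique titles, nucleotide letters, ten records). The function name `check_fasta` and the set of allowed letters are assumptions; this is not Clustal Omega's own input validation.

```python
"""Sketch: sanity-check a combined multi-FASTA document before pasting it into Clustal Omega.

The rules mirror this lesson's conventions and are an assumption for illustration only.
"""

# IUPAC nucleotide codes, including ambiguity codes (assumed set of allowed letters).
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")


def check_fasta(text, expected_records=10):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    titles = []
    sequences = {}
    current = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            if not current:
                problems.append(f"line {number}: empty title after '>'")
            elif " " in current:
                problems.append(f"line {number}: title '{current}' contains a space")
            if current in sequences:
                problems.append(f"line {number}: duplicate title '{current}'")
            titles.append(current)
            sequences[current] = ""
        elif current is None:
            problems.append(f"line {number}: sequence text before the first '>' title")
        else:
            bad = set(line.upper()) - NUCLEOTIDE_CODES
            if bad:
                problems.append(f"line {number}: unexpected characters {sorted(bad)}")
            sequences[current] += line
    for title, seq in sequences.items():
        if not seq:
            problems.append(f"record '{title}' has no sequence")
    if len(titles) != expected_records:
        problems.append(f"found {len(titles)} records, expected {expected_records}")
    return problems


if __name__ == "__main__":
    # Hypothetical two-record document; the second title has a space on purpose.
    for problem in check_fasta(">Dog\nACGT\n>Gray Wolf\nACGA\n", expected_records=2):
        print(problem)
```

An empty result only means the document follows these conventions; it does not guarantee that Clustal Omega will accept it.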
Click on the SUBMIT button. The results could take a few minutes to appear, so be patient. Once your alignment is complete, a user interface will appear. Copy and paste the URL for the complete alignment:

Click on the Results Summary tab below the "Results for job" title. Click on the Jalview icon. Decline to update the software, but click OK to confirm that this is trusted software. Then click Run. This will open a Java-based program to analyze the alignment of your nucleotide entries and the conservation of the consensus sequence. Use the slider at the bottom of the screen to scroll across the entire alignment.

1. What do the colors within the sequence alignment indicate? ___________________
2. At the bottom of the screen there is a long “consensus” sequence. What is a consensus sequence? __________________________________________________________________
_________________________________________________________________________________________
3. Find one location at which all of your sequences agree. Write its location here _____
4. Find one location where there is a wide variety of bases. Write its location here _____
5. If the sequence were highly conserved, which would you see more of? Same / Different
6. Find a gap in the middle of a string of conserved nucleotides where one sequence has a “-”. What does that mean? __________________________________________________________________________________________
7. Use the FIND function (under ‘Select’) to locate the start codon near the beginning of the dog MITF sequence. GenBank mRNA entries are written in DNA letters, so the AUG start codon appears as ATG in the alignment (TAC is its complement on the template strand). Select a series of 10 to 15 bases upstream from this codon to be a potential upstream primer for PCR. List that sequence here: Upstream Primer - _________________________________________________________________________.
8. How many other MITF samples from the mammals in your alignment could also use this same primer sequence? ______ What animals are they?
_______________________________________________________
_________________________________________________________________________________________.
9. Notice that the dog MITF sequence ends before many of the other mammal sequences in the downstream region. Scroll down to this region of the alignment, or use the FIND function to locate the potential stop codon (TAG) in a highly conserved region downstream from the end of the dog MITF sequence. Select a series of 10 to 15 bases downstream of this region to be a potential downstream primer for PCR. List that sequence here: Downstream Primer - _______________________________________________________________________.
10. A scientist wants to use the primers you have designed to identify the presence of the MITF gene in the Maned Wolf. The Maned Wolf is an endangered species found in South America that is closely related to domestic dogs, Gray Wolves, and foxes. Would the primer sequences you’ve identified be useful to this scientist? Why or why not? __________________________________________________________________________________________
_________________________________________________________________________________________.

Creating a Phylogenetic Tree from Sequencing Data

Tab over to the Alignments page for your results. Click on Send to ClustalW2_Phylogeny. When the input form appears, click on the Submit button at the bottom of the page. When the results appear, scroll down to the bottom of the page to view the phylogenetic tree.

1. What about the sequences allows the computer to create a tree to show relatedness? _______________________________________________________________________________________
2. How does the phylogram show relatedness? __________________________________________________________________________________________
Use the ‘Print Screen’ button, the Windows Snipping Tool, or a web browser extension to copy and paste your phylogenetic tree (cladogram) into this document.
Below the tree, respond to the last two questions.
3. Cite three specific pair-wise comparisons shown on your tree and say:
How does the tree agree with the older tree created by anatomical differences alone?
How does it differ?
4. Does the MITF gene sequence information provide a reliable source of genetic information for determining the relatedness among species of mammals? Why might it not be reliable when used in isolation?
____________________________________________________________________________________________________________________________________________________________________________________
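The alignment and primer questions above can be illustrated with a small Python sketch. It assumes the sequences are already aligned to equal length with - marking gaps, uses a simple majority-rule consensus (Jalview's Consensus row may be computed differently), and treats a primer as usable only if it matches a sequence exactly. One point the primer questions leave implicit: the downstream (reverse) primer that is actually ordered is the reverse complement of the sense-strand bases read off the alignment, which is what `reverse_complement` computes. `percent_identity` compares two aligned rows while skipping gap columns, one simple way to express the pairwise similarity that tree-building programs work from; ClustalW2 Phylogeny's exact method is not described here. All function names and the toy alignment are hypothetical.

```python
"""Sketch: consensus, per-column agreement, primer checks, and pairwise identity.

Assumptions (illustration only): rows are pre-aligned and equal length, '-' marks a gap,
ties are broken alphabetically, and a primer "works" only on an exact match.
"""
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus(aligned):
    """Return (consensus string, agreement fraction per column) by majority rule."""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned sequences must all be the same length")
    bases, agreement = [], []
    for column in zip(*aligned):
        counts = Counter(column)
        # Most common symbol; ties broken alphabetically so the result is reproducible.
        base, count = min(counts.items(), key=lambda item: (-item[1], item[0]))
        bases.append(base)
        agreement.append(count / len(column))
    return "".join(bases), agreement


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_matches(primer, aligned_by_name):
    """Names of sequences that contain the primer exactly once gaps are removed."""
    primer = primer.upper()
    return [name for name, row in aligned_by_name.items()
            if primer in row.replace("-", "").upper()]


def percent_identity(a, b):
    """Percent identical columns, counting only columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```

For example, with the hypothetical alignment `{"Dog": "ATGGC-TA", "Wolf": "ATGGCATA", "Cat": "ATGACATA"}`, `consensus` returns "ATGGCATA" with agreement below 1.0 only at the fourth and sixth columns, and `primer_matches("GGCA", ...)` returns only `["Wolf"]` because the gap in the dog row removes that exact match.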

