BLAST exercises - Saint Louis University



Exercise 3: BLAST Database Searches and Pairwise Alignments
Name:
Due date: Friday, May 29th at 4:00 pm

Purpose:
To become familiar with the different programs available at NCBI BLAST.
To learn how to identify possible homologs using sequence similarity.
To use Primer-BLAST to design primers for a specific region of a transcript.
Remember: you should have a narrative for each section of the exercise.

Background:
NCBI BLAST: notes on sequence similarity and alignments
BLAST tutorials

Activities:
Database searches using blastn, blastp and Primer-BLAST.

Premise for the exercise:
In your list from Exercise 2, the most highly expressed gene in cells infected with SARS-CoV-2 is MX1. Based on the simplistic assumption that the most highly expressed gene in the SARS-CoV-2-infected cells is a gene of interest, you decide to design primers for qRT-PCR to monitor MX1 expression and to express the protein for making antibodies. You collaborate with a colleague at another university who already has an expression construct for MX1 and has agreed to send it to you. This gene has several transcript isoforms, one of which produces a smaller protein. For this exercise, you will focus on the two isoforms outlined in red in the figure below. NM_001144925 is transcript isoform 1 and encodes NP_001138397, while NM_001282920 is transcript isoform 4 and encodes NP_001269849.
[Figure: MX1 transcripts from the MX1 Gene record]

Exercises:
3-1) Using BLASTN to identify a sequence of unknown origin
Your colleague has sent you an E. coli expression construct that he says contains the open reading frame for the human MX1 protein encoded by transcript isoform 1. You want to express this protein in E. coli so you can make an antibody to it. However, past experience with this colleague suggests you should verify the construct before attempting the expression.
You run a few diagnostic restriction digests based on the plasmid map, but the results do not make sense. You then use your local sequencing service to sequence part of the cloned construct, and the sequence comes back from GeneWiz. Using NCBI BLAST, try to identify the most likely source of the DNA sequence in this construct. As part of your narrative, answer the following questions:
What databases did you use for the search(es)?
What was the top match in the database for each query sequence?
What is the most likely animal (taxonomic) source for this sequence, including the common name if available?
Does this sequence encode a protein? If so, which one?
As part of your report, include the query, query length and databases searched, including any limits. In describing the top matches, include the accession number, short description, E-value, percent identity and percent query coverage to support your conclusions.

3-2) Using BLASTP to determine the taxonomic distribution of a protein sequence
MX1 is the most highly up-regulated gene in the set from the comparison of lung adenocarcinoma cells with and without SARS-CoV-2 infection. This is a poorly characterized protein in human and mouse. Your goal in this part of the exercise is to determine whether there are homologs of the human MX1 protein (transcript variant 1) in two model organisms: Drosophila melanogaster and Caenorhabditis elegans. Prepare a table that includes the following information:
RefSeq protein accession (NP_XXXXXX) for the query
Query length
Database searched (with taxonomy limits)
Accession number of the top match
E-value
Alignment length and total protein length of the top match
Percent identities and positives of the top match
Questions that should be considered as part of the narrative:
Based on a pairwise comparison of the top match in each taxonomic group, would you consider each one to be a homolog?
What is your evidence for this conclusion?
If the human protein has known protein domains, do the top matches in the other taxonomic groups have the same domains? If not, how do the three potential homologs differ from each other?
NOTE: You could also include screenshots of the domain organization of the different proteins to support your conclusions.

3-3) Identifying primer locations using Primer-BLAST
A colleague sends you two sets of primers that he says will amplify only human MX1 transcript variant 1. He says both sets of primers will give a product of ~300 bp.
Pair 1:
Forward primer: GCCTTCCGATTCCCCATTCA
Reverse primer: GCTCTGGGTGTACCTCTGAC
Pair 2:
Forward primer: AAGAGCCGGCTGTGGATATG
Reverse primer: GCGGATCAGCTTCTCACCTT
From past experience working with your colleague, you want to confirm that these primer pairs:
Will bind only to MX1 transcript variant 1 and give the expected product size.
Will not bind to or amplify other MX1 transcripts, and will not bind to or amplify any non-MX1 genes.
In your narrative or report, include the RefSeq mRNA accession numbers of the reference transcripts (not XM_ or XR_ transcripts) for the MX1 transcript variants, the processed mRNA lengths, and the length of the protein encoded by each variant.
Questions that should be considered as part of the narrative:
Describe what tools and websites you used to answer these questions.
Where in human MX1 transcript variant 1 do the primer pairs bind? What is the predicted size of the PCR product for each primer pair? Will they bind to any other MX1 transcript variants?
If you used these same primer pairs in a PCR reaction with human genomic DNA, what is the expected product size? What do the relative sizes of the transcript and genomic products tell you about where in the gene these primers bind?
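As a rough cross-check on Primer-BLAST's output for 3-3, the core arithmetic can be sketched in Python. This is a minimal sketch, assuming exact, full-length primer matches: the forward primer appears as written in the mRNA sequence, the reverse primer appears there as its reverse complement, and the product runs from the 5' end of the forward site to the 3' end of the reverse site. The function names are hypothetical, and unlike Primer-BLAST the sketch ignores mismatched or partial binding sites, so it does not replace the specificity check.

```python
"""Hypothetical helper: locate a primer pair on a transcript and predict
the product size. Assumes exact, full-length matches only; mismatched
off-target sites (which Primer-BLAST reports) are ignored here."""

from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def product_size(template: str, forward: str, reverse: str) -> Optional[int]:
    """Return the amplicon length in bp, or None if no exact amplicon exists.

    The forward primer matches the sense (mRNA) strand as written. The
    reverse primer anneals to the sense strand, so its reverse complement
    is what appears in the mRNA sequence. Only the first exact occurrence
    of each site is considered.
    """
    template = template.upper()
    fwd_start = template.find(forward.upper())
    rev_site = template.find(reverse_complement(reverse.upper()))
    if fwd_start == -1 or rev_site == -1:
        return None  # at least one primer has no exact binding site
    if rev_site < fwd_start + len(forward):
        return None  # reverse site must lie downstream of the forward primer
    rev_end = rev_site + len(reverse)  # exclusive end of the reverse site
    return rev_end - fwd_start
```

To compare variants, you could call product_size on each RefSeq mRNA sequence in turn (for example, copied from the NM_ records in FASTA format). A result of None only means there is no exact-match amplicon on that template, not that the primers cannot bind with mismatches.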
Extra credit (15 points): Design a single pair of primers to distinguish transcript isoforms
Another gene that is up-regulated in response to SARS-CoV-2 infection of lung cells is CMPK2 (cytidine/uridine monophosphate kinase 2). There are at least two transcript variants of CMPK2: transcript variant 1 (NM_207315) is 3014 bp long, while transcript variant 3 (NM_001256478) is 1241 bp long. Your goal is to design a pair of PCR primers that will distinguish the transcript isoforms in a PCR assay.
For the report, provide:
A description of how the two transcript isoforms differ, including an alignment or graphic showing the differences
The sequences of the primer pair, where they bind, and the expected product size for each isoform
Questions to answer as part of the narrative:
Describe what web-based tools and databases you used to design these PCR primers.
Are the expected product sizes for the two transcript isoforms sufficiently different that you are confident you can distinguish them on a gel?
What is the product size for the primer pair if you use it with genomic DNA?
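For the genomic DNA questions, the key point is that any introns lying between the two primer sites are amplified along with the exons. A minimal Python sketch of that bookkeeping follows, assuming the exon boundaries are available as 1-based, inclusive genomic coordinates on the plus strand (for example, read off the gene's RefSeq record); the function names and the exon coordinates in the note after the code are hypothetical, not MX1 or CMPK2 data. It also assumes neither primer spans an exon-exon junction: a junction-spanning primer has no contiguous binding site in genomic DNA, so no genomic product would be expected from it.

```python
"""Hypothetical helper: convert a transcript-level amplicon into a genomic
amplicon size. Exons are given in transcript order as 1-based, inclusive
(start, end) genomic coordinates on the plus strand; minus-strand genes
would need their coordinates reversed first."""

from typing import List, Tuple

Exon = Tuple[int, int]


def transcript_to_genome(pos: int, exons: List[Exon]) -> int:
    """Map a 1-based transcript position to its 1-based genomic position."""
    offset = 0  # transcript length covered by the exons already passed
    for start, end in exons:
        length = end - start + 1
        if pos <= offset + length:
            return start + (pos - offset - 1)
        offset += length
    raise ValueError(f"position {pos} is beyond the end of the transcript")


def genomic_product_size(amp_start: int, amp_end: int, exons: List[Exon]) -> int:
    """Length of the genomic amplicon spanned by a transcript amplicon.

    amp_start and amp_end are the 1-based transcript positions of the 5' end
    of the forward site and the 3' end of the reverse site. Introns between
    them are included, so the genomic product equals the cDNA product when
    both primers sit in the same exon and is longer whenever an intron
    separates them.
    """
    g_start = transcript_to_genome(amp_start, exons)
    g_end = transcript_to_genome(amp_end, exons)
    return g_end - g_start + 1
```

With hypothetical exons at 101-200, 1001-1100 and 5001-5200, a 200 bp cDNA product spanning transcript positions 51-250 maps to a 4,900 bp genomic product: the 200 bp plus the two introns (800 bp and 3,900 bp).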

In order to avoid copyright disputes, this page is only a partial summary.
