Supplement of Materials and Methods



Conservation analysis

Data retrieval

In the NCBI Protein database, search with the following query:

    "Hepatitis E virus"[Organism] NOT "partial"[title] AND ("630"[SLEN] : "690"[SLEN])

Then save all retrieved sequences as a FASTA file.

Check redundancy

Use the following script provided by BioPython (we have included this file in the supplementary data as "SequenceCleaner.py"):

    import sys
    from Bio import SeqIO

    def sequence_cleaner(fasta_file, min_length=0, por_n=100):
        # Map each unique sequence to the ID(s) of the records that carry it
        sequences = {}

        for seq_record in SeqIO.parse(fasta_file, "fasta"):
            sequence = str(seq_record.seq).upper()
            # Keep sequences that are long enough and whose percentage of "N"
            # does not exceed por_n
            if (len(sequence) >= min_length and
                    (float(sequence.count("N")) / float(len(sequence))) * 100 <= por_n):
                if sequence not in sequences:
                    sequences[sequence] = seq_record.id
                else:
                    # Identical sequence seen before: join the IDs with "_"
                    sequences[sequence] += "_" + seq_record.id

        # Write the non-redundant sequences to "clear_" + original file name
        with open("clear_" + fasta_file, "w+") as output_file:
            for sequence in sequences:
                output_file.write(">" + sequences[sequence] + "\n" + sequence + "\n")

        print("CLEAN!!!\nPlease check clear_" + fasta_file)

    userParameters = sys.argv[1:]

    try:
        if len(userParameters) == 1:
            sequence_cleaner(userParameters[0])
        elif len(userParameters) == 2:
            sequence_cleaner(userParameters[0], float(userParameters[1]))
        elif len(userParameters) == 3:
            sequence_cleaner(userParameters[0], float(userParameters[1]),
                             float(userParameters[2]))
        else:
            print("There is a problem!")
    except Exception:
        print("There is a problem!")

Save this script as a separate file and run it with Python to remove redundant sequences. A typical command is:

    ~$ python SequenceCleaner.py [Seq]

where [Seq] is the file name of your sequences. Our running environment was Linux (Ubuntu 17.10), Python 3.6, and BioPython 1.7.

Conservation analysis

Because we performed the sequence alignment in a Linux environment, we used CLUSTALW in command-line mode. A typical CLUSTALW command on Linux is:

    ~$ clustalw -INFILE=[Seq]

where [Seq] is the file name of your sequences.
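The two command-line steps above (redundancy removal, then alignment) can be chained from Python. The following is only a minimal sketch under stated assumptions: the function names and the file name "hev.fasta" are hypothetical, the executable is assumed to be on the PATH under the name "clustalw", and the input is assumed to be a bare file name in the current directory, because SequenceCleaner.py builds its output name as "clear_" + file name.

```python
import shutil
import subprocess


def cleaned_name(fasta_file):
    """Name of the file written by SequenceCleaner.py for a given input.

    Assumes fasta_file is a bare file name, not a path.
    """
    return "clear_" + fasta_file


def clustalw_command(fasta_file, executable="clustalw"):
    """Build the argument list for the CLUSTALW call described in the text."""
    return [executable, "-INFILE=" + fasta_file]


def run_alignment(fasta_file, executable="clustalw"):
    """Run CLUSTALW on fasta_file; raises if the executable is not installed."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(executable + " was not found on the PATH")
    return subprocess.run(clustalw_command(fasta_file, executable), check=True)


if __name__ == "__main__":
    # "hev.fasta" is a hypothetical file name used only for illustration.
    print(" ".join(clustalw_command(cleaned_name("hev.fasta"))))
```

Calling run_alignment(cleaned_name("hev.fasta")) would start the alignment on the de-duplicated file once SequenceCleaner.py has been run on "hev.fasta".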
As the Windows version of the MEGA package is more commonly used, we do not strongly recommend our method as the primary choice unless you are dealing with large data sets.

Aligned sequences can be uploaded directly to the ConSurf server. Most settings are chosen automatically; e.g., the algorithm chose the JTT evolutionary model. However, the following parameters are mandatory:

    Bayesian (generally recommended) or Maximum Likelihood mode;
    Query sequence (we recommend a sequence in your dataset with a typical amino acid length that is as close as possible to the root of the phylogenetic tree).

Visualization of conservation analysis

The key statements of the Python script for data visualization in Figure 1 are as follows.

Load the pyplot module:

    >>> import matplotlib.pyplot as plt

Plot bars:

    >>> plt.bar(x, height, width = width, bottom = bottom, color = color, alpha = alpha)

where x is the x-coordinate of the bar, height is the height of the bar, width is its width, bottom is the y-coordinate of the bottom of the bar, color is given as a hexadecimal string (e.g., "#FFFFFF" for white), and alpha is the transparency of the bars.

Plot scatters:

    >>> plt.scatter(x, y, c = color, edgecolor = edgecolor)

Plot pie chart:

Load the pandas module:

    >>> import pandas as pd

Transform the data into a Series:

    >>> data = pd.Series([a1,a2,a3,…,an], index = ['A', 'B', 'C', …, 'N'])

Plot the pie chart:

    >>> data.plot(kind = 'pie', fontsize = fontsize, colors = colors, explode = explode)

Protein structure modelling and alignment

As I-TASSER is a fully automated server, researchers can simply upload the sequence of interest in FASTA format. Results are packed into a "*.tar.gz" file. For details, please refer to the authors' documentation.

Protein structure alignment

To facilitate large-scale protein structure alignment, we extracted the domains from the PDB files and saved them as individual PDB files.
We do not recommend our method, because investigators can easily perform similar operations with PyMol.

A typical alignment command with DeepAlign is:

    ~$ ./DeepAlign protein_A protein_B

where protein_A and protein_B are the PDB files of the structures you want to align. As DeepAlign provides a useful tool for calculating the RMSD of multiple structures, we used it to compare the morphological features of all the domains of the reference sequences. A typical command for this morphological alignment is:

    ~$ ./3DCOMB -i input_list -r

where input_list is a file containing the names of all the PDB files you want to align. The parameter -r means the calculation runs in "iteration refinement" mode.

Docking of antigen and antibody

Identification of interface between antigen and antibody

Copy the following script provided by PyMol and save it as a separate *.py file (we have included it in the supplementary data as "InterfaceResidues.py").

    from pymol import cmd, stored

    def interfaceResidues(cmpx, cA='c. A', cB='c. B', cutoff=1.0, selName="interface"):
        # Save the user's dot_solvent setting before changing it
        oldDS = cmd.get("dot_solvent")
        cmd.set("dot_solvent", 1)

        # Names for temporary objects and selections
        tempC, selName1 = "tempComplex", selName+"1"
        chA, chB = "chA", "chB"

        # Work on a copy of the complex and hide the original
        cmd.create(tempC, cmpx)
        cmd.disable(cmpx)

        # Keep only the polymer atoms of the two chains of interest
        cmd.remove(tempC + " and not (polymer and (%s or %s))" % (cA, cB))

        # Area of each atom in the complex, stored in b and copied to q
        cmd.get_area(tempC, load_b=1)
        cmd.alter(tempC, 'q=b')

        # Split the chains and recompute the area of each chain alone
        cmd.extract(chA, tempC + " and (" + cA + ")")
        cmd.extract(chB, tempC + " and (" + cB + ")")
        cmd.get_area(chA, load_b=1)
        cmd.get_area(chB, load_b=1)

        # b now holds the area difference between isolated chain and complex
        cmd.alter( "%s or %s" % (chA,chB), "b=b-q" )

        # Collect the residues whose area difference reaches the cutoff
        stored.r, rVal, seen = [], [], []
        cmd.iterate('%s or %s' % (chA, chB), 'stored.r.append((model,resi,b))')

        cmd.enable(cmpx)
        cmd.select(selName1, 'none')
        for (model,resi,diff) in stored.r:
            key=resi+"-"+model
            if abs(diff)>=float(cutoff):
                if key in seen: continue
                else: seen.append(key)
                rVal.append( (model,resi,diff) )
                # Grow the selection one residue at a time
                cmd.select( selName1, selName1 + " or (%s and i. %s)" % (model,resi))

        # Transfer the selection to the original object and clean up
        cmd.select(selName, cmpx + " in " + selName1)
        cmd.delete(selName1)
        cmd.delete(chA)
        cmd.delete(chB)
        cmd.delete(tempC)
        cmd.enable(selName)

        # Restore the user's setting
        cmd.set("dot_solvent", oldDS)

        return rVal

    cmd.extend("interfaceResidues", interfaceResidues)

Load the crystal structure in PyMol and run the script above, then type the following command at the PyMol command line:

    PyMOL> interfaceResidues [PDB], chain [A], chain [B]

where [PDB] is the name of the structure you loaded in PyMol, and chain [A] and chain [B] are the chains in contact whose interface you want to identify. After the script has run, PyMol defines all atoms found as a new "selection". Then remove all atoms no more than 10 Angstroms from the interface with the following command at the PyMol command line:

    PyMOL> remove [sele] around 10

where [sele] is the "selection" defined in PyMol by the script above. Extract the remaining structures of the antigen and the antibody and save them as PDB files. These two files are the masked structures for antigen-antibody docking.

Antigen-antibody docking

Extract the structures of the antigen and the antibody from the co-crystallized structure with PyMol and save them as PDB files. Upload them to the ClusPro server and run it in antigen-antibody mode. Provide the corresponding "masked" structures prepared above to guard against improper docking.

Plotting information on PDB models with PyMol

Basic Operation

Label information on PDB models with PyMol, for example:

    PyMOL> color red, resi 123

colors amino acid No. 123 red.

Or render a picture of the structure, for example:

    PyMOL> bg_color white
    PyMOL> ray 2500, 2500
    PyMOL> png A.png

renders a picture at a resolution of 2500 × 2500 pixels on a white background and saves it as A.png.

    PyMOL> remove resi -110

removes the amino acids from the beginning of the chain up to residue No. 110.

Example: Plotting conservation grades on PDB models

Step 1

Set the working directory and load the PDB files.
You can use the PyMol menu:

    File > Working Directory > Change…

or the PyMol command line (with the example command, the working directory will be "/home/user/desktop"):

    PyMOL> cd /home/user/desktop

Step 2

Copy the PDB files to the working directory. Load the model with the PyMol menu:

    File > Open…

or use the PyMol command line (with the example command, PyMol loads a model named "1A.pdb"; note that PyMOL is built on Python and treats file names that differ only in upper and lower case as different files):

    PyMOL> load 1A.pdb

Step 3

Set the color scale that represents the conservation grades. PyMol represents colors with decimal values, so we have converted the default color scheme of ConSurf into the following script. It can be stored as a TXT file and loaded with the menu File > Run Script… (we have included it as "SetColors.txt").

    set_color grade1 = [0.05859375,0.77734375,0.80859375]
    set_color grade2 = [0.55859375,0.99609375,0.99609375]
    set_color grade3 = [0.8125,0.99609375,0.99609375]
    set_color grade4 = [0.875,0.99609375,0.99609375]
    set_color grade5 = [0.99609375,0.99609375,0.99609375]
    set_color grade6 = [0.99609375,0.90625,0.9375]
    set_color grade7 = [0.9375,0.77734375,0.87109375]
    set_color grade8 = [0.93359375,0.46875,0.625]
    set_color grade9 = [0.62109375,0.125,0.37109375]

Step 4

Load the script that colors the residues. The following command colors the first residue with the previously defined color grade3:

    PyMOL> color grade3, resi 1

We have generated the commands for all 660 sites and packed them with the PDB files in the Supplements. Use the PyMol menu File > Run Script… to load the script "ConservationPlot.txt"; all 660 residues are then colored at once. The script has the form:

    color grade3, resi 1
    color grade1, resi 2
    color grade1, resi 3
    color grade1, resi 4
    color grade1, resi 5
    ……
    color grade3, resi 656
    color grade4, resi 657
    color grade5, resi 658
    color grade7, resi 659
    color grade1, resi 660
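A script of the form shown in Step 4 can be generated from a list of conservation grades. The sketch below is illustrative only and is not the script used to produce "ConservationPlot.txt": the function names are hypothetical, the example grades are simply those of residues 1 to 5 listed above, and the channel conversion rests on the observation that the decimals in "SetColors.txt" equal 8-bit channel values divided by 256 (for example 15/256 = 0.05859375).

```python
def color_commands(grades, first_residue=1):
    """Return one PyMOL 'color' command per residue.

    grades: conservation grades (integers 1 to 9), one per residue,
            in sequence order starting at first_residue.
    """
    commands = []
    for offset, grade in enumerate(grades):
        if not 1 <= grade <= 9:
            raise ValueError("grade must be between 1 and 9, got %r" % grade)
        commands.append("color grade%d, resi %d" % (grade, first_residue + offset))
    return commands


def channel_to_decimal(value):
    """Convert an 8-bit color channel to the decimal form seen in SetColors.txt.

    Assumption: the published decimals are channel values divided by 256.
    """
    return value / 256


if __name__ == "__main__":
    example_grades = [3, 1, 1, 1, 1]  # grades of residues 1-5 from the text
    print("\n".join(color_commands(example_grades)))
```

Writing the joined commands to a text file gives a script that can be loaded through File > Run Script… in the same way as "ConservationPlot.txt".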