Vlad Tepes



Summary of some papers about SARS-CoV-2

HIV-1 sequences in SARS-CoV-2

Definitions:

The spike protein forms spikes on the surface of coronaviruses that facilitate their attachment to host cells. It is actually a glycoprotein (a protein with sugar groups attached). Coronaviruses as a category get their name (corona meaning “crown”) from these spike proteins that project from their surface.

The spike glycoprotein (S) of a coronavirus is cleaved into two subunits (S1 and S2) when it binds to a receptor on a host cell. The S1 subunit binds to receptors on the host cell and the S2 subunit facilitates membrane fusion. The S1 subunit contains the receptor binding domain (RBD), part of which is the receptor binding motif (RBM) that makes contact with the ACE2 receptor, the primary receptor for both SARS-CoV-2 and SARS-CoV.

gp120 is a glycoprotein on the surface of HIV that is critical to its attachment to specific cell surface receptors.

Gag is an HIV protein necessary for the assembly of virus-like particles, for virion maturation after particle release, and for early post-entry steps in virus replication.

Virion means the complete, infectious form of a virus. It is the vector stage of a virus, which allows the transmission of the virus from an infected host cell to another host cell. The virion is the extracellular phase of the virus; inside a host cell the virus exists in its intracellular, replicating phase.

N-terminal and C-terminal: Amino acids by definition have an amino group (NH2) and a carboxylic acid group (COOH) attached to a carbon called the alpha-carbon. When two amino acids bond (peptide bond), the COOH group of one bonds with the NH2 group of the other. In forming the bond, the COOH group loses an OH and the NH2 group loses an H, releasing a molecule of H2O. Therefore any chain of amino acids (called a polypeptide or, if it is really long, a protein) has an N-terminal and a C-terminal.
By convention, the amino acid at the N-terminal is in position number 1, and counting of amino acids in a polypeptide or protein always begins at the N-terminal. In addition to three-letter abbreviations, there are one-letter abbreviations for amino acids, which can be seen here.

O-linked glycosylation: O-linked glycosylation is the attachment of a sugar molecule to the oxygen atom of serine (Ser or S) or threonine (Thr or T) residues in a protein. O-glycosylation is a post-translational modification that occurs after the protein has been synthesised.

Things like this, S673, T678 and S686: These refer to amino acids at specific places in a protein, i.e., serine at positions 673 and 686, and threonine at position 678.

Epitope: The part of an antigen that is recognized by the immune system, specifically by antibodies, B cells, or T cells. The epitope is the specific piece of the antigen to which an antibody binds.

What the Indian scientists (Pradhan et al., 2020) reported:

Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag

On January 31st, a group of Indian scientists (Pradhan et al.) pre-published online (i.e., without peer review) a paper called “Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag.” At the time the paper appeared, the coronavirus did not yet have its current name (SARS-CoV-2) and they refer to it by what it was called at the time: 2019-nCoV.

The scientists were trying to figure out where the 2019-nCoV virus came from. They retrieved the nucleic acid sequences of all coronaviruses that had been sequenced and whose genomes were published in databases. There were 55 such genomic sequences available. They also retrieved all the sequences that had been published so far on the new coronavirus, then called 2019-nCoV.
There were 28 of those available (i.e., at the time there were 28 strains of the new virus).

[Note: There would be different strains of the new virus because of random mutations that occur as it spreads. Changes in a few of the nucleotides in its RNA genome would not be sufficient to make it a new virus, just a different strain of the same virus. The genome of SARS-CoV-2 is about 30,000 nucleotides long and the various strains would differ in about a dozen nucleotides. At the time the Indians published their paper, there were 28 strains.]

Of the 55 coronavirus genomes available, the scientists used 32 genomes from all categories of virus to develop a phylogenetic tree (i.e., a “family tree” showing similarities and relationships of viruses). They found the closest relative to be the SARS-CoV virus (which caused the epidemic of 2002-03).

Then they aligned and compared the glycoprotein region of SARS-CoV and the new virus, 2019-nCoV. They compared both the nucleotide sequence and the amino acid sequence (a “codon” of three nucleotides codes for an amino acid). They also looked at how conserved the nucleotide and amino acid sequences were in the 28 strains of the new 2019-nCoV virus. They also used software to generate the three-dimensional structure of the 2019-nCoV glycoprotein.

What they found is that, compared to the SARS-CoV virus, the 2019-nCoV virus contained four insertions (i.e., sequences of nucleotides and amino acids that were not present in the SARS-CoV virus). They then compared the amino acid sequence of the spike glycoprotein of 2019-nCoV with the amino acid sequences of all 55 coronaviruses for which that information was available. They found the four insertions, which they labelled insertions 1, 2, 3, and 4, to be unique to 2019-nCoV and not present in any other coronavirus.
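As a rough illustration of the step from nucleotide sequence to amino acid sequence (one codon of three nucleotides per amino acid), here is a minimal Python sketch using the standard genetic code. The function name `translate` and the 12-nucleotide example string are made up for illustration; the string is not taken from any viral genome or from the paper.

```python
# Standard genetic code, built compactly. Codons are enumerated with the
# bases in the order U, C, A, G at each of the three positions; "*" marks
# a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides):
    """Translate an RNA (or DNA) string into one-letter amino acid codes."""
    rna = nucleotides.upper().replace("T", "U")  # accept DNA spelling too
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


# Hypothetical 12-nucleotide sequence: 4 codons, so 4 amino acids.
example = "AUGGCUUCUUGG"
print(translate(example))  # MASW
```

Twelve nucleotides always give four amino acids, which is why an insertion is described interchangeably by its nucleotide length or its amino acid length.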
They mentioned that a group from China (Zhou et al., 2020) had documented three insertions, but that group had not looked at the entire glycoprotein sequence. When they compared the nucleotide sequences of the spike glycoprotein of all 28 available 2019-nCoV sequences, they found that all four insertions were absolutely (100%) conserved. They then translated the aligned genome [into what the amino acid sequence would be] and found that the inserts were present in all Wuhan 2019-nCoV viruses except for the 2019-nCoV virus with a bat as its host.

They wanted to find out the origin of the inserts. So they used each insert from the 2019-nCoV virus as a “query” against all virus genomes and considered only “hits” with 100% sequence coverage. They were surprised to find that each of the four inserts aligned with short segments of HIV-1 proteins. The first three inserts aligned with the HIV-1 protein gp120 and the fourth aligned with the HIV-1 protein Gag.

Insert 1 and insert 2 were each six amino acids long, and each aligned 100%, without any gaps, with six amino acids in the HIV gp120 protein. Insert 3 was 12 amino acids long and it aligned to a sequence in gp120 with a gap (the gp120 sequence was 15 amino acids long; if one left a three-amino-acid gap in insert 3 between the fourth and fifth amino acids to account for the three additional amino acids in the HIV sequence, then 9 of the 12 amino acids in insert 3 were identical to the gp120 sequence). Insert 4 had eight amino acids that were identical to a sequence in the HIV Gag protein with a gap (the Gag sequence was 19 amino acids long; if you left a gap of 11 amino acids in insert 4 between amino acids 4 and 5, the eight amino acids were 100% identical to the first four and last four amino acids in the Gag sequence).
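To make the gapped comparison for insert 4 concrete, here is a small Python sketch under stated assumptions. It uses the insert 4 sequence QTNSPRRA quoted in this summary. The 19-residue target is a stand-in built only from what the description says (first four and last four residues identical to the insert); the 11 residues in between are not given here, so they are written as `X`, the usual code for an unknown amino acid. The function name `gapped_identity` is hypothetical.

```python
def gapped_identity(query, target, gap_after, gap_len):
    """Open one gap in `query` and count identical aligned residues.

    A gap of `gap_len` columns is inserted after the first `gap_after`
    residues of the query, then the two strings are compared column by
    column. Gap columns never count as matches.
    """
    aligned = query[:gap_after] + "-" * gap_len + query[gap_after:]
    if len(aligned) != len(target):
        raise ValueError("gapped query and target must have equal length")
    matches = sum(1 for q, t in zip(aligned, target) if q != "-" and q == t)
    return aligned, matches


insert4 = "QTNSPRRA"
# Stand-in for the 19-residue Gag segment: only the flanking residues are
# known from the summary; the middle 11 are placeholders, not real data.
gag_like = "QTNS" + "X" * 11 + "PRRA"

aligned, matches = gapped_identity(insert4, gag_like, gap_after=4, gap_len=11)
print(aligned)                                   # QTNS-----------PRRA
print(f"{matches}/{len(insert4)} query residues identical")
print(f"{matches}/{len(gag_like)} target residues covered")
```

The sketch shows why the two ways of reporting the match differ: all 8 residues of the insert are identical to something in the target, yet they cover only 8 of its 19 positions, in two separate four-residue pieces.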
[The amino acid sequence of insert number 4 is QTNSPRRA, in positions 677 to 684 of the spike protein.]

The authors note that while it’s not unusual to find short stretches of amino acid similarity in unrelated proteins, they thought it unlikely that all four inserts in the 2019-nCoV spike glycoprotein would match two key structural proteins of an unrelated virus like HIV-1.

To understand how these insertions would relate to the structure of the spike protein, they modelled its structure based on the available structure of the SARS spike protein. Although inserts 1, 2, and 3 are not contiguous in the amino acid sequence of the protein, when the protein is in its 3-dimensional shape they fold together to form part of the glycoprotein binding site that recognizes the host receptor. Insert 1 corresponds to the N-terminal domain and inserts 2 and 3 correspond to the C-terminal domain of the S1 subunit of the 2019-nCoV spike glycoprotein. [All proteins have an N-terminal (amino terminal) and C-terminal (carboxylic acid terminal), so this means that insert 1 was at one end of the S1 subunit and inserts 2 and 3 were at the other end of the S1 subunit of the spike protein. Recall that the spike protein (S) is cleaved into S1 and S2 subunits as it enters the cell, and S1 is needed for binding to the receptor.] Insert 4 was at the junction of the SD1 and SD2 (sub-domains 1 and 2) parts of the S1 subunit. They speculate that these insertions provide additional flexibility to the glycoprotein binding site by forming a hydrophilic loop that may facilitate or enhance virus-host interactions.

Pradhan et al. (2020) withdrew their paper within days of prepublication after it was criticized for methodological reasons.

******************

Another pre-published (not yet peer-reviewed) paper, by Tong Meng et al. (2020), says that the insert sequence in SARS-CoV-2 enhances the cleavage of the spike protein needed for infection.
The insert sequence in SARS-CoV-2 enhances spike protein cleavage by TMPRSS

[Background information: Coronaviruses use the spike (S) protein to enter target cells. The surface subunit, S1, of the S protein binds to a cellular receptor. Entry also requires S protein “priming” by cellular proteases, which cleave the S protein at specific sites, called S1/S2 and S2’, and thereby allow the fusion of the viral and host cell membranes. The fusion process is driven by the S2 subunit. Both SARS-CoV and SARS-CoV-2 use the ACE2 receptor as their entry receptor and the cellular serine protease TMPRSS2 for S protein priming.]

As the title suggests, there is an “insert sequence” in SARS-CoV-2 that enhances protein cleavage by TMPRSS. That insert sequence turns out to be part of Pradhan et al.’s insert 4.

These authors state that an insertion sequence in the spike protein of SARS-CoV-2 enhances the cleavage efficiency. In addition to pulmonary alveoli, intestinal and esophageal epithelia are also target tissues of SARS-CoV-2. While SARS-CoV-2 uses the same ACE2 receptor as SARS-CoV and has a similar binding affinity, it is more transmissible and infective. These differences could be associated with differences in protease-induced S protein cleavage between SARS-CoV-2 and SARS-CoV.

The authors compared the S1/S2 and S2’ cleavage sites of SARS-CoV-2 and SARS-CoV. They found the main differences to be three short insertions in the N-terminal domain and four out of five key amino acid changes in the receptor-binding motif. Compared with SARS-CoV, there was an insertion sequence SPRR (serine, proline, arginine, arginine) in the S1/S2 cleavage site of SARS-CoV-2. [SPRR was part of insert 4 of Pradhan et al. It is amino acids 4 to 7 of the eight-amino-acid-long insert 4.] The “furin score” was used to estimate the cleavage efficiency of the insertion sequence in SARS-CoV-2. Its furin score was 0.688, much higher than the 0.139 of the corresponding sequence in SARS-CoV.
[Note: Furin is a membrane-bound protease expressed in many tissues which cleaves precursor proteins and facilitates their conversion to a biologically active state. The increased ability of furin to cleave the S1/S2 site in SARS-CoV-2 compared to SARS-CoV seems to be related to the SPRR sequence. The cleavage by furin greatly increases the ability of the S1 subunit of SARS-CoV-2 to bind to the ACE2 receptor, which it has to do to get into the cell.]

The authors also found that the insertion sequence formed an exposed loop at the S1/S2 site of SARS-CoV-2 which was “easily recognized by the catalytic pocket of TMPRSS2. Thus, both the furan score and molecular docking revealed that the insertions sequence of SARS CoV-2 facilitates the TMPRSS2 recognition and S protein cleavage.”

The authors concluded that the receptor binding domain of the S protein in both viruses had a similar affinity to the ACE2 receptor, but that the specific structure of the SARS-CoV-2 S protein was better suited to be activated by host cell proteases, which may be related to the different infectivities and transmissibilities of the two viruses. Also, the additional arginines (R682, R683, R685) [this refers to the amino acid arginine in those positions; the first two of those arginines are in insert 4] in the S1/S2 cleavage site of SARS-CoV-2 can enhance the cleavage of S1 from S2, which means the structural constraints of S1 on S2 are removed, and the fusion peptides in S2 are exposed and insert into the target host cell membrane more efficiently.

The authors also dismiss the findings of Pradhan et al. without mentioning them by name. Verbatim: “By the way, some researchers previously supposed the SARS CoV-2 was artificial due to four inserts in the S protein of SARS CoV-2 from HIV sequence. However, the results of protein sequence alignment revealed that the similar sequence of the reported fourth insertion site (680-SPRR-683) in SARS CoV-2 was commonly found in many beta-coronavirus.
Therefore, we supposed that based on the current evidence, it is not scientific to consider the insertion sequence in SARS CoV-2 S protein being artificial.”

****************

Another paper considered SARS-CoV-2 and concluded that it arose naturally.

The proximal origin of SARS-CoV-2

Kristian G. Andersen, Andrew Rambaut, W. Ian Lipkin, Edward C. Holmes and Robert F. Garry

The authors assert that “Our analyses clearly show that SARS-CoV-2 is not a laboratory construct or a purposefully manipulated virus.” They say that its genome has two notable features, which are (1) that it appears to be optimized for binding to the human ACE2 receptor and (2) that the spike protein has a functional polybasic (furin) cleavage site at the S1-S2 boundary through the insertion of 12 nucleotides [that would be the nucleotides coding for SPRR in Pradhan’s insert 4], which additionally led to the predicted acquisition of three O-linked glycans around the site.

Regarding point (1), the authors state that the RBD of the spike protein is the most variable part of the coronavirus genome. Six RBD amino acids have been shown to be critical for binding to ACE2 receptors and for determining the host range of SARS-CoV-like viruses. Five of the six amino acids differ between SARS-CoV-2 and SARS-CoV. Based on structural studies and biochemical experiments, SARS-CoV-2 seems to have an RBD that binds with high affinity to the ACE2 receptor of humans, ferrets, cats and other species with high receptor homology. While such analyses show that SARS-CoV-2 binds ACE2 with high affinity, computational analyses predict that the interaction is not ideal. The authors therefore take this as evidence that the high affinity of SARS-CoV-2 for ACE2 receptors is the result of natural selection that permitted another optimal binding solution to arise, and not the product of purposeful manipulation.
Regarding point (2), they say that the polybasic cleavage site (RRAR) at the junction of subunits S1 and S2, which allows effective cleavage by furin and other proteases, has a role in determining viral infectivity and host range. [Note: “RRA” are the last three amino acids of Pradhan et al.’s insert 4; the eight amino acids of this insert occupy positions 677 to 684 of the amino acid chain of SARS-CoV-2’s spike protein, and the final R in the polybasic cleavage site RRAR is located at position 685, the first amino acid beyond the insert. R stands for the amino acid arginine, A for alanine.]

Regarding point (2), they also note that the proline (P) in the insert in question [the last 4 amino acids of the insert are PRRA, in positions 681-684] creates a turn that is predicted to result in the addition of O-linked glycans to S673, T678 and S686, which flank the cleavage site and are unique to SARS-CoV-2. Polybasic cleavage sites have not been observed in related “lineage B” betacoronaviruses, although other human betacoronaviruses (including HKU1 from lineage A) have those sites and predicted O-linked glycans. The authors say that given the genetic variation in the spike, it is likely that SARS-CoV-2-like viruses will be discovered in other species.

Experiments with SARS-CoV (the original SARS of 2002-03) have shown that the insertion of a furin cleavage site at the S1-S2 junction enhances cell-cell fusion without affecting viral entry. The acquisition of such a site in the hemagglutinin protein (HA) of avian influenza converts low-pathogenicity avian influenza viruses to highly pathogenic forms.

The authors say the function of the predicted O-linked glycans is unclear, but they could create a “mucin-like domain” that shields epitopes or key residues on the SARS-CoV-2 spike protein. Several viruses use mucin-like domains as glycan shields for immunoevasion (i.e., hiding from the host’s immune system).
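The residue labels used throughout this summary (R682, S686 and so on) follow directly from counting positions from the N-terminal. As a minimal sketch, assuming the placement of insert 4 at positions 677 to 684 stated in the note above, the labels for QTNSPRRA can be generated like this (the helper name `residue_labels` is made up for illustration):

```python
def residue_labels(sequence, first_position):
    """Label each residue as one-letter code plus its position in the chain."""
    return [f"{aa}{first_position + offset}" for offset, aa in enumerate(sequence)]


# Insert 4 as quoted in this summary, assumed to start at position 677.
labels = residue_labels("QTNSPRRA", 677)
print(labels)

# Position of the first residue after the insert.
next_position = 677 + len("QTNSPRRA")
print(next_position)  # 685
```

Under that assumption the two arginines of the insert come out as R682 and R683, the threonine as T678, and the residue immediately after the insert sits at position 685, which is where the final R of the RRAR site is placed.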
Although prediction of O-linked glycosylation is robust (i.e., all the models predict it), the authors say more experimental studies are needed to determine if these sites are used in SARS-CoV-2.

The authors think it unlikely that SARS-CoV-2 emerged through manipulation of a SARS-CoV-like coronavirus. Its RBD is optimized for ACE2 binding with an efficient solution different from those previously predicted. Also, if it had been manipulated, one of the several reverse-genetic systems available for betacoronaviruses would probably have been used. They propose that it either arose by natural selection in an animal before zoonotic transfer, or that it arose in humans following zoonotic transfer.

On the possibility of it arising in animals before zoonotic transfer (before humans got it), they suggest it arose in bats. There is a sequence called RaTG13, sampled from the bat Rhinolophus affinis, that is about 96% identical to SARS-CoV-2, but the viruses differ in the receptor binding domain, suggesting RaTG13 would not bind efficiently to human ACE2. The overall genome of coronaviruses from Malayan pangolins (Manis javanica) illegally imported into Guangdong province is not as close to SARS-CoV-2 as RaTG13, but some pangolin coronaviruses exhibit strong similarity to SARS-CoV-2 in the RBD, including all six key RBD residues. The authors say that the similarity of the pangolin RBD and the SARS-CoV-2 RBD clearly shows that the spike protein optimized for binding to human-like ACE2 is the result of natural selection. They say that neither the bat nor the pangolin betacoronaviruses have polybasic cleavage sites and that there is no known animal coronavirus sufficiently similar to SARS-CoV-2 to have served as a direct progenitor, but that the coronaviruses in bats and other species are massively undersampled.
For a precursor virus to have acquired both the polybasic cleavage site and mutations in the spike protein suitable for binding to human ACE2, an animal host would have to have a high population density and an ACE2-encoding gene similar to that in humans.

On the possibility of the virus evolving by natural selection in humans following zoonotic transfer, they say it’s possible that a progenitor virus jumped into humans and then acquired the genomic features described through adaptation during undetected human-to-human transmission. Once acquired, these adaptations would enable the pandemic to take off. Since all SARS-CoV-2 genomes sequenced to date have the genomic features described above, they are derived from a common ancestor that had them too. The presence in pangolins of an RBD that is very similar to that of SARS-CoV-2 means that we can infer it was probably already in the virus that jumped to humans. But they note that this does not address the insertion of the polybasic cleavage site, which in this scenario would have had to occur during human-to-human transmission.

Since the evidence points to the emergence of the virus in late November to early December of 2019, this scenario presumes a period of unrecognized transmission in humans between the initial zoonotic event and the acquisition of the polybasic cleavage site. There would have been sufficient opportunity for this to happen if there had been many prior zoonotic events that produced short chains of human-to-human transmission over an extended period. This is what happened with MERS-CoV, for which all human cases resulted from repeated jumps of the virus from dromedary camels, producing single infections or short transmission chains that eventually resolved, with no adaptation to sustained transmission.

The authors then examine the possibility of an escape from a lab, which they acknowledge has happened before with SARS-CoV.
They say that in theory, the SARS-CoV-2 virus could have acquired the RBD mutations during adaptation to passage in cell culture, as observed in studies with SARS-CoV. But the finding that the SARS-CoV-like coronavirus from pangolins has a nearly identical RBD is evidence that the mutations were acquired by recombination or mutation in nature (rather than in a lab). They say that the acquisition of both the polybasic cleavage site and the predicted O-linked glycans argues against a culture-based scenario. New polybasic cleavage sites have been observed only after prolonged passage of low-pathogenicity avian influenza in vitro or in vivo.

Since all notable SARS-CoV-2 features, including the RBD and polybasic cleavage site, are found in nature, they do not believe that any type of laboratory-based scenario is plausible. A future observation of an intermediate or fully formed polybasic cleavage site in a SARS-CoV-2-like coronavirus in animals would lend further support to a natural selection hypothesis.

[It is worth noting that when the authors speak about the virus arising in a laboratory, they consider only the possibility of it mutating on its own during the period of time it is being cultured. They do not even address the possibility of deliberately taking some genetic material (for example, the code for the RBD from the pangolin virus) and inserting it into SARS-CoV-2.]