In our second stage, we wanted to construct non-recombinant regions where our approach to breakpoint identification was as conservative as possible. In this approach, we considered a breakpoint as supported only if it had three types of statistical support: (1) mosaic signals identified by 3SEQ, (2) phylogenetic incongruence (PI) signals identified by building trees around 3SEQ's breakpoints and (3) the GARD algorithm (ref. 35), which identifies breakpoints by identifying PI signals across proposed breakpoints. PI signals were identified (with bootstrap support >80%) for seven of these eight breakpoints: positions 1,684, 3,046, 9,237, 11,885, 21,753, 22,773 and 24,628.

Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown (refs. 22,25), have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined.

The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin lineages were acquired from bat viruses divergent to those that gave rise to SARS-CoV-2. Another similarity between SARS-CoV and SARS-CoV-2 is their divergence time (40–70 years ago) from currently known extant bat virus lineages. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level.
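The three-way support rule described above can be pictured as a filter over candidate breakpoints. The following is only a minimal sketch under stated assumptions: the function name `supported_breakpoints`, the `tolerance` window and every coordinate in the example are hypothetical, and the actual analysis relies on 3SEQ, tree-based PI tests and GARD rather than on simple coordinate matching.

```python
def supported_breakpoints(mosaic, pi, gard, tolerance=0):
    """Return mosaic breakpoints that also have PI and GARD support.

    A position from `mosaic` counts as supported only if both other
    methods report a breakpoint within `tolerance` nucleotides of it.
    (Hypothetical helper; not part of any published tool.)
    """
    def has_match(pos, candidates):
        return any(abs(pos - c) <= tolerance for c in candidates)

    return sorted(
        p for p in set(mosaic) if has_match(p, pi) and has_match(p, gard)
    )


# Hypothetical coordinates, for illustration only.
mosaic_hits = [1000, 5000, 9000]
pi_hits = [1010, 9000]
gard_hits = [995, 5020, 9030]
print(supported_breakpoints(mosaic_hits, pi_hits, gard_hits, tolerance=50))
```

Requiring agreement from all three sources is what makes the rule conservative: a breakpoint missed by any one method is discarded.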
Given what was known about the origins of SARS, as well as identification of SARS-like viruses circulating in bats that had binding sites adapted to human receptors (refs. 29–31), appropriate measures should have been in place for immediate control of outbreaks of novel coronaviruses.

Figure 4 caption: TMRCAs for SARS-CoV and SARS-CoV-2.

There is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2. The Pango dynamic nomenclature is a popular system for classifying and naming genetically distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes.

To avoid artefacts due to recombination, we focused on NRR1 and NRR2 and the recombination-masked alignment NRA3 to infer time-measured evolutionary histories. We find that the sarbecoviruses (the viral subgenus containing SARS-CoV and SARS-CoV-2) undergo frequent recombination and exhibit spatially structured genetic diversity on a regional scale in China. SARS-CoV-2 itself is not a recombinant of any sarbecoviruses detected to date, and its receptor-binding motif, important for specificity to human ACE2 receptors, appears to be an ancestral trait shared with bat viruses and not one acquired recently via recombination.

Sequences are colour-coded by province according to the map. S. China corresponds to Guangxi, Yunnan, Guizhou and Guangdong provinces. BEAST inferences made use of the BEAGLE v.3 library (ref. 68) for efficient likelihood computations.
When the genomic data included both coding and non-coding regions we used a single GTR+Γ substitution model; for concatenated coding genes we partitioned the alignment by codon position and specified an independent GTR+Γ model for each partition with a separate gamma model to accommodate inter-site rate variation. Using both prior distributions, this results in six highly similar posterior rate estimates for NRR1, NRR2 and NRA3, centred around 0.00055 substitutions per site per year.

Because coronaviruses are known to be highly recombinant, we used three different approaches to identify non-recombinant regions for use in our Bayesian time-calibrated phylogenetic inference. Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003).

c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment.

Root-to-tip divergence as a function of sampling time for non-recombinant regions NRR1 and NRR2 and recombination-masked alignment set NRA3.
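Partitioning a concatenated coding alignment by codon position, as described above, amounts to splitting its columns into three interleaved sets. The sketch below shows only that splitting step. It assumes the alignment is in frame and starts at a first codon position; the function name `partition_by_codon_position` and the toy sequences are hypothetical.

```python
def partition_by_codon_position(alignment):
    """Split aligned, in-frame coding sequences into three column partitions.

    `alignment` maps a sequence name to its aligned sequence. The result is
    a list of three dicts holding codon positions 1, 2 and 3 respectively.
    (Hypothetical helper; assumes the first column is codon position 1.)
    """
    return [
        {name: seq[offset::3] for name, seq in alignment.items()}
        for offset in range(3)
    ]


# Toy sequences, for illustration only.
toy = {"seqA": "ATGGCTAAA", "seqB": "ATGGCCAAG"}
first, second, third = partition_by_codon_position(toy)
print(third)
```

Each of the three partitions would then receive its own substitution model in the phylogenetic software; that step is not shown here.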
Six recombination-detection methods were applied (RDP, GENECONV, MaxChi, Bootscan, SisScan and 3SEQ; ref. 36), and recombination signals detected by more than two methods were considered for breakpoint identification.

Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates.

In light of these time-dependent evolutionary rate dynamics, a slower rate is appropriate for calibration of the sarbecovirus evolutionary history. The plots are based on maximum likelihood tree reconstructions with a root position that maximises the residual mean squared for the regression of root-to-tip divergence and sampling time. The consistency of the posterior rates for the different prior means also implies that the data do contribute to the evolutionary rate estimate, despite the fact that a temporal signal was not visually apparent (see Extended Data).

This genotype pattern led to the creation of a new Pangolin lineage named B.1.640.2, a phylogenetic sister group to the old B.1.640 lineage, which was renamed B.1.640.1.

The assumption of long-term purifying selection would imply that coronaviruses are in endemic equilibrium with their natural host species, horseshoe bats, to which they are presumably well adapted. Uncertainty measures are shown in an Extended Data figure.
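A root-to-tip plot regresses each tip's genetic divergence from the root on its sampling date: the slope is a crude rate estimate, and a weak fit suggests little temporal signal. The following is a minimal sketch of that regression only. It does not search for a root position as TempEst does, the function name `root_to_tip_regression` is hypothetical, and the example data are invented for illustration.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of divergence = rate * date + intercept.

    Returns (rate, intercept, r_squared). A low r_squared is one informal
    indication that temporal signal is weak. (Hypothetical helper.)
    """
    n = len(dates)
    if n < 2 or n != len(divergences):
        raise ValueError("need two or more paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("sampling dates must not all be identical")
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return rate, intercept, r_squared


# Invented data: sampling years and root-to-tip divergences.
years = [2000, 2005, 2010, 2015]
distances = [0.010, 0.015, 0.020, 0.025]
rate, intercept, r2 = root_to_tip_regression(years, distances)
print(f"rate={rate:.5f} r2={r2:.3f}")
```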
Visual exploration using TempEst (ref. 39) indicates that there is no evidence for temporal signal in these datasets (see Extended Data). Due to the absence of temporal signal in the sarbecovirus datasets, we used informative prior distributions on the evolutionary rate to estimate divergence dates. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses.

Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature.

Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2.

Figure 5 caption: Comparisons of GC content across taxa.

Using these breakpoints, the longest putative non-recombining segment (nt 11,885–21,753) is 9.9 kb long, and we call this region NRR2. We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating.

SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 10–15% divergent throughout the entire S protein (excluding the N-terminal domain) (Fig. 2, bottom).
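The length of the longest segment between breakpoints can be checked with simple arithmetic. The sketch below uses only the seven PI-supported positions listed earlier in the text. The eighth breakpoint is not given there, so the result is an illustration rather than a reproduction of the authors' procedure, and the function name `longest_segment` is hypothetical.

```python
def longest_segment(breakpoints):
    """Return (start, end, length) of the longest gap between neighbours.

    Only gaps between consecutive breakpoints are considered; the genome
    ends are ignored. (Hypothetical helper.)
    """
    ordered = sorted(breakpoints)
    if len(ordered) < 2:
        raise ValueError("need two or more breakpoints")
    start, end = max(
        zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0]
    )
    return start, end, end - start


# The seven PI-supported breakpoint positions reported in the text.
pi_supported = [1684, 3046, 9237, 11885, 21753, 22773, 24628]
print(longest_segment(pi_supported))
```

With these inputs the longest gap runs from position 11,885 to 21,753, a span of 9,868 nucleotides, which agrees with the roughly 9.9 kb quoted for NRR2.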
Similarity in codon usage metrics between SARS-CoV-2 and the eukaryotes analyzed was correlated with the coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b).

The sizes of the black internal node circles are proportional to the posterior node support. The shaded region corresponds to the S protein. These residues are also in the Pangolin Guangdong 2019 sequence. Divergence time estimates based on the three regions/alignments where the effects of recombination have been removed.

The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans (ref. 47). Current sampling of pangolins does not implicate them as an intermediate host. In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk (refs. 9,10).

The Artic Network receives funding from the Wellcome Trust.