gms | German Medical Science

International Conference on SARS - one year after the (first) outbreak

8-11 May 2004, Lübeck

The evolutionary rate of SARS-CoV


  • Massimo Cicozzi (corresponding author, presenter) - Istituto Superiore di Sanitá, Rome, Italy
  • Marco Salemi - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, U.S.A.
  • Maria Jose Ruiz-Alvarez - Istituto Superiore di Sanitá, Rome, Italy
  • Walter M. Fitch - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, U.S.A.
  • Giovanni Rezza - Istituto Superiore di Sanitá, Rome, Italy

International Conference on SARS - one year after the (first) outbreak. Lübeck, 08.-11.05.2004. Düsseldorf, Köln: German Medical Science; 2004. Doc04sarsP4.01

The electronic version of this article is complete and is available at:

Published: 26 May 2004

© 2004 Cicozzi et al.
This is an Open Access article distributed under the terms of the Creative Commons license. It may be reproduced, distributed and made publicly available, provided that the author and source are credited.



Estimating the rate of evolution of SARS-CoV would indicate how quickly the virus can increase its genetic variability, which in turn has important implications for disease progression and for drug and vaccine development. Phylogenetic analysis of SARS-CoV sequences revealed a high degree of homogeneity, which could indicate an unusually slow-evolving RNA virus. To investigate further, we carried out a full genome alignment of the available SARS-CoV strains using the Clustal algorithm [1]. The alignment was carefully edited by hand to maximize the number of identities, and the site positions containing gaps were removed. The resulting alignment is 21333 nucleotides long: 63 sites have at least one sequence with a different nucleotide, and only 10 sites are phylogenetically informative, i.e. they are useful for discriminating among different tree topologies according to the unweighted parsimony criterion. Subalignments were generated for all the known coding regions, most of which were identical among the different isolates. We analyzed ORF 1ab [2], which appears to be the most variable. Maximum likelihood (ML) methods were employed for the analyses because they allow different phylogenetic hypotheses to be tested by calculating the probability that a given model of evolution generated the observed data and by comparing the probabilities of nested models with the likelihood ratio test [3].
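The distinction drawn above between variable and phylogenetically informative sites can be illustrated with a minimal sketch. It assumes gap-free sequences of equal length and the usual unweighted-parsimony definition: a site is informative when at least two character states each occur in at least two sequences. The function name classify_sites and the toy alignment are hypothetical; they are neither the SARS-CoV data nor the software used in the analysis.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (variable, informative) column indices of a gap-free alignment.

    Assumption: every sequence has the same length and gapped columns
    have already been removed, as described in the text.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same, non-zero count and length")

    variable, informative = [], []
    for index, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        if len(counts) > 1:
            # At least one sequence differs at this site.
            variable.append(index)
            # Parsimony-informative: two or more states, each shared by
            # two or more sequences.
            shared_states = sum(1 for n in counts.values() if n >= 2)
            if shared_states >= 2:
                informative.append(index)
    return variable, informative


# Hypothetical toy alignment of four sequences (not SARS-CoV data).
toy_alignment = ["ACGTAC", "ACGTAC", "ATGTCC", "ATGTCT"]
variable_sites, informative_sites = classify_sites(toy_alignment)
print("variable sites:", variable_sites)
print("informative sites:", informative_sites)
```

In the toy alignment the last column differs in a single sequence only, so it counts as variable but cannot discriminate among tree topologies under parsimony.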

Table 1 [Tab. 1] shows the average base composition and the ML estimates of parameters describing the mode of evolution of SARS-CoV in ORF 1ab. The α parameter of the Γ-distribution is extremely low (0.008), implying extensive heterogeneity in the rate at which different nucleotide sites mutate along the genome. Moreover, the ML estimate implies that about 90% of the constant sites in the sequences are indeed invariable, i.e. they never change, possibly because of strong purifying selection. The variable sites, on the other hand, accumulate mutations very quickly. However, a note of caution is necessary because such a result may also be due to the small number of sequences available for the analysis and the very short observation period.

Table 1 [Tab. 1] also shows that the hypothesis of a molecular clock cannot be rejected, i.e. SARS-CoV isolates appear to be evolving at a constant rate, although the p-value is very close to 0.05. Assuming that the SARS-CoV cenancestor entered the human population 4 to 8 months ago [4], the evolutionary rate of the virus is of the order of 4 × 10⁻⁴ nucleotide changes per site per year (95% C.I.: 2.0 × 10⁻⁴ to 6 × 10⁻⁴) along the entire ORF 1ab. When only the variable sites are considered, the estimated rate is noticeably faster: 3.5 × 10⁻³ changes per site per year (95% C.I.: 2.6 × 10⁻³ to 4.4 × 10⁻³). This is the usual range for an RNA virus. Therefore, on average 8 point mutations are expected along the entire ORF 1ab region at each replication round. However, we cannot exclude the possibility that the sequence variability in the data sets is also affected by the passage of the virus in Vero cell culture before sequencing [2].
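The arithmetic linking a per-site rate to an expected number of changes over a whole region can be sketched as follows. This is only a back-of-the-envelope check, not the ML estimation itself. It assumes that the 21333-site alignment length quoted earlier is a fair stand-in for the length of ORF 1ab, which the text does not state separately. Because the rates are expressed per site per year, the products are expected changes per year. The function and variable names are hypothetical.

```python
# Assumption: the 21333-site alignment length reported in the text is used
# as the region length; the ORF 1ab subalignment length is not given.
REGION_SITES = 21333

# Rate estimates quoted in the text (nucleotide changes per site per year).
reported_rates = {
    "point": 4e-4,
    "lower_95": 2e-4,
    "upper_95": 6e-4,
}


def expected_changes_per_year(rate_per_site_per_year, n_sites):
    """Expected number of changes per year across n_sites sites."""
    return rate_per_site_per_year * n_sites


expected = {
    label: expected_changes_per_year(rate, REGION_SITES)
    for label, rate in reported_rates.items()
}

for label, value in expected.items():
    print(f"{label}: {value:.1f} expected changes per year")
```

Under these assumptions the point estimate works out to roughly 8.5 changes per year across the region, with the confidence bounds giving roughly 4.3 to 12.8.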

In conclusion, the low sequence variability of SARS-CoV isolates is probably the consequence of its recent emergence in humans, but much greater viral heterogeneity with unpredictable consequences could be expected if the epidemic is not controlled. A rigorous phylogenetic approach could be an important tool to monitor the future evolution of the virus [Tab. 1].


[1] Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 1994; 22: 4673-4680
[2] Ruan Y, Wei CL, Ee AL, Vega VB, Thoreau H, Su ST, Chia JM, Ng P, Chiu KP, Lim L, Zhang T, Peng CK, Lin EO, Lee NM, Yee SL, Ng LF, Chee RE, Stanton LW, Long PM, Liu ET. Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection. Lancet 2003 May 24; 361(9371): 1779-85
[3] Swofford DL, Sullivan J. Phylogenetic inference based on parsimony and other methods with PAUP*. In: Salemi M, Vandamme A-M (eds). The Phylogenetic Handbook - a practical approach to DNA and protein phylogeny. Cambridge University Press, New York, NY, USA; 2003
[4] WHO. Cumulative number of reported probable cases of severe acute respiratory syndrome (SARS).