Traditionally, species phylogenies have been inferred by comparing a single gene, e.g., 16S rRNA. However, trees built from different genes are rarely consistent with each other, owing to horizontal gene transfer, unrecognized paralogy, and highly variable rates of evolution. Snel et al. (1999) proposed a distance-based phylogeny built from the gene content of 13 completely sequenced genomes. The evolutionary distance between two genomes A and B is defined as d(A, B) = 1 - S(A, B), where the similarity S(A, B) is the number of genes shared by A and B divided by the number of genes in the smaller of the two genomes. Two genes are counted as shared only if their Smith-Waterman comparison is significant at a cutoff such as E = 0.01.
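The distance of Snel et al. (1999) reduces to a short calculation once the pairwise counts of shared genes are known. The Python sketch below assumes those counts have already been obtained (for example, from Smith-Waterman hits passing the E-value cutoff); the function names and the toy genome sizes and counts are hypothetical and only illustrate the formula.

```python
from itertools import combinations


def gene_content_distance(shared: int, size_a: int, size_b: int) -> float:
    """Distance = 1 - similarity, similarity = shared genes / genes in the smaller genome."""
    smaller = min(size_a, size_b)
    if smaller <= 0:
        raise ValueError("genome sizes must be positive")
    if not 0 <= shared <= smaller:
        raise ValueError("shared count must lie between 0 and the smaller genome size")
    similarity = shared / smaller
    return 1.0 - similarity


def distance_matrix(sizes, shared_counts):
    """Build a symmetric distance matrix from pairwise shared-gene counts.

    sizes: dict genome name -> number of genes (hypothetical input format)
    shared_counts: dict frozenset({a, b}) -> number of genes shared by a and b
    Returns (sorted genome names, matrix as a list of lists with a zero diagonal).
    """
    names = sorted(sizes)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0.0] * len(names) for _ in names]
    for a, b in combinations(names, 2):
        dist = gene_content_distance(shared_counts[frozenset((a, b))], sizes[a], sizes[b])
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = dist
    return names, matrix


# Toy numbers for illustration only (not data from Snel et al. or this study).
example_sizes = {"G1": 500, "G2": 1000, "G3": 800}
example_shared = {
    frozenset(("G1", "G2")): 300,
    frozenset(("G1", "G3")): 250,
    frozenset(("G2", "G3")): 400,
}

if __name__ == "__main__":
    print(distance_matrix(example_sizes, example_shared))
```

Dividing by the smaller genome keeps the similarity in [0, 1] even when the two genomes differ greatly in size, so G1 and G2 above get distance 1 - 300/500 = 0.4 although G2 has twice as many genes.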
We argue that the common gene clusters predicted by our method can recover accurate phylogenetic relationships among organisms. Note that our method does not require pairwise or multiple sequence alignment. We collected the 13 genomes used in Snel et al. (1999): H. influenzae, M. genitalium, Synechocystis, M. jannaschii, E. coli, M. thermoautotrophicum, H. pylori, A. fulgidus, B. subtilis, B. burgdorferi, S. cerevisiae, A. aeolicus, and P. horikoshii (Figure 1). The evolutionary distance between two genomes is defined exactly as in Snel et al. (1999). The only difference between our approach and theirs is how common genes between genomes are counted: in our approach, the common genes are those that fall in the predicted gene clusters.
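Because only the counting step changes, the distance formula can be applied unchanged once common genes are read off the clusters. The Python sketch below shows one possible counting rule, not necessarily the exact rule used by our method: it assumes each predicted cluster is a set of (genome, gene) pairs and counts the genes of the smaller genome that share a cluster with at least one gene of the other genome. The cluster format, function names, and toy data are all hypothetical.

```python
def count_common_genes(clusters, genome_a, genome_b, sizes):
    """Count genes of the smaller genome that lie in a cluster also containing the other genome.

    clusters: iterable of sets of (genome, gene_id) pairs (hypothetical format)
    sizes: dict genome -> total number of genes
    """
    # Use the smaller genome as the reference so the count never exceeds the denominator.
    if sizes[genome_a] <= sizes[genome_b]:
        ref, other = genome_a, genome_b
    else:
        ref, other = genome_b, genome_a
    shared = set()
    for cluster in clusters:
        genomes_present = {genome for genome, _ in cluster}
        if ref in genomes_present and other in genomes_present:
            # Each reference-genome gene in a shared cluster is counted once.
            shared.update(gene for genome, gene in cluster if genome == ref)
    return len(shared)


def cluster_distance(clusters, genome_a, genome_b, sizes):
    """Distance 1 - (common genes / genes in the smaller genome), with cluster-based counts."""
    common = count_common_genes(clusters, genome_a, genome_b, sizes)
    return 1.0 - common / min(sizes[genome_a], sizes[genome_b])


# Hypothetical toy clusters over three genomes A, B, C.
example_clusters = [
    {("A", "a1"), ("B", "b1")},
    {("A", "a2"), ("A", "a3"), ("B", "b2"), ("C", "c1")},
    {("A", "a4"), ("C", "c2")},
]
example_sizes = {"A": 5, "B": 4, "C": 3}

if __name__ == "__main__":
    print(cluster_distance(example_clusters, "A", "B", example_sizes))
```

Counting genes of the smaller genome, rather than whole clusters, mirrors the denominator of the similarity; a different rule (for instance, counting clusters) would give different distances.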
Figure 1 compares three phylogenetic trees generated from 16S rRNA, common genes, and common gene clusters. Panels (a) and (b) are taken from Snel et al. (1999); panel (c) was constructed with the neighbor-joining method in the PHYLIP package and visualized with PhyloDRAW. Interestingly, the three trees agree except for the placement of Synechocystis and S. cerevisiae. The figure thus shows that predicted gene clusters can produce an accurate phylogenetic tree without any sequence alignment.
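For readers unfamiliar with neighbor joining, the following is a compact from-scratch Python sketch of the textbook algorithm. It is not PHYLIP's `neighbor` program, and its tie-breaking and branch-length formatting may differ from PHYLIP's output. It takes a symmetric distance matrix, such as one built from common gene clusters, and returns an unrooted tree in Newick format. The five-taxon matrix is a standard toy example, not data from this study.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining and return it as a Newick string.

    names: list of taxon labels; dist: symmetric n x n list of distances.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be n x n")

    labels = {i: names[i] for i in range(n)}  # node id -> Newick subtree
    dmat = {i: {j: float(dist[i][j]) for j in range(n)} for i in range(n)}
    active = list(range(n))
    next_id = n

    while len(active) > 3:
        m = len(active)
        # Row sums of distances over the currently active nodes.
        r = {i: sum(dmat[i][k] for k in active if k != i) for i in active}
        # Pick the pair minimizing Q(i, j) = (m - 2) d(i, j) - r_i - r_j.
        best = None
        for a in range(m):
            for b in range(a + 1, m):
                i, j = active[a], active[b]
                q = (m - 2) * dmat[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * dmat[i][j] + (r[i] - r[j]) / (2 * (m - 2))
        lj = dmat[i][j] - li
        u = next_id
        next_id += 1
        labels[u] = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        # Distances from u to every other remaining node.
        dmat[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                dmat[u][k] = dmat[k][u] = 0.5 * (dmat[i][k] + dmat[j][k] - dmat[i][j])
        active = [k for k in active if k not in (i, j)] + [u]

    # Join the last three nodes at a single unrooted center.
    i, j, k = active
    li = 0.5 * (dmat[i][j] + dmat[i][k] - dmat[j][k])
    lj = 0.5 * (dmat[i][j] + dmat[j][k] - dmat[i][k])
    lk = 0.5 * (dmat[i][k] + dmat[j][k] - dmat[i][j])
    return f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f},{labels[k]}:{lk:.4f});"


# Standard five-taxon toy matrix (illustration only).
example_names = ["a", "b", "c", "d", "e"]
example_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    # prints (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
    print(neighbor_joining(example_names, example_dist))
```

Neighbor joining does not assume a molecular clock, which suits gene-content distances whose rate of change may vary across lineages; the resulting tree is unrooted, so any root must be chosen separately (for example, with an outgroup).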