Codon-Based Phylogeny Using Whole Genome Data
Abstract
A prerequisite of comparative genomics is an established set of evolutionary relationships: phylogenetic trees and ortholog groups. Coding sequences are well preserved in evolution and easy to identify, making them a highly suitable subset of genome data for these tasks. We developed a tool fast enough to analyze thousands of ortholog sets across many genomes. It uses the full information present in codons, not just the amino acid sequence, and does so without the simplification of only selecting first, second and third coding positions. A guiding goal was a method that is resistant to composition evolution, which severely affects maximum likelihood methods; no efficient calculations for time-asymmetric evolution have been developed for those methods. Our solution was to extend the concept of LogDet distance metrics to comparisons involving multiple codons. Because codon evolution includes negative selection to preserve amino acids, we expected groups of substitutions evolving at very different rates. The data analysis confirmed this: we identified an approximately 45-fold difference between substitutions involving tryptophan and the fastest-evolving silent transitions. Because of this large difference, the combination of codons that provides the most accurate information about the evolutionary distance between a pair of genomes depends on that distance. For short distances, abundant synonymous substitutions are the most informative; for longer distances, such substitutions saturate and conserved amino acid types provide more information.
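To make the LogDet idea concrete, the following is a minimal sketch of the standard LogDet distance applied to a chosen set of codon states. It is not the authors' tool or their multi-codon extension, which the abstract does not specify. The function name `logdet_distance`, the `states` parameter, and the toy sequences are hypothetical, and the sketch assumes two in-frame, gap-free aligned coding sequences.

```python
import numpy as np

def logdet_distance(seq_a, seq_b, states):
    """Standard LogDet distance over a user-chosen list of codon states.

    Assumption: seq_a and seq_b are aligned, in-frame coding sequences.
    Codon pairs where either codon is outside `states` are skipped.
    """
    index = {codon: i for i, codon in enumerate(states)}
    r = len(states)
    counts = np.zeros((r, r))

    # Count how often codon i in seq_a is aligned with codon j in seq_b.
    for pos in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        a, b = seq_a[pos:pos + 3], seq_b[pos:pos + 3]
        if a in index and b in index:
            counts[index[a], index[b]] += 1

    total = counts.sum()
    if total == 0:
        raise ValueError("no codon pairs from the chosen states were found")

    joint = counts / total          # joint frequency matrix F
    freq_a = joint.sum(axis=1)      # codon frequencies in seq_a
    freq_b = joint.sum(axis=0)      # codon frequencies in seq_b

    sign, log_det = np.linalg.slogdet(joint)
    if sign <= 0 or np.any(freq_a == 0) or np.any(freq_b == 0):
        raise ValueError("LogDet distance is undefined for this data")

    # d = -(1/r) * [ln det F - 0.5 * (sum ln freq_a + sum ln freq_b)]
    correction = 0.5 * (np.log(freq_a).sum() + np.log(freq_b).sum())
    return -(log_det - correction) / r

# Toy usage with two codon states; real data would use many more codons.
toy_states = ["AAA", "CCC"]
toy_a = "AAA" * 10 + "CCC" * 10
toy_b = "AAA" * 9 + "CCC" + "CCC" * 9 + "AAA"
toy_distance = logdet_distance(toy_a, toy_b, toy_states)
```

The marginal-frequency correction makes the distance zero for identical sequences, and because the distance depends only on the determinant of the joint frequency matrix, it does not assume that the two sequences share the same composition. Restricting `states` to different codon subsets is one simple way to explore which substitutions are informative at a given divergence.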
We ran a number of tests with mRNA sets for various eukaryotes. For a group of 21 butterflies, our distance metric is almost perfectly additive, with a relative rms error of only 0.5%. This means that even very short, deep branches can be resolved. This whole-genome method of evolutionary reconstruction reaches the limit where well-resolved but short branches correspond to the evolution of all genes taken together. Such branches may not correspond to the evolution of particular phenotypic characters that are encoded by a small group of genes, which may have been horizontally transferred and positively selected. In butterflies, the visible and mating-related phenotypes are likely to be in this group. This generates controversies whose resolution requires an updated understanding of genome evolution in species where horizontal transfer is possible even if the genomes differ at as many as 10% of matching positions.
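One way to quantify how additive a distance matrix is can be sketched as follows. The abstract does not define its relative rms error, so the definition used here (rms residual between observed pairwise distances and the path lengths of a fitted tree, divided by the rms observed distance) is an assumption, as are the function name `relative_rms_error` and the toy matrices.

```python
import numpy as np

def relative_rms_error(observed, fitted):
    """Relative rms deviation between two symmetric distance matrices.

    Assumption: `observed` holds measured pairwise distances and `fitted`
    holds path lengths between the same taxa on a tree fitted to them.
    Only the upper triangle (each pair once) is used.
    """
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    if observed.shape != fitted.shape:
        raise ValueError("matrices must have the same shape")

    upper = np.triu_indices_from(observed, k=1)
    residual = observed[upper] - fitted[upper]
    return np.sqrt(np.sum(residual ** 2) / np.sum(observed[upper] ** 2))

# Toy usage: three taxa, with fitted distances uniformly 1% too long.
toy_observed = np.array([[0.0, 1.0, 2.0],
                         [1.0, 0.0, 3.0],
                         [2.0, 3.0, 0.0]])
toy_fitted = toy_observed * 1.01
toy_error = relative_rms_error(toy_observed, toy_fitted)
```

A perfectly additive matrix can be reproduced exactly by some tree, giving an error of zero; small values indicate that the tree's branch lengths explain nearly all of the pairwise distances.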
Authors
- Raquel Bromberg (University of Texas Southwestern)
- Qian Cong (University of Texas Southwestern)
- Nick Grishin (University of Texas Southwestern)
- Dominika Borek (University of Texas Southwestern)
- Zbyszek Otwinowski (University of Texas Southwestern)
Topic Area
Comparative genomics, re-sequencing, SNPs, structural variation
Session
PS-2 » Poster Session B (20:00 - Tuesday, 16th May, Mezzanine & New Mexico Room)