The Shortest Distance Between Two Sequences?
Abstract
Distance metrics (similarity functions) are used to classify sequences and to build phylogenetic trees. More than 10 different distance metrics have been used to compare sequences, including various k-mer distances, Earth Mover's Distance on de Bruijn graphs, and variants of edit distance. These metrics vary in their mathematical and biological relevance as well as in their computational complexity. They also vary in how well they handle the “-” character, which indicates a gap of indeterminate length. Changing the metric can dramatically affect results, altering both operational taxonomic unit (OTU) identification and community studies. Misclassification in biomedical cases could lead to using the wrong mitigation strategy. In community analysis, it might lead to incorrect conclusions about the community makeup. We use several metric functions to compare sequences within the RDP database and examine how they affect classification within the database as well as how an unknown sequence would be identified. Understanding the strengths and weaknesses of the different metrics can assist researchers in choosing the best one for their investigations.
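To make the point about metric choice concrete, the following is a minimal sketch, not the authors' implementation. It assumes two simple metrics: a Manhattan distance between k-mer count vectors (one of many possible k-mer distances) and unit-cost Levenshtein edit distance. The function names, the toy sequences, the choice of k = 3, and the convention of skipping any k-mer that contains "-" are all illustrative assumptions and are not taken from the abstract or from the RDP database.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq, skipping any k-mer that contains a gap.

    Skipping gapped k-mers is an assumed convention, one of several possible.
    """
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "-" not in seq[i:i + k]
    )


def kmer_distance(a, b, k=3):
    """Manhattan distance between the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(abs(ca[m] - cb[m]) for m in set(ca) | set(cb))


def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical toy sequences: one is the query shifted by a base,
# the other differs from the query by a single substitution.
query = "ACGTACGT"
candidates = {"shifted": "CGTACGTA", "substituted": "ACGTTCGT"}

for name, seq in candidates.items():
    print(name, kmer_distance(query, seq), edit_distance(query, seq))
```

For these toy sequences the two metrics disagree about which candidate is nearest to the query: the 3-mer distance is 2 for the shifted sequence and 6 for the substituted one, while the edit distance is 2 and 1 respectively. A nearest-neighbor classifier built on one metric would therefore pick a different match than one built on the other.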
Authors
- Kenneth Ingham (Kenneth Ingham Consulting, LLC)
- Ara Winter (University of New Mexico)
Topic Areas
Comparative genomics, re-sequencing, SNPs, structural variation; Analysis for metagenomics, antimicrobial resistance, and forensics
Session
OS-5 » Metagenomics, Informatics, Assembly & Analysis (14:00 - Wednesday, 17th May, La Fonda Ballroom)