Implementation of Kmer Based Bioinformatics Infrastructure to Capture Microbial Genus and Species Level Detection and Characterization Biomarkers from Metagenomic Datasets
Abstract
The current multi-pathogen detection method used by the CDC to investigate Unexplained Respiratory Disease Outbreaks (URDO) is based on the basic level of detection (presence vs absence) using a multipathogen array of real-time PCR assays. This approach is being transformed to take advantage of the next generation sequencing Illumina MiSeq platform to capture the genetic diversity that allows detection and characterization of known, and potentially novel or rare, pathogens. Data analysis continues to be the bottleneck for “-omics” related studies. The majority of existing metagenomics analysis tools only report the relative abundance of a detected organism and most do not provide information that allows for the recovery of genetic variation from the sequenced reads.
We present a kmer based approach for recovery of genetic detection and characterization biomarkers from metagenomics and whole genome sequencing datasets. Using this tool with in-house mock samples and clinical specimens, we were able to detect and characterize organisms at the genus and species level. This is an improvement on our previously designed metagenomics analysis workflow. To date we have detected antibiotic resistance markers in Mycoplasma pneumoniae whole genome sequencing datasets, identified species and serogroup information for Legionella spp., and identified respiratory disease etiologic agents such as Chlamydia pneumoniae, Haemophilus influenzae, and Influenza A and B. The majority of kmer based approaches are limited by the quality of the reference database used in the comparison analysis. We have overcome this drawback by including genetic variation in our reference kmer database through incorporation of multiple genome sequences representing diversity within targeted regions. We also plan to add a machine-learning prediction component to determine genetic relatedness when the threshold for reference detection is not met. This information may guide laboratorians on which traditional assays to perform to verify prediction results when clinical specimens are scarce. Because of the increase in antimicrobial-resistant microbes, we plan to incorporate an antimicrobial resistance sequence panel to highlight the potential to rapidly determine antibiotic resistance characteristics.
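The idea of a reference kmer database built from multiple sequences per targeted region, with a detection threshold, can be illustrated with a minimal sketch. This is not the authors' tool: the function names, the toy sequences, the kmer length (5), and the threshold (0.8) are all assumptions chosen for illustration, and a real panel would use longer kmers and real genome sequences.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield each overlapping k-mer of seq, skipping any with non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def build_reference(panel, k):
    """Index k-mers from every variant sequence of every marker.

    panel maps a marker label to a list of variant sequences, so that
    known diversity within a targeted region is part of the reference.
    """
    index = defaultdict(set)
    for label, variants in panel.items():
        for variant in variants:
            for kmer in kmers(variant, k):
                index[kmer].add(label)
    return index


def score_read(read, index, k):
    """Return {label: fraction of the read's k-mers found for that label}."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in read_kmers:
        for label in index.get(kmer, ()):
            hits[label] += 1
    return {label: n / len(read_kmers) for label, n in hits.items()}


def detect(read, index, k, threshold):
    """Return the sorted labels whose score meets the detection threshold."""
    scores = score_read(read, index, k)
    return sorted(label for label, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    # Toy, made-up sequences: marker_A has two variants differing by one base.
    panel = {
        "marker_A": ["ACGTACGTGGCCTTAA", "ACGTACGTGGCTTTAA"],
        "marker_B": ["TTGACCATGCAAGTCC"],
    }
    index = build_reference(panel, k=5)
    read = "ACGTGGCTTTAA"  # drawn from the second marker_A variant
    print(score_read(read, index, k=5))
    print(detect(read, index, k=5, threshold=0.8))
```

In this toy case the read clears the threshold only because the second variant is in the reference; with the first variant alone, most of the read's kmers overlap the differing base and go unmatched, which is the kind of sub-threshold case the planned prediction component is meant to address.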
This approach will analyze next generation sequencing data from clinical specimens collected during URDO investigations and assist researchers in the development of more rapid and targeted assays to unveil phenotypic or genotypic characteristics for potentially novel and rare pathogens. In addition, its modular infrastructure allows for scalability through the inclusion of additional genomic panels such as the microbial virulome. Ultimately, this approach will improve URDO investigations by decreasing outbreak response time and informing evidence-based interventions.
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Authors
- Shatavia Morrison (Centers for Disease Control and Prevention)
- Devika Singh (Centers for Disease Control and Prevention)
- Maureen Diaz (Centers for Disease Control and Prevention)
- Alvaro Benitez (Centers for Disease Control and Prevention)
- Bernard Wolff (Centers for Disease Control and Prevention)
- Jonas Winchell (Centers for Disease Control and Prevention)
Topic Areas
De novo sequencing, re-sequencing, Human seq., RNA seq., metagenomics, etc.; Sequencing applications for metagenomics, transcriptomics, diagnostics, and biosurveillance; Analysis for metagenomics, antimicrobial resistance, and forensics
Session
PS-1 » Poster Session A (19:00 - Tuesday, 16th May, Mezzanine & New Mexico Room)