Application of MIDAS to diarrheal stool metagenomes for the subtyping of Salmonella enterica
Abstract
Introduction
Culture-independent diagnostic tests have emerged as a fast, low-cost option for point-of-care testing directly from a specimen, which means cultured isolates may no longer be available for public health surveillance. For foodborne bacterial disease surveillance, disease-state stools may soon be the only samples available. New high-resolution techniques for subtyping and characterizing foodborne pathogens directly from stool are needed to replace whole-genome sequencing of isolates. One approach is shotgun metagenomics of stool DNA. Multiple bioinformatics tools tout strain-level resolution from shotgun metagenomic data, either via read binning or by using a set of reference genomes. We tested the applicability of the Metagenomic Intra-Species Diversity Analysis System (MIDAS), a tool for strain-level metagenomic classification and single nucleotide variant (SNV) characterization, to foodborne bacterial disease surveillance. MIDAS was chosen for its SNV calling pipeline, its ability to use custom genome databases, and its short running time.
Methods
MIDAS was assessed using shotgun metagenomic data from disease-state stools and mock community data produced in silico. Stool samples from fifteen patients in two similar Salmonella enterica outbreaks in 2013 were analyzed using the MIDAS pipeline, and the results were compared with those obtained from matching isolate sequence data. In silico mock community data sets were produced by selecting representatives from a manually curated list of fifteen species, each described as being found in stool in at least one peer-reviewed publication, and were analyzed with the MIDAS pipeline. MIDAS was run both using the default MIDAS database (three S. enterica reference genomes) and after adding a broader set of 35 S. enterica reference genomes to that database. SNVs were detected using 5x, 10x, 15x, or 20x coverage cutoffs and were then used to generate trees.
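To make the effect of the coverage cutoffs concrete, the following minimal Python sketch filters candidate SNV sites at each of the four cutoffs. It is not MIDAS code: the function name, the sample names, and the depth values are hypothetical, and it assumes the cutoff is applied per site and per sample, which the abstract does not specify.

```python
from typing import Dict, List

# Coverage cutoffs named in the Methods (5x, 10x, 15x, 20x).
CUTOFFS = (5, 10, 15, 20)


def sites_passing(depths: Dict[str, List[int]], cutoff: int) -> List[int]:
    """Return indices of sites where every sample has depth >= cutoff.

    Assumption: the cutoff is applied per site and per sample. The abstract
    does not say how MIDAS applied it, so this is only one plausible reading.
    """
    samples = list(depths.values())
    if not samples:
        return []
    n_sites = len(samples[0])
    return [i for i in range(n_sites) if all(s[i] >= cutoff for s in samples)]


# Hypothetical per-site read depths for three samples at six sites.
example_depths = {
    "stool_A": [25, 12, 7, 30, 4, 18],
    "stool_B": [22, 16, 9, 21, 6, 11],
    "stool_C": [40, 10, 5, 20, 8, 15],
}

# Sites retained for tree building at each cutoff.
passing_by_cutoff = {c: sites_passing(example_depths, c) for c in CUTOFFS}

for cutoff, sites in passing_by_cutoff.items():
    print(f"{cutoff}x: {len(sites)} sites retained -> {sites}")
```

Under these made-up depths, stricter cutoffs retain fewer sites, which illustrates the trade-off between call confidence and the number of SNVs available for tree building.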
Results
Thirteen of fifteen disease-state stools had sufficient coverage for the MIDAS SNV calling pipeline (ranging from 6.6x to 250x) with one S. enterica reference using the default database. Each tree correctly separated samples from the two similar outbreaks with high support values (>0.99). When a broader set of 35 S. enterica reference genomes was added to the reference database, S. enterica read assignments were split such that the most abundant reference had 60% lower coverage than with the default database, preventing SNV calling in some samples. In a simple mock community of four bacterial species, MIDAS detected all species present, including S. enterica at a relative abundance as low as 0.1%. All false-positive species predictions were in the same genus as a species that was actually present in the mock community. All predicted proportions of species were within 30% of their actual proportion in the sample.
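The "within 30%" comparison can be sketched as a relative-error check in Python. This assumes the 30% figure is a relative difference (the abstract does not say whether it is relative or absolute), and the species labels and predicted values below are hypothetical placeholders, not data from the study.

```python
from typing import Dict


def relative_error(predicted: float, actual: float) -> float:
    """Absolute difference between predicted and actual, as a fraction of actual."""
    return abs(predicted - actual) / actual


def within_tolerance(
    predicted: Dict[str, float], actual: Dict[str, float], tol: float = 0.30
) -> Dict[str, bool]:
    """Flag, per species, whether the predicted proportion is within tol of actual.

    A species missing from the predictions is treated as predicted at 0.0.
    """
    return {
        species: relative_error(predicted.get(species, 0.0), true_value) <= tol
        for species, true_value in actual.items()
    }


# Hypothetical four-species mock community with S. enterica at 0.1%.
actual_proportions = {
    "Salmonella enterica": 0.001,
    "species_B": 0.500,
    "species_C": 0.300,
    "species_D": 0.199,
}

# Hypothetical predicted proportions, invented for illustration only.
predicted_proportions = {
    "Salmonella enterica": 0.0012,
    "species_B": 0.450,
    "species_C": 0.330,
    "species_D": 0.210,
}

result = within_tolerance(predicted_proportions, actual_proportions)
for species, ok in result.items():
    print(f"{species}: {'within' if ok else 'outside'} 30% of actual")
```

A relative measure is used here because an absolute 30% tolerance would be uninformative for a species present at 0.1% relative abundance.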
Conclusion
Published tools for strain characterization from metagenomics need to be specifically assessed for the unique circumstances of enteric pathogen surveillance. MIDAS was able to use metagenomic data to cluster outbreak samples in a way that was consistent with isolate and epidemiological data. MIDAS also correctly identified the components in a mock community. Early results using MIDAS are promising, but further testing is necessary, particularly with pathogens that are more closely related to commensal bacteria.
Authors
- Julie Shay (Association of Public Health Laboratories)
- Heather Carleton (Centers for Disease Control and Prevention, Enteric Diseases Laboratory Branch)
- Andrew Huang (Centers for Disease Control and Prevention, Enteric Diseases Laboratory Branch)
Topic Areas
Sequencing applications for metagenomics, transcriptomics, diagnostics, and biosurveillance; Analysis for metagenomics, antimicrobial resistance, and forensics; Human, non-human, and infectious disease applications
Session
PS-2 » Poster Session B (20:00 - Tuesday, 16th May, Mezzanine & New Mexico Room)