Analysis of Highly Multiplexed Amplicon Sequencing Data: a Proof of Concept Using Antimicrobial Resistance Targets in Salmonella and Human Stool
Abstract
Isolate whole genome sequencing (WGS) is a powerful tool for the surveillance of foodborne bacterial enteric disease and antimicrobial resistance (AMR). Clinical laboratories are adopting culture-independent diagnostic tests, so fewer pathogen isolates are available. This threatens the integrity of surveillance systems and makes the development of direct-from-specimen subtyping methods critically important. Highly multiplexed amplicon sequencing (HMAS) is a cost-effective and scalable method that may achieve a resolution similar to that of isolate WGS, but best practices for analyzing HMAS data are still being developed. We tested a pipeline built from existing software tools to detect target AMR genes in an HMAS data set generated from Salmonella strains and from a human stool sample spiked with Salmonella.
Amplicons of 180-240 bp were generated with 749 primer pairs targeting 111 AMR genes, using the Juno Targeted DNA Sequencing Library Preparation System (Fluidigm Inc.). The amplicons were sequenced on an Illumina MiSeq with 2x250 bp v2 chemistry. The following samples were each sequenced at four concentrations:

- nine Salmonella strains with known resistance phenotypes and genotypes;
- one pan-susceptible Salmonella strain;
- one human stool sample from a healthy donor;
- the same stool spiked with one resistant Salmonella strain.

A no-template water control was also sequenced. We used mothur v.1.39.2 for sample demultiplexing, read assembly, and quality filtering, and then ran BLAST searches against a custom AMR reference database. The BLAST results were filtered with in-house Python scripts.
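The step that turns BLAST output into per-target read counts can be sketched as follows. This is a minimal illustration and not the authors' in-house scripts. It assumes the following:

- BLAST+ was run with tabular output (`-outfmt 6`, default columns).
- Each query sequence is one assembled read pair. If the reads were dereplicated in mothur, the counts would need to be weighted using the mothur count table.
- `best_hits`, `count_targets`, and the `max_evalue` cutoff are hypothetical names and values.

```python
from collections import Counter

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines):
    """Keep one best hit per query read: lowest E-value, ties broken by higher bit score."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        rank = (hit["evalue"], -hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or rank < (current["evalue"], -current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


def count_targets(lines, max_evalue=1e-10):
    """Count reads whose best hit passes a (hypothetical) E-value cutoff, per reference target."""
    counts = Counter()
    for hit in best_hits(lines).values():
        if hit["evalue"] <= max_evalue:
            counts[hit["sseqid"]] += 1
    return counts


# Toy BLAST records, made up for illustration; read_1 hits two references.
toy = [
    "read_1\ttarget_001\t99.5\t210\t1\t0\t1\t210\t1\t210\t1e-100\t380",
    "read_1\ttarget_002\t97.0\t210\t6\t0\t1\t210\t1\t210\t1e-80\t340",
    "read_2\ttarget_003\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-90\t350",
    "read_3\ttarget_004\t85.0\t40\t6\t0\t1\t40\t1\t40\t0.5\t30",
]
counts = count_targets(toy)
print(dict(counts))  # {'target_001': 1, 'target_003': 1}
```

Each read is assigned to a single best-matching reference. This keeps reads that also hit a closely related variant from being counted twice.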
We detected 32 panel-targeted AMR genes present in the strain DNAs. The only panel-targeted gene that mothur-BLAST failed to detect was tetA, which was present in 4 of the 10 strains tested. We also detected an additional 12 AMR genes present in the strain DNAs that are closely related to variants covered by the panel. Many other panel targets not known to be present in the tested DNAs were found across all samples, including the water control. Their abundances, however, were generally two orders of magnitude (two logs) lower than those of true positive targets. Every target detected in the water control, and in the two strain DNAs lacking panel-targeted genes, had fewer than ten paired-end reads. The AMR genes detected in the spiked stool DNA were consistent with those detected separately in the unspiked stool and in the spiking strain DNA.
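The abstract reports two patterns. Spurious targets stayed below ten paired-end reads, and they were generally about two logs weaker than true positives. Both patterns suggest a simple abundance check, sketched below. The rule itself is hypothetical and is not the authors' filter. A target counts as "supported" only if it has at least `min_reads` reads and exceeds its count in the water control. The size of the separation is reported as a log10 ratio. The names `split_by_signal` and `log10_gap` and the default of 10 reads are illustrative assumptions.

```python
import math


def split_by_signal(sample_counts, control_counts, min_reads=10):
    """Hypothetical rule: supported = at least min_reads AND above the water-control count."""
    supported, low_signal = {}, {}
    for target, n in sample_counts.items():
        if n >= min_reads and n > control_counts.get(target, 0):
            supported[target] = n
        else:
            low_signal[target] = n
    return supported, low_signal


def log10_gap(supported, low_signal):
    """Orders of magnitude between the weakest supported and strongest low-signal target."""
    if not supported or not low_signal:
        return None
    strongest_low = max(low_signal.values())
    if strongest_low <= 0:
        return None  # avoid dividing by zero
    return math.log10(min(supported.values()) / strongest_low)


# Made-up read counts for one sample and for the no-template water control.
sample = {"target_001": 1500, "target_002": 900, "target_003": 8, "target_004": 3}
water = {"target_003": 2, "target_004": 5}
supported, low_signal = split_by_signal(sample, water)
gap = log10_gap(supported, low_signal)
print(sorted(supported), round(gap, 2))  # ['target_001', 'target_002'] 2.05
```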
Our mothur-BLAST approach identified signal from the targets present in the HMAS data for all samples tested. It also showed a clear difference in signal strength between true positive and false positive targets. The method could be made faster by optimizing or replacing the BLAST step. Our Python scripts currently discard an amplicon whenever its length deviates from the length predicted by the reference sequence, even if its BLAST E-value is low (i.e., the match is significant). This strict filter is likely discarding true positive amplicons, and further work is needed to find a more suitable length tolerance. Once refined, this pipeline will provide a simple, effective, open-source method for analyzing HMAS data sets. It can be adapted to any primer panel by changing the reference database.
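The length filter described above can be pictured as a tolerance check on each amplicon. In this minimal sketch, `tolerance=0` stands in for the current strict behavior, where any deviation from the reference-predicted length is rejected. A nonzero tolerance shows the kind of relaxation the authors propose to explore. The function names, the tuple layout of `hits`, and the tolerance of 3 bp are hypothetical. They are not taken from the authors' scripts, and 3 bp is not a recommended threshold.

```python
def length_ok(observed_len, expected_len, tolerance=0):
    """True if the amplicon is within `tolerance` bp of the reference-predicted length."""
    return abs(observed_len - expected_len) <= tolerance


def filter_by_length(hits, expected_lengths, tolerance=0):
    """Return read IDs whose assembled amplicon length fits the target's expected length.

    hits: iterable of (read_id, target, observed_length) tuples (hypothetical layout).
    expected_lengths: {target: amplicon length predicted from the reference sequence}.
    """
    kept = []
    for read_id, target, observed_len in hits:
        expected = expected_lengths.get(target)
        if expected is not None and length_ok(observed_len, expected, tolerance):
            kept.append(read_id)
    return kept


# Made-up example: one target whose reference predicts a 212 bp amplicon.
expected_lengths = {"target_001": 212}
hits = [
    ("r1", "target_001", 212),
    ("r2", "target_001", 211),  # 1 bp shorter, e.g. a small indel
    ("r3", "target_001", 215),
    ("r4", "target_009", 200),  # target absent from the reference table
]
strict = filter_by_length(hits, expected_lengths)                # ['r1']
relaxed = filter_by_length(hits, expected_lengths, tolerance=3)  # ['r1', 'r2', 'r3']
print(strict, relaxed)
```

Picking the tolerance trades sensitivity against specificity. Read `r2` would be lost under the strict rule even if its BLAST match were highly significant.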
Authors
- Jo Williams-Newkirk (IHRC, Inc.)
- Jasmine Hensley (Oak Ridge Institute for Science Education)
- Jessica Chen (IHRC, Inc.)
- Katie Dillon (Oak Ridge Institute for Science Education)
- Milan Patel (Oak Ridge Institute for Science Education)
- Andrew Huang (Centers for Disease Control and Prevention, Enteric Diseases Laboratory Branch)
- John Besser (Centers for Disease Control and Prevention, Enteric Diseases Laboratory Branch)
- Eija Trees (Centers for Disease Control and Prevention)
- Heather Carleton (Centers for Disease Control and Prevention)
Topic Areas
Sequencing strategies and technology advancements using the various NGS platforms; Comparative genomics, re-sequencing, SNPs, structural variation; Analysis for metagenomics, antimicrobial resistance, and forensics
Session
PS-2 » Poster Session B (20:00 - Tuesday, 16th May, Mezzanine & New Mexico Room)