Since 2013, PulseNet, the national surveillance network for foodborne illness, has been transitioning from pulsed-field gel electrophoresis to whole genome sequencing (WGS) as the new method for foodborne surveillance. It is expected that the implementation of this method will lead to earlier detection and resolution of outbreaks. This comparative study evaluated the data generated by two desktop sequencing platforms that could be used by PulseNet public health laboratories: the Ion Torrent Personal Genome Machine (PGM) and the Illumina MiSeq.
A total of 100 isolates of Salmonella enterica, Escherichia coli, Listeria monocytogenes, and Campylobacter spp., covering 11 Campylobacter species and 3-9 serotypes/serogroups of each of the remaining three species, were sequenced on both platforms. Libraries for the PGM were prepared with the KAPA fragmentation and Library Preparation Kit for 400 base pair (bp) reads and sequenced using the Hi-Q View kit and Ion 316 chips (single-end). Nextera XT libraries were sequenced on the MiSeq instrument using 2x250 bp chemistry (paired-end).
Ion Torrent PGM and Illumina MiSeq raw reads were checked for quality and cleaned using CG-Pipeline (github.com/lskatz/CG-Pipeline). The parameters were set with either more stringent quality trimming for raw data that had min average quality scores =Q30. Sequence identity was verified with Kraken (github.com/DerrickWood/kraken), a taxonomic sequence classifier, and SeqSero (github.com/denglab/SeqSero), a Salmonella serotype prediction tool. High-quality single nucleotide polymorphism (hqSNP) analyses were run with Lyve-SET version 1.1.4f (github.com/lskatz/lyve-SET) using phage masking, VarScan, and an appropriate reference genome, either external (PacBio) or internal (MiSeq). Two sets of filtering parameters were used: (--min_coverage 20; --min_alt_frac 0.95; --allowedFlanking 5) for stricter filtering and (--min_coverage 10; --min_alt_frac 0.75; --allowedFlanking 5) for less strict filtering.
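To illustrate how the two filtering parameter sets differ in effect, the following is a minimal Python sketch of threshold-based SNP site filtering. It is not Lyve-SET's implementation. The `Site` and `Thresholds` types and the `filter_sites` function are hypothetical names introduced here. The reading of `--allowedFlanking` as "discard sites that lie closer than this many bp to another passing site" is an assumption, as is the choice to apply that check only after the coverage and allele-fraction filters.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one candidate SNP site (not a Lyve-SET type)."""
    pos: int         # 1-based position on the reference genome
    depth: int       # number of reads covering the position
    alt_frac: float  # fraction of reads supporting the alternate base


@dataclass(frozen=True)
class Thresholds:
    min_coverage: int
    min_alt_frac: float
    allowed_flanking: int


# Values taken from the two parameter sets quoted in the abstract.
STRICT = Thresholds(min_coverage=20, min_alt_frac=0.95, allowed_flanking=5)
RELAXED = Thresholds(min_coverage=10, min_alt_frac=0.75, allowed_flanking=5)


def filter_sites(sites: Iterable[Site], t: Thresholds) -> List[Site]:
    """Keep sites that meet the depth and allele-fraction thresholds and
    are not closer than allowed_flanking bp to another passing site."""
    # Step 1: coverage and allele-fraction thresholds.
    passing = sorted(
        (s for s in sites
         if s.depth >= t.min_coverage and s.alt_frac >= t.min_alt_frac),
        key=lambda s: s.pos,
    )
    # Step 2 (assumed semantics): drop clustered sites.
    kept = []
    for i, s in enumerate(passing):
        near_prev = i > 0 and s.pos - passing[i - 1].pos < t.allowed_flanking
        near_next = (i + 1 < len(passing)
                     and passing[i + 1].pos - s.pos < t.allowed_flanking)
        if not (near_prev or near_next):
            kept.append(s)
    return kept


# Made-up example sites, for illustration only.
sites = [
    Site(pos=100, depth=25, alt_frac=0.97),
    Site(pos=103, depth=30, alt_frac=0.99),
    Site(pos=200, depth=12, alt_frac=0.80),
    Site(pos=300, depth=40, alt_frac=1.00),
]
print([s.pos for s in filter_sites(sites, STRICT)])   # [300]
print([s.pos for s in filter_sites(sites, RELAXED)])  # [200, 300]
```

In this toy input, the site at position 200 passes only the relaxed thresholds, and the sites at positions 100 and 103 are removed under both settings because they fall within 5 bp of each other.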
Standard hqSNP methodology showed 0 hqSNP differences between identical samples sequenced on the MiSeq and the PGM for organisms analyzed with the stricter filtering criteria, and 0-5 hqSNP differences for organisms analyzed with less strict filtering. When strict filtering was then applied to the organism that had been analyzed with less strict filtering, 0 hqSNP differences were observed between the identical MiSeq and PGM samples. Kraken and SeqSero correctly predicted all species/serotypes/serogroups.
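The hqSNP difference between two datasets of the same isolate can be pictured as a pairwise count over aligned variant positions. The sketch below is a simplified illustration, not the Lyve-SET calculation. The function name `snp_distance` is hypothetical, and it assumes that masked or low-confidence positions are encoded as non-ACGT characters (for example `N`) and are skipped.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different
    unambiguous bases. Positions with a non-ACGT character in either
    sequence are ignored (assumed to be masked or low confidence)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


# Made-up aligned SNP columns for the same isolate on two platforms.
miseq = "ACGTACGT"
pgm = "ACGTNCGT"
print(snp_distance(miseq, pgm))  # 0
```

Under these assumptions, a distance of 0 means that every position called confidently in both datasets agrees.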
The PGM and MiSeq systems produced comparable results in downstream analyses of the same sample set, so their data could be used interchangeably in a network of laboratories, though further validation with additional analysis methods is needed. Future goals include expanding the validation to other organisms tracked by PulseNet, additional data analysis using whole genome multi-locus sequence typing (wgMLST), and identification using Average Nucleotide Identity (ANI).
Sequencing strategies and technology advancements using the various NGS platforms, Comparative genomics, re-sequencing, SNPs, structural variation