Introduction: Advances in genomic technologies have improved the speed and precision of foodborne disease outbreak detection and response. For the past two decades, pulsed-field gel electrophoresis (PFGE) has been the method of choice for surveillance and outbreak investigation of foodborne pathogens. Illumina whole genome sequencing (WGS) is now rapidly supplanting PFGE for these investigations. Compared with PFGE, WGS offers both higher genomic resolution and broader scope. NCBI curates the Pathogen Detection system, which holds genomic sequences and metadata from pathogens in a searchable format for identifying strains that are closely related to one another. Strains are grouped into clusters based on core genome SNP differences; however, it remains unclear how well core SNP differences reflect strain relatedness when the non-core regions of the genome are excluded. To address this, we sequenced Shiga toxin-producing Escherichia coli O157:H7 (STEC O157) strains with both Illumina (draft genomes) and PacBio RS II (complete closed genomes) and compared their relatedness in the NCBI Pathogen Detection system.
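The idea of grouping strains by core genome SNP relatedness can be illustrated with a minimal sketch. It assumes the core SNP alignment is already available as equal-length strings, skips ambiguous ('N') and gap ('-') positions, and links isolates by single linkage under a hypothetical SNP threshold. The isolate names, sequences and threshold are illustrative only; this is not NCBI's actual clustering pipeline.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two equal-length core SNP alignments differ.
    Columns with an ambiguous base ('N') or a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x not in "N-" and y not in "N-")


def single_linkage_clusters(seqs: dict[str, str], threshold: int) -> list[set[str]]:
    """Single-linkage grouping: any two isolates within `threshold` SNPs share a cluster."""
    parent = {name: name for name in seqs}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical example data: four isolates over ten core SNP positions.
core = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",  # 1 SNP from A
    "isolate_C": "ACGAACGTAT",  # 1 SNP from B, 2 from A
    "isolate_D": "TTTTACGTAC",  # 3 SNPs from A
}
clusters = single_linkage_clusters(core, threshold=1)  # hypothetical threshold
```

Single linkage lets clusters chain: isolate_A and isolate_C end up in the same cluster at a threshold of 1 even though they differ by 2 SNPs, because each is within 1 SNP of isolate_B.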
Methods: Two groups of STEC O157 strains, restriction digest pattern 1 (RDP1; n=3) and RDP2 (n=2), whose members were indistinguishable by PFGE with three restriction enzymes, were sequenced with both Illumina and PacBio RS II. De novo genomes were assembled from the PacBio reads with Celera Assembler 8.3 and with HGAP3 in SMRT Portal. The chromosome and plasmid(s) were trimmed to remove overlapping ends and reoriented to the ori region using Geneious and OriFinder. A final polishing step with Pilon produced the complete finished genome, and Mauve was used to compare the de novo assemblies. The complete genomes and the Illumina sequences were deposited in NCBI. The Pathogen Detection website (https://www.ncbi.nlm.nih.gov/pathogens/) was queried to identify the most closely related strains, as measured by the number of core genome SNP differences.
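The trimming and reorientation steps were carried out in Geneious and OriFinder; the sketch below only illustrates the underlying idea for a circular contig. It assumes the duplicated ends match exactly, that the ori coordinate is supplied by hand, and it does not reverse-complement the sequence when the origin lies on the opposite strand. The function names, parameters and example sequence are hypothetical.

```python
def trim_circular_overlap(contig: str, min_overlap: int = 1000,
                          max_overlap: int = 50_000) -> str:
    """Remove the duplicated end of a circular contig (hypothetical helper).
    Looks for the longest suffix that exactly equals a prefix, with a length
    between min_overlap and max_overlap, and drops that suffix."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    upper = min(max_overlap, len(contig) // 2)
    for k in range(upper, min_overlap - 1, -1):
        if contig[-k:] == contig[:k]:
            return contig[:-k]
    return contig  # no exact end overlap found; leave the contig unchanged


def rotate_to_origin(seq: str, origin: int) -> str:
    """Rotate a circular sequence so that 0-based position `origin` becomes position 0."""
    origin %= len(seq)
    return seq[origin:] + seq[:origin]


# Hypothetical toy contig whose last 5 bases repeat its first 5 bases.
raw = "GATTACA" + "CCGGTT" + "GATTA"
trimmed = trim_circular_overlap(raw, min_overlap=3)
rotated = rotate_to_origin(trimmed, 7)  # assume ori starts at 0-based position 7
```

Exact matching keeps the sketch short; assemblies with errors at the junction would need an alignment-based overlap search instead.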
Results: Using the Illumina sequences, all RDP1 strains and one RDP2 strain did not cluster with any strains in the Pathogen Detection database, whereas the second RDP2 strain was grouped in a cluster with other strains. After the complete closed genomes were generated, all RDP1 and RDP2 strains were grouped in clusters with other strains. Two RDP1 strains differed by four core genome SNPs. Comparison of their complete closed genomes revealed that, in addition to these SNPs, the strains differed by one non-core SNP, indels, and transposon integrations.
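The gap between a core-only SNP count and the full set of differences can be sketched as a tally over a pairwise alignment of two closed genomes. The alignment strings, the per-column core/non-core mask and the function name are hypothetical inputs; in practice the alignment would come from a whole-genome aligner such as Mauve. In this sketch a transposon integration would appear as a long run of gap columns and be counted as a single indel event.

```python
from collections import Counter


def classify_differences(aln_a: str, aln_b: str, core_mask: list[bool]) -> Counter:
    """Tally SNPs and indel events between two pairwise-aligned genomes.
    Each column is labelled core or non-core by `core_mask` (a hypothetical input).
    A contiguous run of gap columns counts as one indel, attributed to the
    region of its first column. Keys look like ('core', 'snp') or ('non-core', 'indel')."""
    if not (len(aln_a) == len(aln_b) == len(core_mask)):
        raise ValueError("alignment rows and mask must have equal length")
    tally: Counter = Counter()
    in_gap = False
    for a, b, is_core in zip(aln_a, aln_b, core_mask):
        region = "core" if is_core else "non-core"
        if a == "-" or b == "-":
            if not in_gap:  # first column of a new gap run
                tally[(region, "indel")] += 1
            in_gap = True
            continue
        in_gap = False
        if a != b:
            tally[(region, "snp")] += 1
    return tally


# Hypothetical example: 10 core columns followed by 12 non-core columns.
aln_a = "ACGTACGTAC" + "GGCC" + "TTAA--GG"
aln_b = "ACGAACGTAC" + "GGCA" + "TTAACCGG"
core_mask = [True] * 10 + [False] * 12
tally = classify_differences(aln_a, aln_b, core_mask)
core_only = sum(n for (region, _), n in tally.items() if region == "core")
```

Here a core-only count reports a single difference, while the full tally also reports one non-core SNP and one indel.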
Conclusion: In STEC O157 genome comparisons, closely related strains with few core genome SNP differences can still differ by mobile elements, indels, and non-core SNPs that core genome analysis does not detect. Consequently, core genome SNP analysis may not always be adequate for determining genome relatedness in surveillance and outbreak investigations of foodborne pathogens.