Introduction: The MinION Mk1B Nanopore Sequencer (MinION) from Oxford Nanopore Technologies (ONT) was assessed for use in human identification (HI) by short tandem repeat (STR) profiling, single nucleotide polymorphism (SNP) profiling by whole genome sequencing (WGS), and mitochondrial DNA (mtDNA) haplotyping.
Methods: Human Genomic DNA Standard Reference Materials 2391C (SRM) A, B, and C (NIST) were amplified using the Promega PowerSeq 24 kit, and the amplicons were sequenced using the Ligation Sequencing Kit 2D (R9, ONT). STR profiles were analyzed by alignment to truth data using the Burrows-Wheeler Aligner (BWA) and inspected in the Integrative Genomics Viewer (IGV). WGS was conducted using NS12911 and NA12878 as templates. Each was sequenced using the Ligation Sequencing Kit 2D (R9.4, ONT) and the Rapid Sequencing Kit (R9.4, ONT) with SpotON flow cells. Sequencing metrics were obtained with poretools. WGS SNP profiles were analyzed with the Personal Identity Pipeline (PIP). Coverage of mtDNA in the WGS data was assessed by aligning reads to the revised Cambridge Reference Sequence (rCRS) using BWA. MtDNA amplicon sequencing was also conducted by amplifying the full mtDNA genome of NIST mtDNA standard kit 2392 Components A and B in two overlapping amplicons and sequencing with the Ligation Sequencing Kit 1D (R9.4, ONT). All samples were sequenced in triplicate. MtDNA data were aligned to the rCRS, and SNPs were identified using variant calling software. Haplotypes were derived from the resulting mtDNA SNP profiles.
Results: STR amplicons were on average 342 bp, and 94.7% of reads aligned to truth data. The average insertion/deletion (indel) rate was 4.65%, which precluded the use of exact matching to identify STRs. Therefore, BWA was used to align reads to known truth sequence data for each sample. The amplicon pool was highly imbalanced, with coverage ranging from 16x (Amelogenin) to 241x (D7S820). IGV showed that some amplicons, e.g. D5S818, dropped out over the STR region, while others, e.g. TH01, had even coverage throughout. Other errors observed included indels.
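The effect of indels on exact matching can be illustrated with a minimal Python sketch. This is not the BWA-based workflow used in the study: the flanking sequences, the AATG motif, the candidate allele set, and the `edit_distance` helper are all hypothetical, chosen only to show that a single deleted base defeats string equality while an error-tolerant comparison against known candidate sequences still recovers the allele.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a base from a
                curr[j - 1] + 1,           # insert a base into a
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical flanks and repeat motif; not taken from the study.
LEFT, RIGHT, MOTIF = "GGTCACAG", "CTGAGTCC", "AATG"

# Candidate "truth" alleles with 6, 7, and 8 repeats.
alleles = {n: LEFT + MOTIF * n + RIGHT for n in (6, 7, 8)}

# Simulate a read of the 7-repeat allele with one base deleted in the repeat.
truth = alleles[7]
read = truth[:14] + truth[15:]

# Exact matching finds no allele, because the read differs by one base.
exact_hits = [n for n, seq in alleles.items() if seq == read]

# Error-tolerant matching picks the candidate with the smallest distance.
best = min(alleles, key=lambda n: edit_distance(read, alleles[n]))

print("exact hits:", exact_hits)
print("closest allele (repeat count):", best)
```

Like alignment to truth data, this approach depends on the candidate sequences being known in advance, so it does not by itself profile an unknown sample.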
WGS coverage ranged from 5% to 35% of the human genome per run, with an average of 413,482 2D reads and 87,731 1D reads aligning. The average read length was 1,237 bp for 2D runs and 2,357 bp for 1D runs; however, the error rate was higher for 1D (4.8%) than for 2D (3.1%). SNP profile matching was performed using the PIP, and results were compared between 1D and 2D runs. Average coverage of the whole mtDNA sequence ranged from 15x to 158x in WGS data; when the mtDNA was enriched by PCR, coverage increased to 5013-7116x.
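As a rough plausibility check on the WGS figures above, the average read counts and read lengths can be converted into an expected fraction of the genome covered. The sketch below is illustrative only and is not part of the study's analysis: it assumes a human genome size of about 3.1 Gb, assumes reads land uniformly at random (the Lander-Waterman Poisson model), and uses cross-run averages rather than per-run values. The function names are hypothetical.

```python
import math

# Assumed approximate human genome size in bp; not stated in the abstract.
GENOME_SIZE_BP = 3.1e9


def mean_depth(n_reads: int, mean_read_len_bp: float,
               genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Average depth: total aligned bases divided by genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def expected_breadth(depth: float) -> float:
    """Expected fraction of the genome covered at least once,
    assuming uniformly random read placement (Poisson model)."""
    return 1.0 - math.exp(-depth)


# Average aligned read count and average read length reported above.
runs = {"2D": (413_482, 1_237), "1D": (87_731, 2_357)}

estimates = {}
for label, (n_reads, read_len) in runs.items():
    depth = mean_depth(n_reads, read_len)
    estimates[label] = (depth, expected_breadth(depth))
    print(f"{label}: depth {depth:.3f}x, expected breadth "
          f"{estimates[label][1]:.1%}")
```

Under these assumptions the estimates come out at roughly 15% of the genome for an average 2D run and roughly 6% for an average 1D run, both within the reported 5% to 35% range.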
Conclusions: MinION-derived data demonstrated that STR analysis is confounded by the indel rate, low coverage of targets, and loss of some STRs during sequencing. Further research is needed to determine the cause of dropout during sequencing and to develop bioinformatic tools for profiling unknowns from MinION data. SNPs identified by WGS can be used for HI when compared to a database; however, a SNP profile database will take time to build. The long-read capability of the MinION enables better resolution of SNP linkages, haplotype determination, and mixture deconvolution of mtDNA.
Analysis for metagenomics, antimicrobial resistance, and forensics; Human, non-human, and infectious disease applications; Gene editing, synthetic genomics, forensics, and biosurveillance