Building High-quality, De Novo Genome Assemblies by Scaffolding Next-Generation Sequencing Assemblies with Bionano's Next-Generation Mapping
Abstract
Combining next-generation sequencing (NGS) data and next-generation mapping (NGM) data from Bionano Genomics provides a solution that is being adopted to produce affordable, high-quality and chromosome-scale de novo genome assemblies.
We describe a novel workflow that uses two nicking endonucleases to increase the specificity of the map information and to improve contiguity by tiling sequence contigs and genome maps. We generated two sets of genome maps, each with a different nicking endonuclease, and developed novel algorithms that use the NGS sequences as a bridge to merge the single-enzyme genome maps into combined maps carrying the sequence-motif patterns of both nicking enzymes. Because the two sets of genome maps were generated independently, they served as orthogonal sources of evidence for detecting and correcting assembly errors. The complementarity of the input datasets also greatly improved the contiguity of the hybrid scaffold while doubling the information density, which substantially improved our ability to anchor short NGS sequences in the final scaffold.
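The two-enzyme merge rests on each nicking enzyme labelling a different sequence motif, so an in silico scan of a contig yields two independent label patterns that can be interleaved into one denser, two-colour pattern. The sketch below is illustrative only: the motifs (GCTCTTC for Nt.BspQI and CACGAG for Nb.BssSI) are assumptions about the enzyme pair, the helper names are hypothetical, and each label is placed at the motif start rather than at the exact nick position. It is not the merging algorithm described in the talk, which the abstract does not specify.

```python
from typing import Dict, List, Tuple

# Assumed recognition motifs for the two nicking enzymes (illustrative).
MOTIFS: Dict[str, str] = {
    "BspQI": "GCTCTTC",
    "BssSI": "CACGAG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> List[int]:
    """Return sorted 0-based start positions of `motif` on either strand.

    Simplification: the label coordinate is the motif start, not the nick site.
    """
    seq = seq.upper()
    hits = set()
    # Scan for the motif and its reverse complement (a set avoids
    # double-counting palindromic motifs).
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def combined_labels(seq: str) -> List[Tuple[int, str]]:
    """Interleave per-enzyme sites into one position-sorted two-colour label list."""
    labels = [
        (pos, name)
        for name, motif in MOTIFS.items()
        for pos in find_sites(seq, motif)
    ]
    return sorted(labels)
```

On a toy contig such as `"GCTCTTC" + "AAAA" + "CACGAG" + "AAAA" + "GAAGAGC"`, the BspQI motif is found once on each strand (positions 0 and 21) and the BssSI motif once (position 11), so the combined pattern carries three labels where either enzyme alone gives at most two. This is the sense in which a second enzyme raises label density.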
We first validated our approach using the human NA12878 genome. Starting with NGS assemblies with N50 values ranging from 0.18 to 0.9 Mbp, we produced hybrid assemblies with N50 values from 18 to 38 Mbp that incorporated 80–90% of the total sequence with over 99% accuracy. Compared with previously published single-enzyme hybrid scaffolds, the two-enzyme approach improved scaffold contiguity by 300% and anchored up to 30% more sequence contigs, while correcting 50% more assembly errors in the NGS sequences. We further demonstrated that the pipeline is compatible with data from different sequencing technologies and performs well across human, animal and plant genomes. This NGM-based approach makes assembling large, complex genomes cost-effective. With Bionano’s new Saphyr platform, the two-enzyme data for a human genome can be collected in 24 hours on a single Saphyr chip at relatively low reagent cost.
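The contiguity figures above are N50 values. Under the usual definition, N50 is the length L such that contigs (or scaffolds) of length at least L together account for at least half of the total assembled sequence. A minimal sketch of that standard definition follows; the function name is illustrative, and the abstract does not say which tool computed its statistics.

```python
from itertools import accumulate
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    total = sum(ordered)
    # Walk from the longest sequence down, tracking the running total;
    # the first length at which the running total reaches half is the N50.
    for length, running in zip(ordered, accumulate(ordered)):
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is exactly half of the 54-unit total.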
This new approach can greatly expand the types of NGS data that can be integrated with Bionano genome maps to produce highly accurate and contiguous assemblies of complex genomes.
Authors
- Jian Want (Bionano Genomics)
Topic Areas
Sequencing strategies and technology advancements using the various NGS platforms, De novo sequencing, re-sequencing, Human seq., RNA seq., metagenomics, etc.
Session
TT-3 » Assembly & Analysis (16:05 - Thursday, 18th May, La Fonda Ballroom)