Improving genome analysis with phased de novo assemblies
Abstract
High-throughput sequencing has revolutionized genome analysis. Tens of thousands of genomes and hundreds of thousands of exomes have been analyzed globally, allowing for new biological insights at both population and individual levels. Despite these advances, it has become increasingly clear that traditional short-read methods are insufficient for reconstructing individual genomes. These approaches lack the long-range information needed to reconstruct individual haplotypes and uncover complex structural variation.
To address this problem, we partition limiting amounts of high molecular weight DNA so that unique barcodes can be added as part of library generation. This approach allows us to couple long-range information with high-throughput, accurate short-read sequencing, generating a data type known as Linked-Reads. We have developed informatic pipelines that utilize Linked-Read information for both reference-based analysis and de novo assembly. Our reference-based pipeline, Long Ranger, provides long-range haplotype information on individual genomes and gives access to parts of the genome typically inaccessible to short reads. Additionally, Linked-Reads provide the power to identify a wide range of balanced and unbalanced structural events.
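The idea that reads sharing a barcode carry long-range information can be illustrated with a minimal sketch. It assumes reads have already been aligned and reduced to (barcode, chromosome, position) tuples, and it groups nearby reads with the same barcode into putative source molecules. The function name `infer_molecules`, the 50 kb `max_gap` threshold, and the toy data are hypothetical choices made for illustration; this is not the method used by Long Ranger or Supernova.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group barcoded reads into putative source molecules.

    reads: iterable of (barcode, chrom, pos) tuples (assumed input format).
    max_gap: illustrative threshold; reads with the same barcode that lie
             farther apart than this are assigned to different molecules.
    Returns a list of (barcode, chrom, start, end, read_count) tuples.
    """
    # Collect read positions per (barcode, chromosome) pair.
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Toy data: three nearby BC01 reads, one distant BC01 read, one BC02 read.
reads = [
    ("BC01", "chr1", 10_000),
    ("BC01", "chr1", 32_000),
    ("BC01", "chr1", 61_000),
    ("BC01", "chr1", 900_000),
    ("BC02", "chr1", 15_000),
]
molecules = infer_molecules(reads)
for molecule in molecules:
    print(molecule)
```

In this toy input, the three nearby BC01 reads are merged into one inferred molecule spanning about 51 kb, while the distant BC01 read and the BC02 read each form their own. A span of this kind is what lets short reads be linked across distances far longer than a single read or read pair.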
We have also developed software, Supernova, that enables de novo assembly from Linked-Reads. We have recently demonstrated the ability of this software to produce long, phased scaffolds from a single library (Weisenfeld et al., 2016). Supernova assemblies provide a method for vastly improved analysis of genomic structural variation. Aligning phased Supernova scaffolds to a reference provides a comprehensive view of the variation present in a sample with base-pair accuracy.
Using truth samples, we demonstrate the ability to call and haplotype challenging variants, including long insertions, heterozygous deletions, and SNPs on distinct paralogous loci. The low cost, simplicity, and power of our approach suggest its applicability to a wide range of genomes, including those for which a reference assembly is not available.
Authors
Patrick Marks (10x Genomics)
Topic Areas
Sequencing strategies and technology advancements using the various NGS platforms, De novo sequencing, re-sequencing, Human seq., RNA seq., metagenomics, etc.
Session
TT-3 » Assembly & Analysis (16:05 - Thursday, 18th May, La Fonda Ballroom)