Using Linked-Reads to enable efficient de novo, diploid assembly
Abstract
Assembly of mammalian genomes using cost-effective, accurate short reads is challenging. Further, commonly used reference-based approaches are limited to organisms that have a reference assembly and can be inadequate for determining complex structural rearrangements. We describe a novel approach for de novo assembly that utilizes our proprietary microfluidic system, the Chromium™ controller, for high-throughput partitioning of high-molecular-weight DNA (≥50 kb). Unique barcodes are applied within each partition, labeling individual DNA molecules and allowing for the retention of long-range information from short-read sequencing, creating a novel data type called Linked-Reads. Importantly, this method requires only small amounts of input DNA (0.5-1.25 ng) and a single sequencing library.
The Supernova™ Assembler takes advantage of Linked-Reads to perform de novo diploid assembly. The process begins by constructing a de Bruijn graph from the read data. The graph is then progressively refined using read-pair and barcode information, which allows for the construction of multi-megabase scaffolds. Heterozygosity within the sample, coupled with molecular barcodes, allows for the separation of scaffolds into their distinct haplotypes, referred to as phase blocks. The reconstruction of individual haplotypes, rather than a haploid consensus sequence, allows for a more complete and accurate representation of the sample, adding power to downstream analyses.
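To make the first step concrete, the following is a minimal sketch of de Bruijn graph construction from short reads. It is a generic textbook illustration, not Supernova's implementation: the function name `build_de_bruijn`, the toy reads, and the choice of k are all assumptions made for the example, and it ignores barcodes, read pairs, reverse complements, and sequencing errors.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph from reads (illustrative only).

    Nodes are (k-1)-mers; each k-mer seen in a read contributes one
    directed edge from its prefix (k-1)-mer to its suffix (k-1)-mer.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap by five bases.
reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn(reads, k=4)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In a real assembler the graph built this way is only the starting point; the refinement described above would use barcode and read-pair evidence to decide which paths through the graph belong to the same molecule and haplotype.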
We demonstrate the performance of this method on several human genomes of diverse ethnic origin. The resulting assemblies consistently have scaffold lengths on the order of 15 Mb, with phase block lengths varying from 4 to 11 Mb. Assessment of the assemblies via Benchmarking Universal Single-Copy Orthologs (BUSCO) shows completeness of conserved vertebrate genes at rates of >90%, comparable to the human reference genome, GRCh38. Further, as some of these samples are members of well-characterized trios, we validated the accuracy of the phase information using orthogonal data. We also evaluated performance on a variety of other genomes, including sweat bee, hummingbird, dog, grape, and olive fly. Demonstrating the utility of this approach in non-human species, the resulting assemblies have scaffold N50 lengths ranging from 50 kb to >100 kb and phase block N50 lengths ranging from 0.5 Mb to 10 Mb.
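For readers unfamiliar with the N50 statistic quoted for scaffolds and phase blocks, here is a minimal sketch of the standard definition: the length L such that sequences of length L or longer account for at least half of the total assembled bases. The function name `n50` and the example lengths are hypothetical and are not taken from the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and return the first
    length at which the running total reaches half of the overall sum.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in arbitrary units.
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))
```

The same function applies to phase blocks: pass the list of phase block lengths instead of scaffold lengths.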
In summary, using Supernova™ we have shown the utility of Linked-Reads in de novo assembly of human and non-human species. These reconstructions yield high-quality, phased assemblies which are enriched for conserved gene content.
Authors
-
Stephen Williams
(10x Genomics)
-
Claudia Catalanotti
(10x Genomics)
-
Jill Herschleb
(10x Genomics)
-
Vijay Kumar
(10x Genomics)
-
Preyas Shah
(10x Genomics)
-
Neil Weisenfeld
(10x Genomics)
-
Michael Schnall-Levin
(10x Genomics)
-
David Jaffe
(10x Genomics)
-
Deanna Church
(10x Genomics)
Topic Areas
De novo sequencing, re-sequencing, Human seq., RNA seq., metagenomics, etc., Whole genome assemblers and integration of next generation data, De novo assemblers for short reads, hybrid assemblers
Session
TT-3 » Assembly & Analysis (16:05 - Thursday, 18th May, La Fonda Ballroom)