Simulation-based Genotyping of Structural Variants in Whole Genome Sequencing Data, Poster 20
Abstract
Variants in the human genome range from small single-nucleotide variants to large structural variants (SVs) that can span hundreds to millions of nucleotides. SVs, which have been associated with numerous genetic diseases, are challenging to discover and genotype in next-generation sequencing (NGS) data. Because SVs are larger than the sequencer read length, they must be inferred indirectly from probabilistic signals in the sequencing data. In this project, we develop a non-parametric, simulation-based approach to SV genotyping that can account for variant-, sample-, genome-region-, and pipeline-specific biases. We built an automated pipeline that simulates NGS data containing structural deletions of differing zygosity and extracts SV-relevant metrics. Using the simulated data, we perform supervised learning to create a per-variant model for classifying zygosity. We then apply this model to predict the zygosity of putative SV calls in the sample of interest. We aim to refine this prototype simulation pipeline into a practical tool for genotyping putative SVs in a cohort of children with congenital heart defects.
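The simulate, train, and predict loop described above can be illustrated with a toy sketch. This is not the authors' pipeline: the two features (read-depth ratio and split-read fraction), the Gaussian noise model, the k-nearest-neighbor classifier, and all function names are assumptions chosen only to show the idea of training a per-variant zygosity classifier on simulated data.

```python
import random
from collections import Counter

# Assumption: read depth inside a deletion scales with the number of intact
# copies (2, 1, or 0), giving these expected depth ratios per genotype.
EXPECTED_DEPTH_RATIO = {"0/0": 1.0, "0/1": 0.5, "1/1": 0.0}


def simulate_features(genotype, rng, noise=0.1):
    """Simulate (depth_ratio, split_read_fraction) for one deletion genotype.

    Hypothetical stand-in for simulating reads and extracting SV metrics.
    """
    expected_depth = EXPECTED_DEPTH_RATIO[genotype]
    expected_split = 1.0 - expected_depth  # share of reads supporting the deletion
    depth_ratio = max(0.0, rng.gauss(expected_depth, noise))
    split_fraction = min(1.0, max(0.0, rng.gauss(expected_split, noise)))
    return (depth_ratio, split_fraction)


def build_training_set(n_per_genotype, rng):
    """Simulate labeled examples for every zygosity class of one variant."""
    return [
        (simulate_features(genotype, rng), genotype)
        for genotype in EXPECTED_DEPTH_RATIO
        for _ in range(n_per_genotype)
    ]


def predict_genotype(training_set, observed, k=5):
    """Classify observed features by majority vote of the k nearest simulations."""
    def squared_distance(example):
        features, _ = example
        return sum((a - b) ** 2 for a, b in zip(features, observed))

    nearest = sorted(training_set, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Usage: train on simulated data, then genotype metrics from the real sample.
rng = random.Random(0)
training_set = build_training_set(50, rng)
print(predict_genotype(training_set, (0.48, 0.55)))
```

A nearest-neighbor vote is used here only because it is a simple non-parametric classifier; in practice the simulated features would come from aligned reads rather than a noise model.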
Authors
- Crystal Paudyal '19
- Michael Linderman
Topic Area
Science & Technology
Session
P1 » Poster Presentations: Group 1 and Refreshments (10:30am - Friday, 20th April, MBH Great Hall, 331 and 338)