A considerable number of rare-disorder cases remain unsolved. Current methods generally focus on small genetic changes such as single-nucleotide variants (SNVs) and small insertions and deletions (indels). Accurately detecting structural variants (SVs) that arise de novo in a proband could reduce the proportion of unsolved cases. However, SV callers can exhibit false discovery rates upwards of 50% (English et al, 2015), and the breakpoint coordinates they report can be ambiguous. This makes it difficult to compare calls directly between the proband and the parents to detect de novo variants accurately.
Here, we detect de novo variants using the Biograph Analysis Format (BAF), a reference-agnostic, rapidly queryable method of indexing NGS data. To demonstrate the consistency of calls made with this format, we used overlap assembly to confirm SVs identified by Pindel in an Ashkenazi Jewish trio. Of 1,195 calls with read evidence in at least one individual, all but 25 (2.1%) were consistent with Mendelian inheritance. Whenever a variant was called in more than one sample, it was reported identically in each. For the variants that did not follow Mendelian inheritance, at least one individual had insufficient coverage in the regions flanking the breakpoint.
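The Mendelian-consistency criterion can be illustrated with a minimal sketch. It assumes each call has already been reduced to an autosomal, biallelic genotype per individual, encoded as the number of alternate-allele copies (0, 1 or 2). The abstract judged consistency from read evidence rather than from genotypes, and the helper names `transmissible`, `is_mendelian` and `inconsistency_rate` are hypothetical.

```python
from itertools import product


def transmissible(gt: int) -> set:
    """Alleles (0 = ref, 1 = alt) that a parent carrying `gt` alt copies can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[gt]


def is_mendelian(child: int, mother: int, father: int) -> bool:
    """True if the child's alt-allele count can be formed from one allele per parent."""
    return any(
        m + f == child
        for m, f in product(transmissible(mother), transmissible(father))
    )


def inconsistency_rate(trios) -> float:
    """Fraction of (child, mother, father) genotype triples that violate Mendelian inheritance."""
    violations = sum(not is_mendelian(c, m, f) for c, m, f in trios)
    return violations / len(trios)


if __name__ == "__main__":
    print(is_mendelian(1, 0, 1))  # alt allele can come from the father
    print(is_mendelian(1, 0, 0))  # alt allele is absent from both parents
    print(f"{25 / 1195:.1%}")     # the 25 of 1,195 inconsistent calls reported above
```

This genotype-level check ignores sex chromosomes and multi-allelic sites, which a real trio analysis would need to handle separately.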
In a further case, in which the genomes of the proband and both parents were sequenced to 30x coverage on an Illumina HiSeq, we ran Anchored Assembly, a caller that detects variants by read-overlap assembly (English et al, 2015), on the proband sample. We called 2,383 insertions or deletions larger than 50 base pairs in the proband. To determine whether each variant had supporting evidence in the parents, we counted the reads containing the 50-mer unique to the breakpoint junction. Of the 2,383 variants, 98 showed prima facie evidence of being de novo. Eighteen of these were heterozygous in the proband; although neither parent showed evidence of the variant, at least one parent showed a drop in coverage at the locus. For the remaining 80 variants, which were homozygous in the proband, the variant was present in at least one parent, while the other parent had either no coverage at that location (68 variants) or low reference-allele coverage (12 variants, <9 reads). Overall, this suggests that these variants more likely reflect a lack of coverage than true de novo events.
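The junction 50-mer lookup can be sketched as follows. This is a naive linear scan over in-memory read strings, not the BAF index, and it makes several illustrative assumptions: the SV is represented as `ref[start:end]` replaced by `alt_seq`, the k-mer is taken centred on the left breakpoint of the alternate haplotype, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_kmer(ref: str, start: int, end: int, alt_seq: str, k: int = 50) -> str:
    """k-mer spanning the left breakpoint of the alternate haplotype.

    The SV is modelled as ref[start:end] replaced by alt_seq:
    a deletion has alt_seq == "", an insertion has start == end.
    """
    left = k // 2
    right = k - left
    alt = ref[:start] + alt_seq + ref[end:]
    if start < left or start + right > len(alt):
        raise ValueError("not enough flanking sequence for a full k-mer")
    return alt[start - left:start + right]


def absent_from_reference(kmer: str, ref: str) -> bool:
    """True if neither strand of the k-mer occurs in the reference (junction-specific)."""
    return kmer not in ref and revcomp(kmer) not in ref


def count_supporting_reads(reads, kmer: str) -> int:
    """Number of reads containing the k-mer on either strand (naive scan, not an index)."""
    rc = revcomp(kmer)
    return sum(1 for read in reads if kmer in read or rc in read)
```

Checking `absent_from_reference` first matters: if the junction k-mer also occurs elsewhere in the genome (for example, in a repeat), reads from the reference allele would be miscounted as SV support.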
This analysis was completed in 18 hours. Of this, 14 hours were spent computing the format (from FASTQ) and 3.5 hours calling SVs (single-threaded). In this case, we could not confirm a true de novo SV. However, this method makes it possible to rapidly detect and rule out such candidates, owing both to the low false discovery rate of the variant caller and to the ability to search the read data directly and rapidly for evidence of either the SV or the reference allele. The same method can also be used to identify differences in edited genes.
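The reasoning used to rule out candidates (no SV-supporting reads in a parent, combined with missing or low reference coverage) can be written as a hypothetical triage rule. The `Evidence` record, the function name and the default `min_ref_reads=9` (borrowed from the "<9 reads" figure above, not a documented cutoff) are all assumptions; the abstract does not specify the exact decision rule used.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    alt_reads: int  # reads containing the breakpoint-junction k-mer (support the SV)
    ref_reads: int  # reads spanning the locus with the reference sequence


def classify_candidate(
    proband_homozygous: bool,
    mother: Evidence,
    father: Evidence,
    min_ref_reads: int = 9,  # hypothetical threshold, not from a published pipeline
) -> str:
    """Hypothetical triage of an SV called in the proband."""
    parents = (mother, father)
    carriers = sum(p.alt_reads > 0 for p in parents)
    # A heterozygous call needs one carrier parent; a homozygous call needs two.
    needed = 2 if proband_homozygous else 1
    if carriers >= needed:
        return "inherited"
    # A parent without SV reads is only informative if the reference allele is well covered.
    non_carriers = [p for p in parents if p.alt_reads == 0]
    if any(p.ref_reads < min_ref_reads for p in non_carriers):
        return "coverage-limited"
    return "candidate de novo"  # still warrants manual review


if __name__ == "__main__":
    # Homozygous in the proband, one carrier parent, other parent with no coverage.
    print(classify_candidate(True, Evidence(14, 16), Evidence(0, 0)))
```

Separating "coverage-limited" from "candidate de novo" mirrors the argument above: absence of SV reads in a parent is only evidence against inheritance when the reference allele is well covered there.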
Comparative genomics, re-sequencing, SNPs, structural variation, Bringing sequence to the clinic (i.e., diagnostics, cancer, inherited disorders), Gene editing, synthetic genomics, forensics, and biosurveillance