Alcohol and tobacco use and dependence are moderately heritable phenotypes (Verhulst et al., 2015) with complex genetic architecture. The majority of observed associations for common variants have been located outside of genes (Finucane et al., 2015; Maurano et al., 2012), in regions such as regulatory elements marked by open chromatin (i.e., DNase I hypersensitive sites) and regions susceptible to epigenetic processes like DNA methylation (i.e., CpG islands). The few studies relating low-frequency and rare variants to alcohol and tobacco use, in contrast, have been restricted to regions that code for proteins (Otto et al., 2017; Vrieze et al., 2014), given their putatively functional role and the high costs associated with sequencing. However, these studies have had limited success in identifying significant associations, and the top variants have had very small effects. Less is known about the effects of rare non-coding regulatory variation, despite the high proportion of heritability thought to be explained by the effects of common variants in these regions. Given the large number of rare variants located outside of genes, and an incomplete understanding of their function, the current study aims to evaluate methods of prioritizing and grouping rare sequence variation within gene and non-coding regulatory regions in studies of alcohol and tobacco use and dependence. Low-coverage whole-genome sequencing data were obtained from 1,889 individuals as part of the UCSF Family Study of alcohol dependence (Vieten et al., 2004). The Combined Annotation-Dependent Depletion (CADD; Kircher et al., 2014) bioinformatics tool was used to compute a single measure of deleteriousness (i.e., scaled C-scores) for variants with minor allele frequency (MAF) < 5% located within independent CpG islands and a surrounding 2 kb “shore” region of DNA (N = 17,304 CpG islands).
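The grouping step described above (rare variants assigned to a CpG island plus its 2 kb shores) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function name, the tuple layouts, and the assumption that the shore extends 2 kb on each side of the island are all hypothetical choices made for the example.

```python
from collections import defaultdict

SHORE_BP = 2_000     # shore width; assumed here to apply on each side of the island
MAF_CUTOFF = 0.05    # variants with MAF < 5% are retained, as stated in the text


def build_variant_sets(islands, variants, shore_bp=SHORE_BP, maf_cutoff=MAF_CUTOFF):
    """Map each CpG island id to the rare variants in the island or its shores.

    islands:  list of (island_id, chrom, start, end), 1-based inclusive (hypothetical layout)
    variants: list of (variant_id, chrom, pos, maf, c_score) (hypothetical layout)
    Returns a dict: island_id -> list of (variant_id, c_score).
    """
    sets = defaultdict(list)
    for island_id, chrom, start, end in islands:
        # Extend the island by the shore width, without going below position 1.
        lo, hi = max(1, start - shore_bp), end + shore_bp
        for variant_id, v_chrom, pos, maf, c_score in variants:
            if v_chrom == chrom and lo <= pos <= hi and maf < maf_cutoff:
                sets[island_id].append((variant_id, c_score))
    return dict(sets)
```

The nested loop is quadratic and only meant to make the inclusion rule explicit; with roughly 17,000 islands and genome-wide variant calls, an interval index or a sorted sweep would be the practical choice.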
Set-based tests of rare CpG island/shore variants with alcohol and tobacco use phenotypes were conducted using the SKAT-O test (Lee et al., 2012), applying C-scores as variant weights within a set to allow for adjustments based on the relative deleteriousness of each variant. After correction for multiple testing, a limited number of rare variant CpG island sets were significantly associated with alcohol and tobacco use phenotypes. A subset of the top associations were located near genes previously identified in molecular genetic and epigenetic studies of these traits. These included a suggestive association of average cigarettes smoked per day with rare variants in a CpG island on chromosome 13, which overlapped with a locus identified as a differentially methylated region in previous studies of smoking phenotypes (Allione et al., 2016; Ambatipudi et al., 2016). Results from genome-wide, hypothesis-free tests of rare variant sets using a sliding genomic window approach will be used to validate the possible enrichment of association signals from these regions susceptible to DNA methylation. Pre-processing and quality control of DNA methylation data are currently being conducted in order to test for differential DNA methylation and potential epigenetic mediation of DNA sequence variation in the context of these traits. Future work will extend these approaches to other types of regulatory elements and to replication samples.
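To illustrate how per-variant weights such as C-scores enter a SKAT-O style test, the sketch below computes the family of statistics Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden from Lee et al. (2012) for one variant set. It is a simplified illustration under stated assumptions: the null model is intercept-only with no covariates or adjustment for family structure, scaling constants are dropped, and no p-values are computed. The function name and the grid of rho values are hypothetical and do not reflect the software used in the study.

```python
import numpy as np


def skat_o_statistics(G, y, weights, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {rho: Q_rho} for one variant set.

    G:       (n_samples, n_variants) genotype matrix of minor-allele counts
    y:       (n_samples,) phenotype vector
    weights: (n_variants,) per-variant weights, e.g. scaled C-scores
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Residuals under an intercept-only null model (an assumption of this sketch).
    resid = y - y.mean()

    # Per-variant score statistics S_j = sum_i G_ij * resid_i.
    scores = G.T @ resid

    q_skat = float(np.sum((w * scores) ** 2))    # variance-component statistic
    q_burden = float(np.sum(w * scores) ** 2)    # burden statistic

    # rho = 0 recovers SKAT and rho = 1 recovers the burden test.
    return {rho: (1 - rho) * q_skat + rho * q_burden for rho in rhos}
```

In the full procedure, the test statistic is the minimum p-value across the rho grid, which requires the null distribution of each Q_rho; that step is omitted here.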
Gene Finding Strategies, Substance use: Alcohol, Nicotine, Drugs