KnockoffZoom is a flexible statistical method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings. Our findings can be downloaded or explored interactively at https://msesia.github.io/knockoffzoom/ukbiobank.html
Statistical Methods for Large Scale Genetic Studies
Our goal is to develop new data analysis methods that are well suited to discovering the many genetic signals that influence traits of medical relevance. We aim to increase the sensitivity of current tools by accounting for the known complexity of these traits: it is likely that many different genetic variants contribute to them, possibly interacting with each other, and our models capitalize on this. At the same time, we want to minimize the number of false-positive results, which are unfortunately quite likely when one searches for associations among as many possibilities as those in genome-wide studies of multiple traits. The UK Biobank has one of the largest sample sizes in genetics, and new data analysis methods are needed to take full advantage of it. Approaches with increased sensitivity and specificity in genetic association studies will facilitate the identification of the biological pathways perturbed in diseases. They will allow us to zoom in more precisely on the important biology, identifying relevant genes even when their effects are small, while avoiding false leads. This knowledge is important for risk assessment, therapy choices, and drug development. We will use the UK Biobank data to identify the concrete challenges presented by the analysis of large datasets and to test the performance of the methods that we will develop, relying both on simulations and on comparative data analysis.
We will use the genotype data to generate artificial traits with known genetic architecture and evaluate the performance of different methods in recovering it. We will also use measured traits to understand what type of genetic architecture is likely to be important for medically relevant phenotypes. Because our focus is on the development of methods applicable to large samples, taking advantage of the more detailed information they contain, we are interested in working with the full cohort.
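The simulation strategy described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names, the additive trait model, the independent genotypes (real genotypes are correlated through linkage disequilibrium), and the covariance-based ranking used as a stand-in "method" are all assumptions made for illustration. The idea is only to show how an artificial trait with known causal variants lets one measure power and the false discovery proportion of any method.

```python
import random


def simulate_genotypes(n_subjects, n_variants, maf, rng):
    """Toy genotypes: 0, 1 or 2 minor-allele copies, independent across variants.

    Assumption: the study would use real genotype data; independence is a
    simplification that ignores linkage disequilibrium.
    """
    return [
        [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_variants)]
        for _ in range(n_subjects)
    ]


def simulate_trait(genotypes, effects, noise_sd, rng):
    """Artificial quantitative trait: additive causal effects plus Gaussian noise.

    `effects` maps causal variant index -> effect size (the known architecture).
    """
    return [
        sum(beta * row[j] for j, beta in effects.items()) + rng.gauss(0.0, noise_sd)
        for row in genotypes
    ]


def top_k_by_covariance(genotypes, trait, k):
    """Hypothetical stand-in method: report the k variants with largest |covariance|."""
    n = len(trait)
    mean_y = sum(trait) / n
    scores = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean_g = sum(col) / n
        cov = sum((g - mean_g) * (y - mean_y) for g, y in zip(col, trait)) / n
        scores.append((abs(cov), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def evaluate(discoveries, causal):
    """Return (power, false discovery proportion) against the known causal set."""
    discoveries, causal = set(discoveries), set(causal)
    true_pos = len(discoveries & causal)
    power = true_pos / len(causal) if causal else 0.0
    fdp = (len(discoveries) - true_pos) / max(len(discoveries), 1)
    return power, fdp


# Usage: plant 3 causal variants among 50, then score the stand-in method.
rng = random.Random(0)
G = simulate_genotypes(n_subjects=500, n_variants=50, maf=0.3, rng=rng)
causal_effects = {3: 1.0, 17: -1.0, 42: 0.8}
y = simulate_trait(G, causal_effects, noise_sd=1.0, rng=rng)
found = top_k_by_covariance(G, y, k=5)
power, fdp = evaluate(found, causal_effects)
print(f"power={power:.2f}, FDP={fdp:.2f}")
```

Averaging the false discovery proportion over many simulated traits gives an empirical estimate of the false discovery rate, which is the quantity a method such as KnockoffZoom is designed to control.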