Notes
The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.
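The idea of a sparse genetic relationship matrix mentioned above can be sketched as follows. This is a minimal illustration, not the fastGWA implementation: the function name `sparsify_grm`, the cutoff of 0.05 and the 4-person matrix are all assumptions made for the example. Off-diagonal entries below the cutoff are dropped, so only pairs of close relatives are stored.

```python
def sparsify_grm(grm, threshold=0.05):
    """Keep the diagonal and any off-diagonal entry >= threshold.

    Returns a dict mapping (i, j) with i >= j to the relationship value,
    i.e. a sparse lower-triangular representation of the matrix.
    The threshold is an illustrative assumption.
    """
    sparse = {}
    for i, row in enumerate(grm):
        for j in range(i + 1):
            if i == j or row[j] >= threshold:
                sparse[(i, j)] = row[j]
    return sparse


# Hypothetical 4-person GRM: individuals 0 and 1 are close relatives,
# every other pair is effectively unrelated.
grm = [
    [1.00, 0.48, 0.01, -0.02],
    [0.48, 1.02, 0.00, 0.03],
    [0.01, 0.00, 0.99, 0.02],
    [-0.02, 0.03, 0.02, 1.01],
]
sparse_grm = sparsify_grm(grm)
```

Because most pairs in a population sample are unrelated, the number of stored entries grows roughly with the number of individuals rather than with its square, which is what makes this representation attractive at biobank scale.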
Application 12514
The limits of predicting complex traits and diseases from genetic data
Results from genome-wide association studies (GWAS) have proven valuable for understanding the genetic architecture of complex traits and are potentially valuable for predicting disease risk. As GWAS sample sizes grow, prediction accuracy will increase and may eventually yield clinically actionable predictions, for example by stratifying individuals on risk. One limitation on accurate disease risk prediction is the experimental sample size. We aim to quantify the limits of predicting disease risk for an individual by developing sophisticated statistical methods and applying them to quantitative traits in the large UK Biobank sample.
Understanding the limitations of predicting an individual's risk of disease using genetic data is of great importance for disease prevention, and meets the UK Biobank's stated purposes. Gaining accurate genetic risk predictors through the development of robust and powerful statistical methods, together with a large discovery sample (e.g. UK Biobank data), is critical for use in disease screening programs to stratify the population, which is expected to reduce the financial burden on the health system for society as a whole.
Through a focus on quantitative phenotypes, we will develop new approaches applicable to predicting disease risk. The genetic marker data will be used to estimate genome-wide relationships, which we will then correlate with phenotype. This analysis will simultaneously quantify how much of the observed individual differences in phenotype is due to genetic factors, and how accurate a genetic predictor can be. The accuracy of prediction will then be tested. We focus on well-characterised quantitative phenotypes of height, body mass index, blood pressure, osteoporosis, and metabolism.
To have maximum power to predict risk of disease, we require access to the full cohort, because one of the main limiting factors of prediction is sample size.
Our analyses will thus require individual-level imputed genotype and phenotype data. We request a wide range of phenotypes because prediction accuracy is sensitive to the underlying genetic architecture and we wish to quantify the limits of prediction across multiple diseases.
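The approach of estimating genome-wide relationships from marker data and correlating them with phenotype can be sketched as below. This is only an illustration under stated assumptions: it uses Haseman-Elston regression, a simple stand-in technique that the application text does not name, and the data are simulated. The function names, sample sizes and simulated heritability of about 0.5 are all hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Centre to mean 0 and scale to (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def compute_grm(genotypes):
    """Genetic relationship matrix from 0/1/2 allele counts.

    genotypes: one list of allele counts per individual.
    Monomorphic SNPs (zero variance) are skipped.
    """
    n = len(genotypes)
    columns = []
    for k in range(len(genotypes[0])):
        snp = [row[k] for row in genotypes]
        if pstdev(snp) > 0:
            columns.append(standardize(snp))
    m = len(columns)
    return [[sum(col[i] * col[j] for col in columns) / m for j in range(n)]
            for i in range(n)]


def he_regression(grm, phenotype):
    """Regress phenotype cross-products y_i * y_j on relatedness A_ij.

    The slope is a (noisy) estimate of SNP-based heritability.
    """
    y = standardize(phenotype)
    xs, zs = [], []
    for i in range(len(y)):
        for j in range(i):
            xs.append(grm[i][j])
            zs.append(y[i] * y[j])
    x_bar, z_bar = mean(xs), mean(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    return sxz / sxx


# Simulated data (hypothetical): 60 people, 200 SNPs, allele frequency 0.5.
rng = random.Random(0)
n_people, n_snps = 60, 200
genotypes = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)]
             for _ in range(n_people)]
effects = [rng.gauss(0, 1) for _ in range(n_snps)]
genetic = standardize([sum(b * g for b, g in zip(effects, row))
                       for row in genotypes])
# Genetic and noise components both have variance ~1, so h2 is about 0.5.
phenotype = [g + rng.gauss(0, 1) for g in genetic]

grm = compute_grm(genotypes)
h2_estimate = he_regression(grm, phenotype)
```

With a sample this small the estimate is very imprecise, which mirrors the point made in the application: sample size is one of the main limiting factors for both heritability estimation and prediction accuracy.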
Lead investigator: Professor Peter Visscher
Lead institution: University of Queensland
7 related Returns
Return ID | App ID | Description | Archive Date |
3084 | 12514 | Causal associations between risk factors and common diseases inferred from GWAS summary data | 16 Dec 2020 |
1822 | 12514 | Genetic evidence of assortative mating in humans | 20 Nov 2019 |
3452 | 12514 | Genome-wide association study of medication-use and associated disease in the UK Biobank | 25 May 2021 |
3621 | 12514 | Improved polygenic prediction by Bayesian multiple regression on summary statistics | 2 Jul 2021 |
3074 | 12514 | Meta-analysis of genome-wide association studies for height and body mass index in ~700000 individuals of European ancestry | 16 Dec 2020 |
3060 | 12514 | Misestimation of heritability and prediction accuracy of male-pattern baldness | 14 Dec 2020 |
3041 | 12514 | Transformation of Summary Statistics from Linear Mixed Model Association on All-or-None Traits to Odds Ratio | 8 Dec 2020 |