Return 3623

Application:	43206, Statistical methods for large-scale genomic analysis
Title:	Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations
Size:	1.3 MB
Cost Tier:	3
Archived:	2 Jul 2021
Stability:	Complete
Personal:	Contains individual-level data

Notes

Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within the past several thousand years. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample's birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.

Application 43206

Statistical methods for large-scale genomic analysis

We will develop new statistical and computational methods to enable the analysis of very large data sets containing genomic, environmental, and health-related information. We will study several properties of the data, such as the extent to which individuals in a group are genetically related, and use this information to develop new computational strategies and improve a number of analyses. These analyses aim at studying past evolutionary events (e.g. detecting evidence for natural selection), improving the detection of trait- and disease-associated regions of the genome (e.g. via analyses such as genotype imputation, haplotype phasing, GWAS), studying genetic, phenotypic, and environmental variation (e.g. understanding the interplay between genes and environment, quantifying heritability), and producing new evolutionary and functional genomic annotations (e.g. predicting whether a genomic region is involved in certain biological processes). The data in the UK Biobank will enable us to develop and test new computational methods, and to apply them in these analyses. We will analyze the full cohort and a wide range of phenotypes, including diseases (e.g. type 2 diabetes) and quantitative traits (e.g. height, BMI). These new methods and analyses will improve our ability to process genomic, environmental, and health-related data, and are aimed at providing a better understanding of human evolution, biological processes, and the causes of disease.

Lead investigator:	Dr Pier Francesco Palamara
Lead institution:	University of Oxford

1 Publication

Pub ID	Title	Author(s)	Year	Journal
3624	Identity-by-descent detection across 487,409 British samples reveals fine scale population structure and ultra-rare variant associations	Juba Nait Saada (+6)	2020	Nature Communications