Title: | Fast analysis of biobank-size data and meta-analysis using the BGLR R-package |
Journal: | G3: Genes, Genomes, Genetics |
Published: | 9 Dec 2024 |
Pubmed: | https://pubmed.ncbi.nlm.nih.gov/39657738/ |
DOI: | https://doi.org/10.1093/g3journal/jkae288 |
Analyzing human genomic data from biobanks and large-scale genetic evaluations often requires fitting models in which the sample size exceeds the number of DNA markers used (n > p). For instance, developing polygenic scores for humans and genomic prediction for genetic evaluations of agricultural species may require fitting models involving a few thousand SNPs using data with hundreds of thousands of samples. In such cases, computations based on sufficient statistics are more efficient than those based on individual genotype-phenotype data. Additionally, software that accepts sufficient statistics as inputs can be used to analyze data from multiple sources jointly without the need to share individual genotype-phenotype data. Therefore, we developed functionality within the BGLR R-package that generates posterior samples for Bayesian shrinkage and variable selection models from sufficient statistics. In this article, we present an overview of the new methods incorporated in the BGLR R-package, demonstrate the use of the new software through simple examples, and provide several computational benchmarks. We also present a real-data example using data from the UK Biobank, All of Us, and the Hispanic Community Health Study/Study of Latinos cohort. This example demonstrates how a joint analysis of multiple cohorts can be implemented without sharing individual genotype-phenotype data, and how a combined analysis can improve the prediction accuracy of polygenic scores for Hispanics, a group severely under-represented in genome-wide association study data.
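The idea of combining cohorts through sufficient statistics can be illustrated with a minimal sketch. It assumes that, for a linear model, the sufficient statistics are the cross-products X'X and X'y, and it uses a ridge point estimate as a simple stand-in for a shrinkage model. It is not the BGLR sampler and does not use the BGLR API. All function names (`cross_products`, `add_stats`, `ridge_from_stats`, `simulate`), the simulated genotypes, and the penalty `lam` are hypothetical choices made for illustration.

```python
import random

def cross_products(X, y):
    """Return (X'X, X'y) for one cohort; X is a list of rows, y a list."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return XtX, Xty

def add_stats(a, b):
    """Combine two cohorts by summing their cross-products element-wise."""
    XtX = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    Xty = [u + v for u, v in zip(a[1], b[1])]
    return XtX, Xty

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_from_stats(XtX, Xty, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y computed from cross-products only."""
    p = len(Xty)
    A = [[XtX[j][k] + (lam if j == k else 0.0) for k in range(p)] for j in range(p)]
    return solve(A, Xty)

def simulate(n, beta, rng):
    """Simulate 0/1/2 genotype codes and a phenotype with unit-variance noise."""
    X = [[float(rng.choice((0, 1, 2))) for _ in beta] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0, 1) for row in X]
    return X, y

rng = random.Random(1)
beta = [0.5, -0.3, 0.0, 0.2]          # hypothetical marker effects
Xa, ya = simulate(300, beta, rng)     # cohort A (individual-level data stay local)
Xb, yb = simulate(200, beta, rng)     # cohort B

# Each cohort shares only its p x p and p x 1 summaries.
pooled = add_stats(cross_products(Xa, ya), cross_products(Xb, yb))
b_meta = ridge_from_stats(*pooled, lam=1.0)

# Reference: the same estimate computed from the stacked individual-level data.
b_full = ridge_from_stats(*cross_products(Xa + Xb, ya + yb), lam=1.0)
```

In this sketch the pooled estimate `b_meta` agrees with `b_full` up to floating-point rounding, and the cost of the solve depends on the number of markers p rather than on the sample size n. That is the property that makes the n > p setting attractive for computations based on sufficient statistics.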