Publication 3280

Title:	Chances and challenges of machine learning-based disease classification in genetic association studies illustrated on age-related macular degeneration
Journal:	Genetic Epidemiology
Published:	2 Aug 2020
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/32741009/
DOI:	https://doi.org/10.1002/gepi.22336
URL:	https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/gepi.22336
Citations:	13 (5 in last 2 years) as of 8 Aug 2024

Abstract

Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and promising new avenues for genome-wide association studies (GWAS). Despite emerging algorithms, there has been no successful application in GWAS so far. We establish machine learning-based phenotyping in genetic association analysis as a misclassification problem. To evaluate chances and challenges, we performed a GWAS based on automatically classified age-related macular degeneration (AMD) in UK Biobank (images from 135,500 eyes; 68,400 persons). We quantified misclassification of automatically derived AMD in internal validation data (4,001 eyes; 2,013 persons) and developed a maximum likelihood approach (MLA) to account for it when estimating genetic association. We demonstrate that our MLA guards against bias and artifacts in simulation studies. By combining a GWAS on automatically derived AMD and our MLA in UK Biobank data, we were able to dissect true association (ARMS2/HTRA1, CFH) from artifacts (near HERC2) and identified eye color as associated with the misclassification. With this example, we provide a proof of concept that a GWAS using machine learning-derived disease classification yields relevant results and that misclassification needs to be considered in the analysis. These findings generalize to other phenotypes and emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
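
To illustrate the general idea of treating an automatically derived phenotype as a misclassified outcome, the following is a minimal sketch of a misclassification-adjusted logistic likelihood. It is not the authors' MLA. It assumes a single variant, one observation per person, and a sensitivity and specificity that are known and constant. That constancy is a simplification, since the abstract reports that misclassification was itself associated with eye color. All function and parameter names (observed_prob, log_likelihood, sens, spec) are hypothetical.

```python
import math


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def observed_prob(x, beta0, beta1, sens, spec):
    """Probability that a person with genotype x is *classified* as a case.

    Assumption: the classifier errs with fixed sensitivity `sens` and
    specificity `spec`, independent of genotype (non-differential error).
    """
    p_true = expit(beta0 + beta1 * x)  # true disease probability
    # True cases detected plus true controls falsely flagged.
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)


def log_likelihood(params, xs, ys, sens, spec):
    """Log-likelihood of observed (possibly misclassified) labels ys.

    params = (beta0, beta1); xs are genotype dosages; ys are 0/1 labels
    produced by the classifier. Requires 0 < observed_prob < 1.
    """
    beta0, beta1 = params
    total = 0.0
    for x, y in zip(xs, ys):
        q = observed_prob(x, beta0, beta1, sens, spec)
        total += math.log(q) if y == 1 else math.log(1.0 - q)
    return total
```

Maximizing log_likelihood over (beta0, beta1) with any general-purpose optimizer gives an association estimate corrected for the assumed error rates. With sens = spec = 1 the expression reduces to ordinary logistic regression. In practice the error rates would be estimated from validation data, and a differential error model would let sens and spec depend on covariates.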

12 Keywords

Algorithms
Diagnostic Errors
Genome-Wide Association Study
High-Temperature Requirement A Serine Peptidase 1
Humans
Likelihood Functions
Machine Learning
Macular Degeneration
Models, Genetic
Phenotype
Proteins
United Kingdom

7 Authors

Felix Guenther
Caroline Brandl
Thomas W. Winkler
Veronika Wanner
Klaus Stark
Helmut Kuechenhoff
Iris M. Heid

1 Application

Application ID	Title
33999	Identifying and quantifying risk factors for macular disorders using automated approaches to phenotyping

1 Return

Return ID	App ID	Description	Archive Date
3279	33999	Chances and challenges of machine learning-based disease classification in genetic association studies illustrated on age-related macular degeneration	6 Apr 2021
