Abstract
Imaging technology and machine learning algorithms for disease classification set the stage for high-throughput phenotyping and open promising new avenues for genome-wide association studies (GWAS). To evaluate the opportunities and challenges of using a machine-learning-based disease classification in GWAS, we performed a study on fundus-image-derived age-related macular degeneration (AMD) in UK Biobank: we automatically classified fundus images with a published neural network ensemble (images from 135,500 eyes; 68,400 persons) and performed a GWAS on the derived "any AMD" phenotype. Predictions of machine learning algorithms can be erroneous, so we quantified the misclassification of automatically derived AMD in internal validation data that had an additional manual AMD classification (gold standard; 4,001 eyes; 2,013 persons). We framed the use of a machine-learning-based phenotype in genetic association analyses as a misclassification problem and developed a maximum likelihood approach (MLA) to account for misclassification when estimating genetic association. By combining a GWAS on automatically derived AMD with our MLA, we were able to distinguish true associations (ARMS2/HTRA1, CFH) from artefacts (near HERC2), and we identified eye color as associated with the misclassification. With this example, we provide a proof of concept that a GWAS using a machine-learning-derived disease classification yields relevant results and that misclassification needs to be considered in the analysis. These findings generalize to other phenotypes and emphasize the utility of genetic data for understanding the misclassification structure of machine learning algorithms.
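To illustrate the idea of accounting for misclassification in a likelihood, the following minimal sketch shows a generic misclassification-adjusted log-likelihood for a binary phenotype. It is not the authors' MLA: it assumes a logistic model for the true disease status, non-differential misclassification, and known sensitivity and specificity (in practice these would be estimated, for example from validation data with a gold-standard classification). All function and parameter names are hypothetical.

```python
import math


def expit(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def observed_case_prob(genotype, beta0, beta1, sensitivity, specificity):
    """Probability that the classifier labels a person as a case.

    Assumption (not from the source): the true status follows a logistic
    model in the genotype, and classifier errors do not depend on genotype.
    """
    p_true = expit(beta0 + beta1 * genotype)
    # True cases detected, plus true controls falsely labelled as cases.
    return sensitivity * p_true + (1.0 - specificity) * (1.0 - p_true)


def log_likelihood(data, beta0, beta1, sensitivity, specificity):
    """Log-likelihood of observed (possibly misclassified) labels.

    `data` is an iterable of (genotype, observed_label) pairs, where
    observed_label is 1 for a predicted case and 0 for a predicted control.
    """
    total = 0.0
    for genotype, observed in data:
        p = observed_case_prob(genotype, beta0, beta1, sensitivity, specificity)
        total += math.log(p) if observed == 1 else math.log(1.0 - p)
    return total
```

Maximizing `log_likelihood` over `beta0` and `beta1` with a general-purpose numerical optimizer would give association estimates adjusted for the assumed error rates. With sensitivity and specificity both equal to 1, the expression reduces to the ordinary logistic regression likelihood.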