Abstract
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but individuals of non-European descent remain under-represented in them, a critical gap in genetic research. Prediction models trained primarily on individuals of European ancestry often fail to generalize to diverse populations, leading to reduced accuracy and potential health disparities. Here, we assess whether incorporating interaction modeling and pretraining into disease prediction models can improve performance. We evaluated Group-LASSO INTERaction-NET (glinternet) and the pretrained lasso for disease prediction, focusing on diverse ancestries in the UK Biobank. Models were trained on multiomic data from White British and other ancestries and validated in a cohort of more than 96,000 individuals for 8 diseases. Of the 96 trained models, we report 16 with statistically significant incremental predictive performance in terms of ROC-AUC scores ([Formula: see text]), found for diabetes, arthritis, gallstones, cystitis, asthma, and osteoarthritis. Our findings suggest that interaction terms and pretraining can modestly improve prediction accuracy, but these effects are not consistent across all diseases. Our code is available at https://github.com/rivas-lab/AncestryOmicsUKB.
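As an illustration of what "incremental predictive performance in terms of ROC-AUC" can mean in practice, the sketch below compares a baseline model's risk scores with those of an augmented model on the same validation individuals and attaches a paired bootstrap percentile interval to the AUC difference. The abstract does not state which significance test was used, so the bootstrap here is an assumption rather than the authors' procedure, and the names `roc_auc`, `bootstrap_auc_gain`, `base_scores`, and `new_scores` are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_gain(labels, base_scores, new_scores, n_boot=1000, seed=0):
    """Observed AUC gain (new minus baseline) with a 95% percentile interval.

    Individuals are resampled with replacement, and both models are scored on
    the same resample so that the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = roc_auc(labels, new_scores) - roc_auc(labels, base_scores)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain only cases or only controls
        base = [base_scores[i] for i in idx]
        new = [new_scores[i] for i in idx]
        deltas.append(roc_auc(y, new) - roc_auc(y, base))
    deltas.sort()
    lower = deltas[int(0.025 * n_boot)]
    upper = deltas[int(0.975 * n_boot) - 1]
    return observed, lower, upper
```

Under this sketch, a gain would be read as significant when the interval excludes zero. The pairwise AUC computation is quadratic in cohort size, so a rank-based implementation would be preferable for a validation cohort of the size described above.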