Publication 18482

Title:	Large language models improve transferability of electronic health record-based predictions across countries and coding systems
Journal:	npj Digital Medicine
Published:	22 Jan 2026
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/41571946/
DOI:	https://doi.org/10.1038/s41746-026-02363-5

Abstract

Variation in medical practices and reporting standards across healthcare systems limits the transferability of prediction models based on structured electronic health record data. Prior studies have demonstrated that embedding medical codes into a shared semantic space can help address these discrepancies, but real-world applications remain limited. Here, we show that leveraging embeddings from a large language model alongside a transformer-based prediction model provides an effective and scalable solution to enhance generalizability. We call this approach GRASP and apply it to predict the onset of 21 diseases and all-cause mortality in over one million individuals. Trained on the UK Biobank (UK) and evaluated in FinnGen (Finland) and Mount Sinai (USA), GRASP achieved an average ΔC-index that was 88% and 47% higher than language-unaware models, respectively. GRASP also showed significantly higher correlations with polygenic risk scores for 62% of diseases, and maintained robust performance even when datasets were not harmonized to the same data model.
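The abstract reports performance as a ΔC-index, i.e. a difference in concordance index between models. The exact definition used in the paper (for example, which baseline model the difference is taken against) is not given on this page, so the following is only a minimal sketch of the standard Harrell's C-index for right-censored time-to-event data, plus a difference between two hypothetical sets of risk scores. The function name `harrell_c_index` and the toy data are illustrative assumptions, not the authors' code or results.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       follow-up time per individual
    events:      1 if the event (e.g. disease onset) was observed, 0 if censored
    risk_scores: model output where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the individual with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event;
        # pairs with identical follow-up times are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event was assigned higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up times, event indicators, and two score sets.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
model_scores = [0.9, 0.3, 0.4, 0.2]     # hypothetical model under evaluation
baseline_scores = [0.5, 0.5, 0.5, 0.5]  # hypothetical uninformative baseline

delta = harrell_c_index(times, events, model_scores) - harrell_c_index(
    times, events, baseline_scores
)
print(round(delta, 2))  # prints 0.3
```

Here the model orders four of the five comparable pairs correctly (C = 0.8), while a constant score yields C = 0.5, so the toy ΔC-index is about 0.3. A relative statement such as "88% higher" would then compare two such ΔC-index values; production analyses typically use a tested library implementation rather than this pairwise loop, which is quadratic in cohort size.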

6 Authors

Matthias Kirchler
Matteo Ferro
Veronica Lorenzini
Robin P. van de Water
Christoph Lippert
Andrea Ganna

1 Application

Application ID	Title
77717	Learning disease characteristics from multi-modal data for precision medicine