WARNING: the interactive features of this website use CSS3, which your browser does not support. To use the full features of this website, please update your browser.
Abstract
Large biobank repositories of clinical conditions and medications data open opportunities to investigate the phenotypic disease network. We present a graph embedded topic model (GETM). We integrate existing biomedical knowledge graph information in the form of pre-trained graph embedding into the embedded topic model. Via a variational autoencoder framework, we infer patient phenotypic mixture by modeling multi-modal discrete patient medical records. We applied GETM to UK Biobank (UKB) self-reported clinical phenotype data, which contains 443 self-reported medical conditions and 802 medications for 457,461 individuals. Compared to existing methods, GETM demonstrates good imputation performance. With a more focused application on characterizing pain phenotypes, we observe that GETM-inferred phenotypes not only accurately predict the status of chronic musculoskeletal (CMK) pain but also reveal known pain-related topics. Intriguingly, medications and conditions in the cardiovascular category are enriched among the most predictive topics of chronic pain.