Abstract
Omics studies produce a large number of measurements, enabling the development, validation and interpretation of systems-level biological models. Large cohorts are required to power these complex models; yet, the cohort size remains limited due to clinical and budgetary constraints. We introduce clinical and omics multimodal analysis enhanced with transfer learning (COMET), a machine learning framework that incorporates large, observational electronic health record databases and transfer learning to improve the analysis of small datasets from omics studies. By pretraining on electronic health record data and adaptively blending both early and late fusion strategies, COMET overcomes the limitations of existing multimodal machine learning methods. Using two independent datasets, we showed that COMET improved the predictive modelling performance and biological discovery compared with the analysis of omics data with traditional methods. By incorporating electronic health record data into omics analyses, COMET enables more precise patient classifications, beyond the simplistic binary reduction to cases and controls. This framework can be broadly applied to the analysis of multimodal omics studies and reveals more powerful biological insights from limited cohort sizes.</p>