Abstract
Abstract In the era of big data, secondary outcomes have become increasingly important alongside primary outcomes. These secondary outcomes, which can be derived from traditional endpoints in clinical trials, compound measures, or risk prediction scores, hold the potential to enhance the analysis of primary outcomes. Our method is motivated by the challenge of utilizing multiple secondary outcomes, such as blood biochemistry markers and urine assays, to improve the analysis of the primary outcome related to liver health. Current integration methods often fall short, as they impose strong model assumptions or require prior knowledge to construct over-identified working functions. This article addresses these challenges and opens a new avenue in data integration by introducing a novel integrative learning framework applicable in a general setting. The proposed framework allows for the robust, data-driven integration of information from multiple secondary outcomes, promotes the development of efficient learning algorithms, and ensures optimal use of available data. Extensive simulation studies demonstrate that the proposed method significantly reduces variance in primary outcome analysis, outperforming existing integration approaches. Additionally, applying this method to UK Biobank reveals that cigarette smoking is associated with increased fatty liver measures, with these effects being particularly pronounced in the older adult cohort.</p>