Abstract
BACKGROUND AND OBJECTIVES: Identification of individuals at high risk of developing Parkinson disease (PD) several years before diagnosis is crucial for developing treatments to prevent or delay neurodegeneration. This study aimed to develop predictive models for PD risk that combine plasma proteins and easily accessible clinical-demographic variables.
METHODS: Using data from the UK Biobank (UKB), which recruited participants across the United Kingdom, we conducted a longitudinal study to identify predictors of incident PD. Participants with baseline plasma protein measurements and without PD at baseline were included. Using machine learning, we narrowed down predictors from a pool of 1,463 plasma proteins and 93 clinical-demographic variables. These predictors were then externally validated in the Parkinson's Progression Markers Initiative (PPMI) cohort. To further investigate the temporal trends of the predictors, a nested case-control study was conducted within the UKB.
RESULTS: A total of 52,503 participants without PD (median age 58 years, 54% female) were included. Over a median follow-up of 14.0 years, 751 individuals were diagnosed with PD (median age 65 years, 37% female). Using a forward selection approach, we selected a panel of 22 plasma proteins for optimal prediction. Trained with an ensemble tree-based Light Gradient Boosting Machine (LightGBM) algorithm, the resulting model achieved an area under the receiver operating characteristic curve (AUC) of 0.800 (95% CI 0.785-0.815). The LightGBM prediction model integrating both plasma proteins and clinical-demographic variables demonstrated enhanced predictive accuracy, with an AUC of 0.832 (95% CI 0.815-0.849). Key predictors identified included age, years of education, history of traumatic brain injury, and serum creatinine. The incorporation of 11 plasma proteins (neurofilament light, integrin subunit alpha V, hematopoietic PGD synthase, histamine N-methyltransferase, tubulin polymerization promoting protein family member 3, ectodysplasin A2 receptor, latexin, interleukin-13 receptor subunit alpha-1, BAG family molecular chaperone regulator 3, tryptophanyl-tRNA synthetase, and secretogranin-2) augmented the model's predictive accuracy. External validation in the PPMI cohort confirmed the model's reliability, producing an AUC of 0.810 (95% CI 0.740-0.873). Notably, alterations in these predictors were detectable several years before the diagnosis of PD.
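For readers less familiar with the metric reported above, the following is a minimal sketch of how an AUC and a 95% confidence interval can be computed from predicted risk scores. It is illustrative only: the abstract does not state how the confidence intervals were derived, so the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_ci` and the toy data are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random case outscores a random
    non-case, with ties counted as 0.5 (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data: 1 = incident PD, 0 = no PD; scores are model-predicted risks.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
print(auc(labels, scores))           # 15 of 16 case/non-case pairs are ranked correctly
print(bootstrap_ci(labels, scores))  # (lower, upper) bounds of the interval
```

The pairwise computation is quadratic in sample size and is written for clarity; with a cohort of this size a rank-based implementation would be preferable.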
DISCUSSION: Our findings support the potential utility of a machine learning-based model integrating clinical-demographic variables with plasma proteins to identify individuals at high risk for PD within the general population. Although these predictors have been validated in the PPMI cohort, additional validation in a more diverse population reflective of the general community is essential.