Abstract
The increasing availability of large-scale, population-based whole-genome sequencing (WGS) data enables comprehensive analyses of rare genetic variants, which are crucial for unraveling the genetic mechanisms underlying complex traits and diseases. Time-to-event traits offer the advantage of capturing both diagnosis status and timing, facilitating the identification of genetic variants associated with age of onset, disease progression, and lifespan. However, existing methods primarily focus on quantitative and binary traits; they do not leverage censored time-to-event information and are limited in their ability to detect rare variants associated with disease progression. Here we propose SurvSTAAR, a powerful and comprehensive statistical framework for time-to-event traits in large-scale WGS studies, offering a computationally scalable analytical pipeline for analyzing rare coding and noncoding variants. SurvSTAAR accounts for sample relatedness, population structure, and heavy censoring, and it further empowers rare variant association analysis by incorporating functional annotations. We applied SurvSTAAR to analyze the time-to-event trait of Alzheimer's disease (AD) in 458,773 related samples from the UK Biobank WGS data. We identified putatively novel associations with AD in both coding and noncoding regions and further explored their potential role in disease progression by assessing their effects on protein function, including amino acid changes, structural modifications, and domain disruptions. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.