Abstract
Structural variants (SVs) contribute significantly to genetic diversity yet present computational challenges during analysis. We introduce SDFA, a standardized decomposition format and toolkit for efficient analysis of SVs in large-scale population genomics. SDFA efficiently stores and retrieves all SV types while providing algorithms for consistent SV merging, memory-efficient annotation, and precise gene feature annotation across large cohorts. SDFA outperforms existing tools, achieving at least 17.64 times faster merging than four tools and 120.93 times faster annotation than three tools, and uniquely handles complex SVs. We validate SDFA on 895,054 SVs from 150,119 individuals in the UK Biobank dataset.