Abstract
Single nucleotide polymorphism heritability of a trait is the proportion of the trait's total variance explained by the additive effects of genome-wide single nucleotide polymorphisms. Linear mixed models are routinely used to estimate single nucleotide polymorphism heritability for many complex traits, which requires estimation of a genetic relationship matrix among individuals. Heritability is usually estimated by restricted maximum likelihood or by method-of-moments approaches such as Haseman-Elston regression. The common practice for accounting for population substructure is to include the top few principal components of the genetic relationship matrix as covariates in the linear mixed model. This adjustment can become computationally very intensive on large biobank-scale datasets. Here, we propose a method-of-moments approach for estimating single nucleotide polymorphism heritability in the presence of population substructure. Our proposed method is computationally scalable on biobank datasets and gives an asymptotically unbiased estimate of heritability in the presence of discrete substructures. It introduces the adjustments for population stratification in a second-order estimating equation. It allows these substructures to differ in their single nucleotide polymorphism allele frequencies and in their trait distributions (means and variances), while heritability is assumed to be the same across substructures. Through extensive simulation studies and an application to 7 quantitative traits in the UK Biobank cohort, we demonstrate that our proposed method performs well in the presence of population substructure and is much more computationally efficient than existing approaches.
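For readers unfamiliar with the method-of-moments baseline named above, the following is a minimal sketch of textbook Haseman-Elston regression on simulated data. It is not the method proposed here and makes no adjustment for population substructure. The function names `haseman_elston_h2` and `simulate`, the sample sizes, the allele-frequency range, and the equal-magnitude effect sizes are all illustrative assumptions.

```python
import numpy as np


def haseman_elston_h2(genotypes, phenotype):
    """Textbook Haseman-Elston estimate for a homogeneous sample.

    Regresses products of standardized phenotypes y_i * y_j (i < j) on the
    off-diagonal genetic relationship matrix entries K_ij; the slope is the
    heritability estimate. No substructure adjustment is made.
    """
    n, m = genotypes.shape
    # Standardize each SNP column to mean 0 and variance 1.
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    grm = z @ z.T / m  # genetic relationship matrix, n x n
    y = (phenotype - phenotype.mean()) / phenotype.std()
    upper = np.triu_indices(n, k=1)  # index pairs with i < j
    k_off = grm[upper]
    y_prod = np.outer(y, y)[upper]
    # Least-squares slope through the origin.
    return float(k_off @ y_prod / (k_off @ k_off))


def simulate(n=1000, m=100, h2=0.5, seed=0):
    """Simulate unrelated individuals from one population (assumed toy model)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)  # hypothetical allele frequencies
    g = rng.binomial(2, freqs, size=(n, m)).astype(float)
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # Equal-magnitude effects with random signs keep the toy example stable.
    beta = rng.choice([-1.0, 1.0], size=m) * np.sqrt(h2 / m)
    y = z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return g, y


genotypes, phenotype = simulate()
h2_hat = haseman_elston_h2(genotypes, phenotype)
print(f"Haseman-Elston estimate of heritability: {h2_hat:.3f}")
```

The sketch forms the full n x n relationship matrix explicitly, which is only practical for small samples; it is meant to show the moment condition being fit, not a scalable implementation.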