Abstract
Many datasets, including widely used biobanks, contain more than one observation of numerous phenotypes for at least a portion of their sample. Most GWAS use only a single observation per individual, even when more than one is available, and apply a standard model in which the estimated additive allelic effect is assumed to be constant across the age or time range in the sample. Here, we test a set of simple approaches that use multiple observations per individual under this same assumption, and we characterize their effects on GWAS power, SNP-heritability, gene set enrichment, and polygenic prediction. We find that using the mean or median of the available observations, rather than a single observation, improves power to detect associated loci and enriched gene sets and yields higher out-of-sample polygenic score prediction accuracy. Although biobanks continue to grow, many deeply phenotyped samples remain relatively small yet include multiple observations per individual. While explicitly modeling age- or time-dependent genetic effects can add nuance to genetic studies and estimates, most GWAS apply a standard, additive-only model. In that setting, the simple approach of using the mean or median can improve power by reducing "noise" in the phenotype, works with standard, optimized software, and may be particularly impactful for smaller samples, including the samples of diverse genetic ancestry that currently exist in widely used biobanks such as the UK Biobank and HRS.</p>