Abstract
Genotype imputation is widely used in genetic studies to boost the power of GWAS, to combine multiple studies for meta-analysis and to perform fine mapping. With advances of imputation tools and large reference panels, genotype imputation has become mature and accurate. However, the uncertain nature of imputed genotypes can cause bias in the downstream analysis. Many studies have compared the performance of popular imputation approaches, but few investigated bias characteristics of downstream association analyses. Herein, we showed that the imputation accuracy is diminished if the real genotypes contain minor alleles. Although these genotypes are less common, which is particularly true for loci with low minor allele frequency, a large discordance between imputed and observed genotypes significantly inflated the association results, especially in data with a large portion of uncertain SNPs. The significant discordance of P-values happened as the P-value approached 0 or the imputation quality was poor. Although elimination of poorly imputed SNPs can remove false positive (FP) SNPs, it sacrificed, sometimes, more than 80% true positive (TP) SNPs. For top ranked SNPs, removing variants with moderate imputation quality cannot reduce the proportion of FP SNPs, and increasing sample size in reference panels did not greatly benefit the results as well. Additionally, samples with a balanced ratio between cases and controls can dramatically improve the number of TP SNPs observed in the imputation based GWAS. These results raise concerns about results from analysis of association studies when rare variants are studied, particularly when case-control studies are unbalanced.</p>