Name: Genotype imputation and genetic association studies using UK Biobank data

This document describes the analysis carried out to perform genotype imputation for the interim release of the UK Biobank genotype data. It also provides advice about using the imputed data to carry out genome-wide association studies (GWAS) or to extract genotypes for use as covariates in other types of association study.

Genotype imputation is the process of predicting genotypes that are not directly assayed in a sample of individuals. A reference panel of haplotypes at a dense set of SNPs, indels and structural variants is used to impute genotypes into a study sample of individuals who have been genotyped at a subset of the SNPs. These 'in-silico' genotypes can then be used to boost the number of SNPs that can be tested for association. This increases the power of the study, improves the ability to resolve or fine-map the causal variants, and facilitates meta-analysis. The result of the imputation process is a dataset with 73,355,667 SNPs, short indels and large structural variants in 152,249 individuals. The process of imputation is divided into two steps:

  1. pre-phasing;
  2. imputation.

In the first step, the samples to be imputed are 'pre-phased', i.e. a statistical method is applied to the genotype data to infer the underlying haplotypes of each individual. In the second step, a different statistical method is used to combine the inferred haplotypes with a reference panel of haplotypes and impute the unobserved genotypes in each sample. Phasing and imputation can be computationally intensive. To avoid many different research groups having to carry this out independently, phasing and imputation were carried out centrally at the Wellcome Trust Centre for Human Genetics in Oxford.
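
The second step can be illustrated with a deliberately simplified sketch. It is not the statistical method used for the UK Biobank release: it assumes the study haplotypes have already been phased, and it fills each untyped site by copying the allele from the single reference haplotype that agrees best at the typed sites. Real imputation methods are probabilistic and report genotype probabilities rather than hard calls. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# A haplotype is a list of 0/1 alleles; None marks a site that was not genotyped.
Haplotype = List[Optional[int]]


def impute_haplotype(study: Haplotype, reference: List[List[int]]) -> List[int]:
    """Fill untyped sites by copying from the closest reference haplotype."""
    # Indices of the sites that were actually typed in the study sample.
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def mismatches(ref_hap: List[int]) -> int:
        # Number of typed sites where the reference haplotype disagrees.
        return sum(ref_hap[i] != study[i] for i in typed)

    # Pick the reference haplotype with the fewest mismatches (first one on ties).
    best = min(reference, key=mismatches)

    # Keep observed alleles; copy the reference allele where none was observed.
    return [best[i] if allele is None else allele for i, allele in enumerate(study)]


def impute_genotype(hap_a: Haplotype, hap_b: Haplotype,
                    reference: List[List[int]]) -> List[int]:
    """Impute both haplotypes and return per-site alternate-allele counts (0, 1 or 2)."""
    full_a = impute_haplotype(hap_a, reference)
    full_b = impute_haplotype(hap_b, reference)
    return [a + b for a, b in zip(full_a, full_b)]


# Toy reference panel: three haplotypes at five sites (made-up data).
reference_panel = [
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
]

# One pre-phased individual, typed only at sites 0, 2 and 4.
hap_a = [0, None, 1, None, 0]
hap_b = [0, None, 1, None, 1]

genotype = impute_genotype(hap_a, hap_b, reference_panel)
print(genotype)
```

In this toy example the first study haplotype matches the first reference haplotype exactly at the typed sites and the second matches the third, so the untyped sites 1 and 3 are filled from those two reference haplotypes.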

