The UK Biobank project is a large prospective cohort study of ~500,000
individuals from across the United Kingdom, aged 40-69 at
recruitment. A rich variety of phenotypic and health-related
information is available on each participant, making the resource
unprecedented in its size and scope. Here we describe the genome-wide
genotype data (~805,000 markers) collected on all individuals in the
cohort and the quality control procedures applied to it. Genotype data on this scale
offers novel opportunities for assessing quality issues, although the
wide range of ancestries of the individuals in the cohort also creates
particular challenges. We also conducted a set of analyses that reveal
properties of the genetic data - such as population structure and
relatedness - that can be important for downstream analyses. In
addition, we phased the data and imputed genotypes into the dataset, using
computationally efficient methods combined with the Haplotype Reference
Consortium (HRC) and UK10K haplotype resources. Imputation increases the
number of testable variants over 100-fold, to ~96 million. We
also imputed classical allelic variation at 11 human leukocyte antigen
(HLA) genes, and as a quality control check of this imputation, we
replicated signals of known associations between HLA alleles and many
common diseases. We describe tools that allow efficient genome-wide
association studies (GWAS) of multiple traits and fast phenome-wide
association studies (PheWAS), which work together with a new compressed
file format that has been used to distribute the dataset. As a further
check of the genotyped and imputed datasets, we performed a test-case
genome-wide association scan on a well-studied human trait, standing
height.