Imputation - Genomics
Description: Imputed genotype and phased haplotype values. Genotypes were imputed into the dataset using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resources. This increased the number of testable variants more than 100-fold, to ~96 million variants, which are stored in the compressed and indexed BGEN v1.2 format. The imputed genotypes are aligned to the + strand of the reference, and positions are given in GRCh37 coordinates.
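Because the data are distributed as BGEN v1.2, one quick sanity check on a downloaded file is to decode its header block. The sketch below follows our reading of the public BGEN format specification: a 4-byte offset, then a header block holding its own length, the variant and sample counts, a magic number, optional free data and a 4-byte flags field. The function names are hypothetical, the code is not a UK Biobank tool, and it does not decompress any genotype data. For real analyses, established BGEN readers such as bgenix or QCTOOL are the safer choice.

```python
import struct

# Both magic values are permitted by the BGEN specification.
BGEN_MAGIC = (b"bgen", b"\x00\x00\x00\x00")


def parse_bgen_header(buf: bytes) -> dict:
    """Decode the first 4 + L_H bytes of a BGEN file (sketch, not a full reader)."""
    # Bytes 0-3: offset of the first variant data block, counted from byte 4.
    (offset,) = struct.unpack_from("<I", buf, 0)
    # Header block: its own length L_H, number of variants M, number of samples N.
    header_len, n_variants, n_samples = struct.unpack_from("<III", buf, 4)
    if header_len < 20 or len(buf) < 4 + header_len:
        raise ValueError("truncated or malformed BGEN header block")
    magic = buf[16:20]
    if magic not in BGEN_MAGIC:
        raise ValueError(f"unexpected magic number {magic!r}")
    # The last 4 bytes of the header block are the flags; any free data sits before them.
    (flags,) = struct.unpack_from("<I", buf, header_len)
    return {
        "offset": offset,
        "n_variants": n_variants,
        "n_samples": n_samples,
        "compression": flags & 0x3,              # 0 = none, 1 = zlib, 2 = zstd
        "layout": (flags >> 2) & 0xF,            # 1 = v1.1-style, 2 = v1.2-style
        "has_sample_ids": bool((flags >> 31) & 1),
    }


def read_bgen_header(path: str) -> dict:
    """Read only as many bytes as needed to decode the header block."""
    with open(path, "rb") as f:
        start = f.read(8)  # offset field + header-length field
        (header_len,) = struct.unpack_from("<I", start, 4)
        if header_len < 20:
            raise ValueError("malformed BGEN header length")
        return parse_bgen_header(start + f.read(header_len - 4))
```

Calling `read_bgen_header` on a downloaded per-chromosome file returns a small dict; the variant and sample counts can then be compared with the SNP lists and sample files before committing to a full analysis.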
The fields listed here are indicators; adding them to an Application basket allows researchers to download the corresponding underlying data from the UKB repository. These data comprise:
- Imputation (2.1TB)
- Haplotypes (0.06TB)
The lists of SNPs in the imputed datasets can be downloaded per chromosome from each Field's Resources tab, or as combined tar archives in Resource 1965 and Resource 1671. The information scores and minor allele frequency (MAF) data for the imputed genotypes, computed with QCTOOL, can be downloaded from Resource 1967.
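A common use of the MAF and information-score data is to keep only well-imputed, non-rare variants before association testing. The sketch below assumes each per-chromosome MAF+INFO file is tab-delimited with no header and the column order given in `MFI_COLUMNS`; that column order, the function name and the default thresholds (INFO >= 0.8, MAF >= 0.01) are all illustrative assumptions, to be checked against the documentation that accompanies Resource 1967.

```python
import csv
from typing import Iterable, Iterator

# ASSUMED column order for one MAF+INFO row; verify against the Resource 1967 documentation.
MFI_COLUMNS = ["variant_id", "rsid", "position", "allele1", "allele2",
               "maf", "minor_allele", "info"]


def well_imputed(lines: Iterable[str], min_info: float = 0.8,
                 min_maf: float = 0.01) -> Iterator[str]:
    """Yield the variant_id of every row passing both (illustrative) thresholds."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(MFI_COLUMNS, row))
        if float(rec["info"]) >= min_info and float(rec["maf"]) >= min_maf:
            yield rec["variant_id"]
```

Used as `with open(path) as fh: keep = set(well_imputed(fh))`, this produces a set of variant identifiers that can be passed to whichever tool extracts variants from the BGEN files.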
Questions about using the imputed genotypes can be directed to a dedicated UK Biobank mailing list, which can be joined at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS.
Please note that imputation results for Chromosomes X and XY are in preparation and not yet available.
See Resource 530 for additional information (such as quality control), including a citable reference for publications.
Please note: there is currently a problem with the UK Biobank imputed data. Specifically:
The genetic data were imputed using two reference panels. The Haplotype Reference Consortium (HRC) panel was used wherever possible; SNPs not in that panel were imputed with the UK10K + 1000 Genomes panel. The problem affects this second set of imputed data: the genotypes at these SNPs were imputed correctly, but they were not recorded at the correct genome positions in the files.
The imputed data from the HRC panel are not affected and have the correct positions. They cover approximately 40M sites, including the majority of common SNPs, i.e. the sites most likely to show genetic associations. These sites can be identified using the publicly available HRC site list at http://www.haplotype-reference-consortium.org/site.
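To follow the recommendation to restrict analyses to HRC sites, the imputed variants can be matched against the HRC site list. The sketch below assumes the downloaded site list is tab-delimited, with a header line starting with `#` and CHROM, POS, ID, REF, ALT as its first five columns; both that layout and the function names are assumptions. It matches on chromosome, position and both alleles, accepting either allele order, because allele1/allele2 in the imputed files need not follow REF/ALT.

```python
import csv
from typing import Iterable, Set, Tuple

Site = Tuple[str, int, str, str]  # (chromosome, position, REF, ALT)


def load_hrc_sites(lines: Iterable[str]) -> Set[Site]:
    """Build a lookup set from an HRC site list (ASSUMED: CHROM, POS, ID, REF, ALT first)."""
    sites: Set[Site] = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        chrom, pos, _rsid, ref, alt = row[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


def is_hrc_site(sites: Set[Site], chrom: str, pos: int, a1: str, a2: str) -> bool:
    """True if the variant matches an HRC site with its alleles in either orientation."""
    return (chrom, pos, a1, a2) in sites or (chrom, pos, a2, a1) in sites
```

Chromosome labels may be written differently in the two sources, so they may need harmonizing before the lookup is applied to the imputed SNP lists.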
The data from the UK10K + 1000 Genomes panel are currently being re-imputed, and new versions of the BGEN, BGI and MAF+INFO files will be released as soon as possible. Until the new release is available, researchers are advised to use only the SNPs in the HRC panel or to work with the directly genotyped data. Further details and timelines will be announced via the mailing list.