Description
Genome-wide genetic data is available for 488,000 UK Biobank participants.
Genotype calling was performed by Affymetrix (now part of ThermoFisher Scientific) on two closely related purpose-designed arrays. Approximately 50,000 participants were run on the UK BiLEVE Axiom array (Resource 149600) and the remaining ~450,000 were run on the UK Biobank Axiom array (Resource 149601). The dataset combines results from both arrays (see Field 22000), and the released genotype data contain 805,426 markers. Marker positions are given in GRCh37 coordinates. Genotypes could not be assayed for some participants (~3%) because sufficient DNA could not be extracted from their blood samples.
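Because the dataset combines two arrays, analyses often need to know which array each participant was run on. The following is a minimal sketch of deriving that from a Field 22000 batch code. It assumes, without confirmation from this description, that negative codes denote UK BiLEVE Axiom batches and positive codes denote UK Biobank Axiom batches; check the data-coding for Field 22000 before relying on it. The function name and the example values are hypothetical.

```python
from collections import Counter
from typing import Optional

UK_BILEVE = "UK BiLEVE Axiom"
UK_BIOBANK = "UK Biobank Axiom"


def array_from_batch(batch_code: Optional[int]) -> Optional[str]:
    """Map a Field 22000 batch code to an array name.

    ASSUMPTION: negative codes are UK BiLEVE Axiom batches and positive
    codes are UK Biobank Axiom batches. Verify against the Field 22000
    data-coding. Missing values and 0 are returned as None because the
    assumed convention says nothing about them.
    """
    if batch_code is None or batch_code == 0:
        return None
    return UK_BILEVE if batch_code < 0 else UK_BIOBANK


# Hypothetical batch codes for five participants (None = no genotype data).
example_batches = [-3, 12, 57, None, -11]

# Tally participants per array, keeping the ungenotyped ones under None.
counts = Counter(array_from_batch(code) for code in example_batches)
```

Keeping the array as an explicit label makes it straightforward to include as a covariate or to stratify an analysis by array.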
The genotype data were quality controlled (QC). In addition, the dataset was phased and ~96M genotypes were imputed using computationally efficient methods combined with the Haplotype Reference Consortium and UK10K haplotype resources. Classical allelic variations at eleven HLA genes were also imputed. Information from the QC pipeline, such as the array used, and important genetic properties of the data, such as population structure and relatedness, are available.
Details of these analyses, and of the methods used to derive other data such as imputation and haplotypes, are given in:
Bycroft et al, "Genome-wide genetic data on ~500,000 UK Biobank participants", bioRxiv 166298, 2017, which researchers using this data are asked to cite (Resource 530).
The types of genetic data available are:
- SNP and Sample QC information
- Imputed Genotypes
- Phased haplotypes
- Imputed classical HLA types
- Genotype intensities
- Genotype confidences
- Genotype SNP-posteriors
- Genotype B-allele-frequency and Log2Ratio
- Array definition files
There is currently a problem with part of the Imputed Genotypes data (all other types are believed fully correct). Please see Category 100319 for details.
Please note that most of the fields in Category 100315, Category 100319 and Category 100035 are indicators showing the availability of data. To access the actual underlying genomic data, researchers must either use the ukbgene client (available from the Downloads section, instructions for use in Resource 664) or download the data from the European Genome-phenome Archive (EGA).
Questions about using the genotype information can be directed to a special UK Biobank mailing list, which can be joined at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS.
Genomic results returned by researchers are available under Category 9001. This section is expected to expand considerably later in the year.