Accessing your data
Accessing UK Biobank data guide
The following document provides guidance on how to download the various types of UK Biobank data:
Accessing data guide
Please see the Understanding UK Biobank page for information about the data available; that page also includes further information and reports about linkage data (Cancer, Hospital Inpatient, Death). Please note, however, that no new linkage data has been made available in the February 2020 Showcase Update.
See also Provisional timeline for future data availability for information about recent and future updates.
Other useful resources
Data Dictionary:
List of Data-Codings:
List of bulk fields
List of records fields
List of genetic fields
The size of the core dataset
The core dataset consists of the categories shown in the Quick Start section of Showcase when a basket is first created:
As an illustration of the potential size of a downloaded UK Biobank main dataset, the sizes of the various files generated from the core dataset are given in the table below:
|File type||ukbconv option||Dataset size||File extension|
In addition, the R .tab file is accompanied by a 511 KB .r script, the SAS .sd2 file by a 1.3 MB .sas script, and the Stata .raw file by a 501 KB .do script and a 961 KB .dct file.
Note that the large difference in size between the tsv .txt file and the (also tab-separated) R .tab file is due to empty fields being represented by the empty string in the former and by NA in the latter. Similarly, all fields are quoted in the .csv file, with empty fields appearing as "", which accounts for its additional size compared to the .txt file.
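As a rough illustration of why the representation of empty fields affects file size, the following Python sketch writes one made-up row in the three styles described above and compares the byte counts. The row values and the `serialize` helper are hypothetical and are not part of ukbconv; the real converted files may differ in other details.

```python
import csv
import io


def serialize(rows, style):
    """Serialize rows (lists of str, '' for a missing value) in one of three styles."""
    buf = io.StringIO()
    if style == "txt":
        # Tab-separated, empty fields left as the empty string.
        for row in rows:
            buf.write("\t".join(row) + "\n")
    elif style == "tab":
        # Tab-separated, empty fields written as NA.
        for row in rows:
            buf.write("\t".join(v if v != "" else "NA" for v in row) + "\n")
    elif style == "csv":
        # Comma-separated, every field quoted, so empty fields appear as "".
        writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(rows)
    else:
        raise ValueError(f"unknown style: {style}")
    return buf.getvalue()


# One hypothetical participant row with three empty fields out of five.
rows = [["1000001", "", "", "3.5", ""]]
sizes = {s: len(serialize(rows, s).encode("utf-8")) for s in ("txt", "tab", "csv")}
print(sizes)  # {'txt': 15, 'tab': 21, 'csv': 25}
```

Each empty field costs two extra bytes when written as NA or as "", and in the .csv style every non-empty field also gains two quote characters, which adds up quickly in a dataset where most fields are empty.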
Information about the sizes of bulk data items such as MRI images can be found in section 8.4 of the "Accessing data guide" above. This document also includes links to documents providing information about the size of the Genotype data (Section 4.1) and the Exome data (Section 4.3).
* Please note that there is currently a glitch with the tsv version of the converted file: every row except the first starts with an extra tab, which throws off the column alignment. If you intend to use this option, you will need the technical know-how to manipulate the file to correct the problem. Another approach is simply to use the R option instead, as that also produces a tab-separated file, and to disregard the accompanying R script. (Note, however, that empty fields will appear as NA rather than the empty string in the resulting file.) We are currently looking into correcting this problem.
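One possible way to correct the extra-tab problem is sketched below in Python. It assumes the glitch is exactly as described (a single surplus tab at the start of every row after the header) and that no row legitimately begins with an empty first field. The function names and file paths are hypothetical; this is not an official UK Biobank tool.

```python
def strip_extra_leading_tab(lines):
    """Yield lines with one leading tab removed from every line after the first."""
    for index, line in enumerate(lines):
        # The header (index 0) is left untouched; later rows lose one leading tab.
        if index > 0 and line.startswith(b"\t"):
            line = line[1:]
        yield line


def fix_tsv_file(src_path, dst_path):
    """Copy src_path to dst_path, removing the surplus tab from each data row."""
    # Binary mode avoids guessing the text encoding and keeps line endings intact.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(strip_extra_leading_tab(src))
```

For example, `fix_tsv_file("ukb_example.txt", "ukb_example_fixed.txt")` would write a corrected copy, where both file names are placeholders. The file is processed one line at a time, so memory use stays small even for a large dataset.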