To use the UK Biobank secure online repository services a researcher must
This webpage details the means by which bulk data held by UK Biobank can be accessed and manipulated once access has been approved.
Some of the UKB utilities are supplied pre-compiled for both MS-Windows and Linux systems. The MS-Windows utilities have the suffix .exe; however, the explanations in this guide omit it for generality. All the utility programs are command-line tools, so Windows versions are best run from a Command Prompt window and Linux versions directly from a Terminal.
The repository consists of a pair of mirrored systems each connected to the UK JANET network by independent links. The system names are:
To analyse a particular bulk data file, a copy must first be retrieved from the repository using the ukbfetch utility, which can be obtained from the download section of the UK Biobank Showcase website.
The ukbconv utility, available from the same download section, can be used to produce lists of all the bulk data-files included in an application.
The program is run in one of two ways:

ukbfetch -eperson_id -ddatafile_id [-aauthentication_keyfile] [-v]
ukbfetch -bbatch_file [-aauthentication_keyfile] [-v]

where the flags are as follows:
Flag | Description |
---|---|
-a | Specifies the authentication keyfile containing the application ID and truncated password. This flag is optional and is not required if the default authentication file name (.ukbkey) has been used. |
-b | Specifies a batch file containing participant-ID and data-file ID pairs, for retrieving multiple data-files at once. Details on creating a batch file are given in Section 3 below. |
-d | Specifies a single data-file ID to be retrieved. |
-e | Specifies a single participant ID to be retrieved. |
-h | Shows a basic help message. |
-m | Specifies that only the first N data-files listed in a batch file should be retrieved (entered as -mN, e.g. -m20). |
-o | Specifies an alternative name for the output logfile (e.g. -of1 writes the logfile f1.lis). |
-s | Specifies line N as the starting point for retrieving data-files listed in a batch file (entered as -sN, e.g. -s50). |
-v | Specifies that output should be verbose (useful for tracing errors). |
Either both -d and -e must be present, or -b alone must be present.
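When ukbfetch is driven from a script, the rule above (either -e together with -d, or -b alone) can be checked before the program is launched. The following is a minimal Python sketch: the helper name build_ukbfetch_args is hypothetical, and it uses only the -e, -d, -b, -a and -v flags from the table, with each value attached directly to its flag letter as in the examples in this guide.

```python
from typing import List, Optional

def build_ukbfetch_args(
    participant_id: Optional[str] = None,
    datafile_id: Optional[str] = None,
    batch_file: Optional[str] = None,
    keyfile: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Build a ukbfetch argument list (hypothetical helper, not part of ukbfetch)."""
    single = participant_id is not None or datafile_id is not None
    if batch_file is not None:
        # Batch mode: -b must appear on its own, without -e or -d.
        if single:
            raise ValueError("use either -e/-d or -b, not both")
        args = ["ukbfetch", f"-b{batch_file}"]
    elif participant_id is not None and datafile_id is not None:
        # Single-file mode: both -e and -d are required.
        args = ["ukbfetch", f"-e{participant_id}", f"-d{datafile_id}"]
    else:
        raise ValueError("need both -e and -d, or -b alone")
    if keyfile is not None:
        # Only needed when the keyfile is not the default .ukbkey.
        args.append(f"-a{keyfile}")
    if verbose:
        args.append("-v")
    return args
```

The resulting list could then be passed to Python's subprocess.run; on MS-Windows the program name may need to be given as ukbfetch.exe.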
As an example, suppose the default authentication keyfile .ukbkey exists. To retrieve datafile 6025_1_0 for participant 829423, enter the following:

ukbfetch -e829423 -d6025_1_0

This will create the file 829423_6025_1_0.typ on the local disk, where typ is an extension appropriate to the type of file. On failure the program will output an error message.
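Because each retrieved file is named from the participant ID and datafile ID, with an extension that depends on the file type, a script can check which files are already present before fetching again. A minimal Python sketch follows; the helper names already_fetched and pending_pairs are hypothetical, and the sketch assumes that any existing file matching participant_datafile.* is a complete download, which this guide does not guarantee.

```python
import glob
import os
from typing import Iterable, List, Tuple

def already_fetched(directory: str, participant_id: str, datafile_id: str) -> bool:
    """Return True if <participant>_<datafile>.<ext> exists in `directory`."""
    stem = f"{participant_id}_{datafile_id}"
    # The extension depends on the file type, so accept any extension.
    pattern = os.path.join(glob.escape(directory), glob.escape(stem) + ".*")
    return bool(glob.glob(pattern))

def pending_pairs(directory: str, pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the (participant, datafile) pairs with no local copy yet."""
    return [(p, d) for p, d in pairs if not already_fetched(directory, p, d)]
```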
Note that ukbfetch will exit if it attempts to download a datafile to a disk location with insufficient space. Once access has been granted, individual files may be downloaded an unlimited number of times, so we suggest that researchers delete each file once they have finished analysing it.
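Since ukbfetch exits when the disk runs out of space, and files can be downloaded again later, a script can check free space before a run and remove each file once it has been analysed. This Python sketch is illustrative only: the helper names are hypothetical, the guide gives no file sizes (so the required byte count must be estimated by the user), and analyse stands for whatever analysis function is applied.

```python
import os
import shutil
from typing import Callable, TypeVar

T = TypeVar("T")

def has_free_space(directory: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `directory` has enough free space."""
    # shutil.disk_usage returns (total, used, free) in bytes.
    return shutil.disk_usage(directory).free >= required_bytes

def analyse_then_delete(path: str, analyse: Callable[[str], T]) -> T:
    """Run `analyse` on a fetched file, then delete the file if it succeeded."""
    result = analyse(path)
    # Only reached if `analyse` did not raise, so a failed analysis keeps the file.
    os.remove(path)
    return result
```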
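Multiple datafiles can be retrieved in one run by listing them in a batch file, one participant ID and datafile ID pair per line. For example, if the file input.txt contains

829423 6025_0_0
829582 6025_1_1
829582 21012_0_2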
then entering
ukbfetch -binput.txt

would instruct ukbfetch to retrieve the three datafiles listed. The names of the retrieved files will be saved as a list in the logfile fetched.lis, which can be used as an input file to produce worklists for analysis programs. Successive runs will overwrite this file, so use the -o option to specify alternative names if you wish to keep the logfiles from earlier runs.
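If the -o option is used to keep several logfiles, they can be merged into a single worklist. The Python sketch below assumes, since this guide does not specify the layout, that each logfile lists one fetched file name per line; merge_fetched_lists is a hypothetical name.

```python
from typing import Iterable, List

def merge_fetched_lists(lis_paths: Iterable[str]) -> List[str]:
    """Combine several logfiles into one worklist, dropping duplicates and blanks."""
    seen = set()
    worklist: List[str] = []
    for path in lis_paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                name = line.strip()
                # Keep the first occurrence of each name, in file order.
                if name and name not in seen:
                    seen.add(name)
                    worklist.append(name)
    return worklist
```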
Lines beginning with # in a batch file will be ignored - this may be used to embed comments.
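Batch files can also be generated from a script. This minimal Python sketch writes and reads the format of the input.txt example, assuming one space-separated participant ID and datafile ID pair per line; the helper names are hypothetical. The reader skips lines beginning with #, mirroring ukbfetch, and also skips blank lines, which is a choice of this sketch rather than documented behaviour. Whether comment lines count towards the line numbers used by -s is not stated in this guide.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (participant ID, datafile ID)

def write_batch_file(path: str, pairs: Iterable[Pair], comment: str = "") -> None:
    """Write one 'participant datafile' pair per line, with an optional comment."""
    with open(path, "w", encoding="utf-8") as fh:
        if comment:
            # Lines beginning with '#' are ignored by ukbfetch.
            fh.write(f"# {comment}\n")
        for participant_id, datafile_id in pairs:
            fh.write(f"{participant_id} {datafile_id}\n")

def read_batch_file(path: str) -> List[Pair]:
    """Read pairs back, skipping '#' comment lines and blank lines."""
    pairs: List[Pair] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            participant_id, datafile_id = line.split()
            pairs.append((participant_id, datafile_id))
    return pairs
```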
Note that no more than 50,000 files can be retrieved in a single run of ukbfetch; however, researchers may run multiple instances of ukbfetch simultaneously using different input files.
To make it easier to produce lists of datafiles, the ukbconv utility has an option to output "bulk" format, which ukbfetch can load directly. This produces a single file containing all the datafile names; if it exceeds 50,000 lines, the -s and -m flags will need to be used with ukbfetch.
Please be aware that, because the files are encrypted in transit (and decrypted only on receipt), receiving multiple streams of them simultaneously may stress your local system and actually result in decreased throughput overall.
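Following the advice above, a batch file can be divided into a small number of separate input files, one per ukbfetch instance. This Python sketch only does the splitting into contiguous, near-equal groups; split_into_parts is a hypothetical helper, and the number of parts is left to the user, bearing in mind that too many simultaneous streams can reduce overall throughput.

```python
from typing import List, Sequence

def split_into_parts(lines: Sequence[str], parts: int) -> List[List[str]]:
    """Split batch-file lines into `parts` contiguous groups of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(lines), parts)
    groups: List[List[str]] = []
    start = 0
    for i in range(parts):
        # The first `extra` groups each take one additional line.
        end = start + size + (1 if i < extra else 0)
        groups.append(list(lines[start:end]))
        start = end
    return groups
```

Each group would then be written to its own batch file and fetched with a distinct -o name, so that each instance keeps its own logfile.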
To generate a list of datafiles (to be fetched) for field 145, enter the command
ukbconv ukb789.enc_ukb bulk -s145

which will output the file ukb789.bulk. To fetch the datafiles listed in ukb789.bulk, create a .ukbkey file containing
789
c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4
and enter the command
ukbfetch -bukb789.bulk

which will connect to the repository and download copies of the datafiles. The names of the successfully fetched datafiles will be written to the logfile fetched.lis.
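The keyfile can also be created from a script. This Python sketch assumes the application ID goes on the first line and the key on the second; write_ukbkey is a hypothetical helper, and restricting the file's permissions is a precaution suggested here, not a documented requirement.

```python
import os

def write_ukbkey(application_id: str, key: str, path: str = ".ukbkey") -> None:
    """Write an authentication keyfile: application ID, then key (assumed layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(f"{application_id}\n{key}\n")
    # Limit access to the file's owner; on MS-Windows this only affects
    # the read-only flag.
    os.chmod(path, 0o600)
```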
Note: a large bulk file can also be retrieved in several smaller runs. For example, if ukb789.bulk contained 2300 lines, the data could be retrieved using the following set of commands:
ukbfetch -bukb789.bulk -s1 -m800 -of1
ukbfetch -bukb789.bulk -s801 -m800 -of2
ukbfetch -bukb789.bulk -s1601 -m700 -of3

The end result would be 2300 datafiles (assuming sufficient disk space) in the current directory, accompanied by the output logfiles f1.lis, f2.lis and f3.lis.
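The start lines and counts in these commands follow a simple pattern, so they can be generated for any file length. The Python sketch below reproduces the 2300-line example; the helper names are hypothetical, and it uses -m, the flag listed in the table for limiting the number of files retrieved.

```python
from typing import List, Tuple

def count_lines(path: str) -> int:
    """Count the lines in a batch or bulk file."""
    with open(path, encoding="utf-8") as fh:
        return sum(1 for _ in fh)

def chunk_ranges(total_lines: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start_line, count) pairs covering lines 1..total_lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ranges: List[Tuple[int, int]] = []
    start = 1  # line numbers for -s start at 1, as in -s1
    while start <= total_lines:
        count = min(chunk_size, total_lines - start + 1)
        ranges.append((start, count))
        start += count
    return ranges

def chunk_commands(bulk_file: str, total_lines: int, chunk_size: int) -> List[List[str]]:
    """One ukbfetch command per chunk, each writing its own logfile (f1, f2, ...)."""
    return [
        ["ukbfetch", f"-b{bulk_file}", f"-s{start}", f"-m{count}", f"-of{i}"]
        for i, (start, count) in enumerate(chunk_ranges(total_lines, chunk_size), start=1)
    ]
```

For a 2300-line file, chunk_commands("ukb789.bulk", count_lines("ukb789.bulk"), 800) would give the three commands listed above.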