Accessing Complex Data within UK Biobank

Much of the data held by UK Biobank cannot easily be represented in a simple tabular form. This complex-format data is held within a secure online repository (outside the main UK Biobank showcase system) where elements of it may be downloaded on a piecemeal basis as required.

To use the UK Biobank secure online repository services you must:

  1. be a validated UK Biobank researcher;
  2. be part of an Approved Application; and
  3. have been issued a standard dataset together with the associated password credentials.

This webpage describes how the Bulk, Record and Genetic data held by UK Biobank can be accessed and manipulated once access has been approved.

  1. Introduction
  2. Notices
  3. Connectivity
  4. Authentication
  5. Analysis Services
  6. Bulk Data
  7. Record Data
  8. Genetic Data

1. Introduction

This guide is intended for researchers whose Application for access to UK Biobank has been approved and who have been given a 32-character MD5 checksum and a 64-character password. It assumes that readers: … In addition, anyone using the UKB-provided analysis servers must also …
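As an illustration of how a 32-character MD5 checksum is typically used, the sketch below compares it with a digest computed locally. It assumes the checksum refers to a file you have downloaded; the function names `file_md5` and `md5_matches` are hypothetical and are not part of any UK Biobank utility.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path: Path, expected: str) -> bool:
    """Compare a file's MD5 with a 32-character hex checksum (case-insensitive)."""
    expected = expected.strip().lower()
    if len(expected) != 32:
        raise ValueError("an MD5 checksum should be 32 hexadecimal characters")
    return file_md5(path) == expected
```

Reading in chunks keeps memory use constant, which matters for the very large files held in the repository.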

Various utilities are supplied pre-compiled for both MS-Windows and Linux systems. The MS-Windows utilities have the suffix .exe; however, the explanations given in this guide omit it for generality.

All programs are command-line, so the Windows versions are best run from a Command Prompt window, and the Linux versions are best run directly from a Terminal.
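The naming convention above can be captured in a small helper, for example when driving the utilities from a script that must run on both platforms. This is a sketch only; the utility name `ukbtool` below is a hypothetical placeholder, not a real UK Biobank program.

```python
import sys


def utility_executable(base_name: str, platform: str = sys.platform) -> str:
    """Map a utility name as written in this guide to its on-disk file name.

    On MS-Windows (sys.platform == "win32") the pre-compiled binaries carry
    the .exe suffix; on Linux they do not.
    """
    if platform.startswith("win"):
        return base_name + ".exe"
    return base_name


# Hypothetical utility name, used for illustration only.
print(utility_executable("ukbtool"))
```

A script could then pass a list such as `[utility_executable("ukbtool"), ...]` to `subprocess.run`, keeping the rest of the command line identical across platforms.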

2. Notices

Researchers are reminded that: … With respect to the last point, whole-disk encryption at the file-system level would be the ideal solution; however, if this is not achievable, downloaded data should be re-encrypted whenever it is not in use.

Researchers should also be aware that the volume of data available in the repository is subject to gradual change and may not match the list supplied when an application is processed. These changes are due to participant withdrawals (which require the removal of data) and the incremental addition of new data for continuing participants.
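Because of withdrawals and incremental additions, a list of items held locally can drift from what the repository currently offers. A minimal sketch of reconciling the two, assuming both lists are available as sets of item identifiers (how those identifiers are obtained is not covered here, and the function name `reconcile` is hypothetical):

```python
def reconcile(supplied: set[str], available: set[str]) -> dict[str, set[str]]:
    """Compare the item list supplied with an application against the
    items currently available in the repository (both hypothetical inputs)."""
    return {
        # Listed originally but no longer available, e.g. after a withdrawal.
        "removed": supplied - available,
        # Available now but absent from the original list, e.g. new data.
        "added": available - supplied,
        # Present in both lists.
        "unchanged": supplied & available,
    }
```

Items in `removed` are the ones to look for among any locally held copies.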

3. Connectivity

The repository consists of a pair of mirrored systems, each connected to the UK JANET network by a separate 1 Gb/s link. The system names are: … To access Complex Data, your computer must be able to make HTTP (port 80) connections to at least one, and preferably both, of these systems.

These servers do not hold any identifiable data such as participant names or addresses.
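One way to confirm that your computer can reach the repository on port 80 is to attempt a plain TCP connection to each system. The sketch below does this; the host names are placeholders only, since the actual system names are not reproduced in this text, and a successful TCP connection shows only that the port is reachable, not that downloads will succeed.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_mirrors(hosts: list[str], port: int = 80) -> dict[str, bool]:
    """Report reachability of each mirrored system."""
    return {host: can_connect(host, port) for host in hosts}


# Hypothetical placeholders: substitute the two system names supplied by UK Biobank.
MIRRORS = ["mirror-a.example.org", "mirror-b.example.org"]
```

Calling `check_mirrors(MIRRORS)` returns one entry per system; ideally both should be `True`.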

4. Authentication

To access the repository you must prove your identity to the system. This is done by supplying an authentication file containing your Application ID together with the first 24 characters of the 64-character decryption password (provided to you by email).

The authentication file can be created in a standard text editor, for instance MS-Notepad on a Windows system or nano/emacs on a Linux system. It should contain two lines of text: the first containing the Application ID and the second the truncated password. For example, if the Application ID is 123 and the decryption password is a1b2c3d4a1b2c3d4a1b2c3d4e5f6a44b343d334eef232ce3d3298ba847d2983c, the file contents would be:

123
a1b2c3d4a1b2c3d4a1b2c3d4

The authentication file should be named ".ukbkey" and stored in the home directory of your account on the system (C:\ for Windows users).
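The same file can also be generated programmatically. The sketch below builds the two-line contents from the Application ID and full password, keeping only the first 24 characters, and writes `.ukbkey` to the home directory by default; on Windows, pass `Path("C:/")` to follow the location given above. The function names are hypothetical, and restricting the file permissions is an extra precaution not required by this guide.

```python
from pathlib import Path
from typing import Optional

KEY_PREFIX_LENGTH = 24  # only the first 24 characters of the password are used


def ukbkey_contents(application_id: str, password: str) -> str:
    """Build the two-line text of a .ukbkey authentication file."""
    application_id = application_id.strip()
    password = password.strip()
    if not application_id:
        raise ValueError("the Application ID is empty")
    if len(password) < KEY_PREFIX_LENGTH:
        raise ValueError("password is shorter than 24 characters")
    return f"{application_id}\n{password[:KEY_PREFIX_LENGTH]}\n"


def write_ukbkey(application_id: str, password: str,
                 directory: Optional[Path] = None) -> Path:
    """Write .ukbkey into `directory` (default: the user's home directory)."""
    target_dir = directory if directory is not None else Path.home()
    path = target_dir / ".ukbkey"
    path.write_text(ukbkey_contents(application_id, password))
    # Extra precaution (assumption): make the file readable only by its owner.
    path.chmod(0o600)
    return path
```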

5. Analysis Services

UK Biobank does not itself provide analysis facilities; however, researchers wishing to use supercomputer-level facilities may be able to rent time on the servers at ARC, which are co-located with the UK Biobank systems to minimise bandwidth constraints. (As ARC is a UK university facility, not all projects are eligible to use it.)

6. Bulk Data

The Bulk data section of the UKB repository contains large data items, each of which is a complex or compound dataset in its own right. Detailed instructions for accessing Bulk data are given in UKB Resource 664.

7. Record Data

The Record data section of the repository contains plain information that is too large or too highly structured to be supplied as part of a standard dataset for download. Detailed instructions for accessing Record data are given in record.html.

8. Genetic Data

The Genetic data section of the repository contains genotypes and associated information. Although it is usually simple in format, the entire genetic dataset is tens of terabytes in size and is therefore made available here for piecemeal download. Detailed instructions for accessing Genetic data are given in UKB Resource 664.
