Accessing Complex Data within UK Biobank
Much of the data held by UK Biobank cannot easily be represented in a simple tabular form. This complex-format data is held within a secure online repository (outside the main UK Biobank showcase system) where elements of it may be downloaded on a piecemeal basis as required.
To use the UK Biobank secure online repository services you must:
- be a validated UK Biobank researcher;
- be part of an Approved Application; and
- have been issued a standard dataset together with the associated password credentials.
This webpage details the means by which Bulk and Record data held by UK Biobank can be accessed and manipulated once access has been approved.
- Analysis Services
- Bulk Data
- Record Data
- Genetic Data
1. Introduction
This guide is intended for Researchers who have had an Application for access to UK Biobank approved and have been given a 32-character MD5 Checksum and a 64-character password. It assumes that readers:
- have downloaded a standard dataset;
- are familiar with the decryption and format conversion utilities outlined in the Guide to Using UK Biobank Data;
- have been issued with login details (username and password) for the repository.
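As an illustration of how a 32-character MD5 checksum is typically used, the sketch below computes the MD5 digest of a file and compares it with a supplied value. It is a minimal Python example, not one of the UK Biobank utilities: the function names `md5_of_file` and `matches_checksum` are hypothetical, and it assumes the checksum refers to the downloaded dataset file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a supplied 32-character checksum.

    Assumption: `expected` is the checksum issued alongside the dataset.
    """
    expected = expected.strip().lower()
    if len(expected) != 32:
        raise ValueError("an MD5 checksum is 32 hexadecimal characters")
    return md5_of_file(path) == expected
```

Calling `matches_checksum(path, checksum)` with the path of a downloaded file and the checksum supplied by email returns `True` only when the file arrived intact.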
Various utilities are supplied pre-compiled for both MS-Windows and Linux systems. The MS-Windows utilities have the suffix .exe; however, the explanations given in this guide omit it for generality.
2. Notices
Researchers are reminded that:
- UK Biobank does not back up user-generated data, and researchers are responsible for ensuring the safety and integrity of information they produce.
- Researchers are responsible for ensuring they have all licenses required for any proprietary analysis software they install on the UK Biobank or ARC systems allocated to them.
- Any data exported outside of the UK Biobank systems must be protected by strong (e.g. AES256) encryption when not actively in use.
Researchers should also be aware that the volume of data available in the repository is subject to gradual change and may not match the list supplied when an application is processed. These changes are due to participant withdrawals (which require the removal of data) and the incremental addition of new data for continuing participants.
3. Connectivity
The repository consists of a pair of mirrored systems, each connected to the UK JANET network by a separate 1 Gb/s link. The system names are:
4. Authentication
To access the repository it is necessary to prove your identity to the system. This is done by supplying an Authentication File which contains your Application ID together with the 64-character decryption Password (provided to you via email).
The authentication file can be created in a standard text editor, for instance MS-Notepad on a Windows system or nano/emacs on a Linux system. It should contain two lines of text: the first containing the Application ID and the second the password. For example, if the Application ID is 123 and the decryption key is a1b2c3d4a1b2c3d4a1b2c3d4e5f6a44b343d334eef232ce3d3298ba847d2983c3cc23490, then the file contents would be:
123
a1b2c3d4a1b2c3d4a1b2c3d4e5f6a44b343d334eef232ce3d3298ba847d2983c3cc23490
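The two-line layout described above can also be produced with a script rather than a text editor. The following is a minimal Python sketch, not a UK Biobank utility: the function names `write_auth_file` and `read_auth_file` are hypothetical, and ending the file with a trailing newline is an assumption.

```python
from pathlib import Path


def write_auth_file(path, application_id, password):
    """Write a two-line authentication file: Application ID, then password."""
    application_id = str(application_id).strip()
    password = password.strip()
    # Each value must occupy exactly one line, so reject embedded whitespace.
    for label, value in (("Application ID", application_id), ("password", password)):
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{label} must be a single non-empty token")
    # Assumption: a trailing newline after the password is acceptable.
    Path(path).write_text(f"{application_id}\n{password}\n", encoding="ascii")


def read_auth_file(path):
    """Read the file back and return (application_id, password) as strings."""
    lines = Path(path).read_text(encoding="ascii").splitlines()
    if len(lines) != 2:
        raise ValueError("authentication file must contain exactly two lines")
    return lines[0], lines[1]
```

For the example values given in the text, `write_auth_file(path, 123, key)` produces a file whose first line is `123` and whose second line is the key, and `read_auth_file(path)` returns the same two values. Because the file holds the decryption password, it should be stored with restrictive permissions.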