Accessing Returned Data within UK Biobank
When an Application completes, the researchers involved are obliged to send a summary of their results and any
derived measures back to UK Biobank for distribution to other projects. The ukblink client has been developed to
allow Approved researchers to download these Returned Datasets and to link their individually pseudonymised
contents to the researcher's own current Application (known here as 'bridging'). This guide explains how
to use the ukblink utility, which can be obtained from the download section of the UK Biobank Showcase website.
To acquire Returned Datasets from the UK Biobank secure online repository services, a researcher must:
- be a validated UK Biobank researcher;
- be part of an Approved Application;
- have been issued a standard dataset together with the associated password credentials;
- have included the desired Returned Datasets in an approved Basket.
This webpage details the means by which Returned Data held by UK
Biobank can be accessed and manipulated once this access has been approved.
- Preparation
- Notices
- Authentication
- Fetching data
1. Preparation
Following approval of a research application, researchers will be sent
a 32-character MD5 checksum and a 64-character password. The next step is to acquire the ukblink utility from the
Downloads section of the Showcase website.
Some of the UKB utilities are supplied pre-compiled for both MS-Windows and
Linux systems. The MS-Windows utilities have the suffix .exe; however, the
explanations given in this guide omit this for generality. All the utility programs are command-line tools, so Windows
versions are best run from a Command Prompt window and Linux versions directly from a Terminal.
The repository consists of a pair of mirrored systems each connected to the UK JANET network
by independent links. The system names are:
- biota.ndph.ox.ac.uk
- chest.ndph.ox.ac.uk
To access bulk data from a remote computer, the system that the download utility is running on
must be able to make HTTP (port 80) connections to at least one, and preferably both, of the repository systems.
If this is not possible, researchers should contact their local IT team to resolve the issue.
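The connectivity requirement above can be checked before attempting a download. The following is a minimal sketch, assuming Python is available on the download machine; the function names can_connect and reachable_hosts are illustrative and are not part of any UK Biobank utility. It only tests whether a TCP connection to port 80 can be opened, not whether a download would be authorised.

```python
import socket

# Repository host names as listed in this guide.
REPOSITORY_HOSTS = ["biota.ndph.ox.ac.uk", "chest.ndph.ox.ac.uk"]


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def reachable_hosts(hosts=REPOSITORY_HOSTS, port=80, check=can_connect):
    """Return the hosts that accept a connection on the given port.

    The check argument exists only so the probe can be swapped out
    (for example in tests); by default it performs a real connection.
    """
    return [host for host in hosts if check(host, port)]
```

Calling reachable_hosts() with no arguments probes both mirrors. An empty result suggests that outbound port 80 traffic is blocked, which is the situation to raise with the local IT team.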
2. Notices
Before downloading any data, researchers are reminded that:
- All access attempts, whether successful or denied, are logged and
monitored with the IP address recorded.
- UK Biobank does not back up user-generated data, and researchers are responsible for
ensuring the safety and integrity of the information they produce.
- Any data exported outside of the UK Biobank systems must be protected
by strong (e.g. AES256) encryption when not actively in use.
- The volume of data available
in the repository is subject to gradual change and may not match
the list supplied when an application is processed. These changes are
due to participant withdrawals (which require the removal of data) and
the incremental addition of new data for continuing participants.
It is important to be aware that participants occasionally withdraw from
UK Biobank, so there may be elements within a Returned Dataset that
relate to people who are no longer part of the study. The bridging (mapping) file
will not contain identifiers for such people, so any obsolete individual-level
information relating to them cannot be linked and is effectively nullified.
Note also that while it is possible to run multiple downloads in
parallel, to provide fair usage the system will not permit a single
Application to run more than 10 simultaneously.
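When scripting several downloads, the limit of 10 simultaneous downloads can be respected on the client side. The following is a minimal sketch using a thread pool; run_limited, effective_workers and the worker argument are illustrative names, and the worker is assumed to be whatever callable performs a single download. Because the limit applies to the whole Application, other users of the same Application also count towards it, so a lower value may be more appropriate in practice.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 10  # per-Application limit stated in this guide


def effective_workers(requested):
    """Clamp the requested number of parallel jobs to the range 1..MAX_PARALLEL."""
    if requested < 1:
        raise ValueError("at least one worker is required")
    return min(requested, MAX_PARALLEL)


def run_limited(jobs, worker, max_parallel=MAX_PARALLEL):
    """Run worker(job) for every job, never more than the allowed number at once.

    Results are returned in the same order as the jobs were supplied.
    """
    with ThreadPoolExecutor(max_workers=effective_workers(max_parallel)) as pool:
        return list(pool.map(worker, jobs))
```

For example, run_limited(dataset_ids, download_one, max_parallel=4) would process a list of IDs four at a time, where download_one is a hypothetical function that fetches one dataset.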
3. Authentication
To access the repository it is necessary to prove one's identity to the system using a keyfile.
See Resource 667 for detailed information on this.
4. Fetching Data
Returned datasets are identified by ID numbers (given on the UKB Showcase) and originate with specific Applications.
In order to make use of such information, a researcher must download both the returned dataset itself and
a bridging file linking the participant IDs in the returned data to those of the current Application.
4.1 Using ukblink
The ukblink utility can be used with various flags to retrieve either
returned datasets or bridging files.
The ukblink command is:
ukblink [-bapp_id] [-rdataset_id] [-aauthentication_keyfile] [-v] [-h]
where the flags are as follows:
-a | Specifies the authentication keyfile containing the application ID and truncated password. This flag is optional and is not required if the default authentication file name (.ukbkey) has been used. |
-b | Specifies the Application ID corresponding to the Returned dataset for which a bridging file is to be created. This Application ID can be found on the relevant Returns page on the Showcase. Note that an error will be generated if the Returned dataset does not contain individual-level information. |
-r | Specifies the Returned-dataset ID to fetch. |
-h | Shows a basic help message. |
-v | Specifies that output should be verbose (useful for tracing errors). |
Either -b or -r must be present, but not both.
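For scripted use, the flag rules above can be enforced before the utility is launched. The following is a minimal sketch of a helper that builds the argument list; build_ukblink_args and its parameter names are illustrative and not part of ukblink. It relies only on the syntax shown in this guide, in which each value is written directly after its flag with no space.

```python
def build_ukblink_args(bridge_app_id=None, dataset_id=None, keyfile=None, verbose=False):
    """Build an argument list for ukblink from the flags described in this guide.

    Exactly one of bridge_app_id (-b) or dataset_id (-r) must be given.
    """
    if (bridge_app_id is None) == (dataset_id is None):
        raise ValueError("specify exactly one of bridge_app_id (-b) or dataset_id (-r)")

    args = ["ukblink"]
    if keyfile is not None:
        # Optional: omit to fall back on the default keyfile name (.ukbkey).
        args.append(f"-a{keyfile}")
    if bridge_app_id is not None:
        args.append(f"-b{int(bridge_app_id)}")
    else:
        args.append(f"-r{int(dataset_id)}")
    if verbose:
        args.append("-v")
    return args
```

The resulting list could then be passed to Python's subprocess.run, assuming the ukblink executable is on the search path (on MS-Windows the program name would carry the .exe suffix).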
As an example, suppose the authentication keyfile
.ukbkey exists. To retrieve Returned Dataset 1234,
enter the following:
ukblink -r1234
which will create the file ukbreturn1234.typ on the local disk,
where typ is an extension appropriate to the type of file. On failure
the program will output an error message.
As a further example, suppose the authentication keyfile is called
mykey.ukb and the user wishes to link identifiers between the current
Application (as identified in the keyfile, say 567) and
Application 890, then they would enter the following:
ukblink -amykey.ukb -b890
which will create the file
ukb567bridge890.txt. This file contains two columns, with
each row listing the identifier for a particular participant under the
schemes for Applications 567 and 890.
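Once both the returned dataset and the bridging file have been downloaded, the identifiers in the returned data can be translated into those of the current Application. The following is a minimal sketch only. It assumes that the bridging file is whitespace-delimited, has no header line, and lists the current Application's identifier first and the originating Application's identifier second; none of these details is confirmed by this guide, so they should be checked against the actual file. The function names parse_bridge and remap_rows are illustrative.

```python
def parse_bridge(lines):
    """Build a dict mapping originating-Application ID -> current-Application ID.

    Assumes two whitespace-separated columns per line, current Application
    first (an assumption; verify against the real file).
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        own_id, other_id = fields
        mapping[other_id] = own_id
    return mapping


def remap_rows(rows, mapping, id_column=0):
    """Yield copies of rows (lists of strings) with the participant ID translated.

    Rows whose ID is absent from the bridging file are dropped; this is how
    records for withdrawn participants fall out of the linked data.
    """
    for row in rows:
        new_id = mapping.get(row[id_column])
        if new_id is None:
            continue
        yield row[:id_column] + [new_id] + row[id_column + 1:]
```

For the example above, the mapping could be built with parse_bridge(open("ukb567bridge890.txt")) and then applied to the rows of the returned dataset, whose layout depends on the type of file that was returned.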