Frequently asked questions (FAQ)
- What do the column names in my dataset mean?
- How can I access and open my additional documents?
- What does the ROSETTA Error: member "eXXX" not found message mean?
- Where can I get an MD5 checksum for the genetic files downloaded with the ukbgene program?
- My connection to ukbfetch or ukbgene is blocked by a firewall
What do the column names in my dataset mean?
In the standard flat-file dataset that researchers receive, all column names follow the format X.Y.Z.
X is the data-field number. For example, any column labelled 53.Y.Z will refer to data-field 53, Date of attending assessment centre.
Y is the instance index. For many data-fields, distinct values are available for each participant at different points in time. Often, instance 0 will refer to the baseline assessment, instance 1 to the repeat assessment, and instance 2 to the imaging clinic. For online questionnaires, the instance index will refer to each wave of questionnaire invitations.
The meaning of each instance can be found in the "Instances" tab on the webpage describing each data-field. For example, the column labelled 53.0.0 will refer to the date of attending the baseline assessment centre, while 53.1.0 will refer to the date of attending the repeat assessment (only available for 20,000 participants).
Z is the array index. For some data-fields, distinct values are available for each participant at the same point in time. For example, this will be the case for multiple-choice questions (where each value entered is stored in its own array index), or physical measures that were repeated (such as blood pressure).
As a general example, for data-field 6138 (Qualifications):
- 6138.0.0 holds the 1st value entered at baseline assessment
- 6138.0.1 holds the 2nd value entered at baseline assessment, if any
- 6138.0.2 holds the 3rd value entered at baseline assessment, if any
- 6138.1.0 holds the 1st value entered at repeat assessment
- 6138.1.1 holds the 2nd value entered at repeat assessment, if any
- 6138.1.2 holds the 3rd value entered at repeat assessment, if any
- 6138.2.0 holds the 1st value entered at the imaging clinic
- 6138.2.1 holds the 2nd value entered at the imaging clinic, if any
- 6138.2.2 holds the 3rd value entered at the imaging clinic, if any
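The X.Y.Z naming scheme described above can be split programmatically. The following is a minimal Python sketch, not part of any UK Biobank utility: the names ColumnName, parse_column and columns_for_field are hypothetical, and the example header (including an identifier column called "eid") is assumed for illustration only.

```python
from typing import NamedTuple


class ColumnName(NamedTuple):
    field: int      # X: data-field number
    instance: int   # Y: instance index
    array: int      # Z: array index


def parse_column(name: str) -> ColumnName:
    """Split an 'X.Y.Z' column name into its three integer parts."""
    parts = name.strip().split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"not an X.Y.Z column name: {name!r}")
    field, instance, array = (int(p) for p in parts)
    return ColumnName(field, instance, array)


def columns_for_field(columns, field, instance=None):
    """Return the column names for one data-field, optionally one instance."""
    selected = []
    for name in columns:
        try:
            col = parse_column(name)
        except ValueError:
            continue  # skip columns that are not in X.Y.Z form
        if col.field == field and (instance is None or col.instance == instance):
            selected.append(name)
    return selected


# Hypothetical header row, for illustration only.
header = ["eid", "53.0.0", "53.1.0", "6138.0.0", "6138.0.1", "6138.1.0"]
print(parse_column("6138.1.2"))
print(columns_for_field(header, 6138, instance=0))
```

Here columns_for_field(header, 6138, instance=0) selects only the columns holding the Qualifications values entered at the baseline assessment.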
How can I access and open my additional documents?
The UK Biobank Access Team sometimes sends additional documents to researchers. These could be bridging files to match participants from two different applications, case-control selections, sample manifest files, etc.
To use these files, the following steps should be followed:
1. Download the file into the folder where the ukb_unpack utility has previously been downloaded.
2. Using a command-line terminal, unpack the file in the same way as the standard dataset unpacking. For example, if the downloaded file is named ukbXXXX.enc, type the following command:
ukb_unpack.exe ukbXXXX.enc encryptionkey
3. After step 2 is finished, you should have a new file named ukbXXXX.enc_ukb. At this point, unlike in the standard dataset process, there is no need to use the ukb_conv utility. All you need to do is rename ukbXXXX.enc_ukb to ukbXXXX.newextension, where newextension is the file type that was given to you by the UK Biobank Access Team.
In most cases, the newextension should be csv.
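The rename in step 3 can be done by hand, or scripted. The following is a minimal Python sketch, assuming the unpacked file ends in .enc_ukb as described above. The function name rename_unpacked is hypothetical, and the default extension of csv reflects only the "most cases" noted above.

```python
from pathlib import Path


def rename_unpacked(unpacked: Path, new_extension: str = "csv") -> Path:
    """Rename ukbXXXX.enc_ukb to ukbXXXX.<new_extension>; return the new path."""
    suffix = ".enc_ukb"
    if not unpacked.name.endswith(suffix):
        raise ValueError(f"expected a file ending in {suffix}: {unpacked.name}")
    stem = unpacked.name[: -len(suffix)]
    # Accept the extension with or without a leading dot.
    target = unpacked.with_name(f"{stem}.{new_extension.lstrip('.')}")
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    return unpacked.rename(target)
```

For example, rename_unpacked(Path("ukbXXXX.enc_ukb")) would produce ukbXXXX.csv in the same folder. Pass the extension given by the UK Biobank Access Team as the second argument if it is not csv.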
What does the ROSETTA Error: member "eXXX" not found message mean?
While using the ukb_conv utility, some researchers, depending on the variables in their dataset, may see the following error message appear in the command-line terminal:
Rosetta error: ROSETTA Error: member "eXXX" not found
Validity error: ROSETTA Error: member "eXXX" not found
(XXX can be any integer)
This bug is currently under investigation. The message does not affect the conversion process and has no effect on the data being extracted. Researchers can open the files generated by ukb_conv directly and ignore these errors.
Where can I get an MD5 checksum for the genetic files downloaded with the ukbgene program?
UK Biobank downloads are individually customised for each user, so there is no single MD5 checksum that applies to all datasets. However, the ukbgene program internally verifies each download as it is received.
My connection to ukbfetch or ukbgene is blocked by a firewall
If a download of bulk data (using ukbfetch) or genetic data (using ukbgene) fails because the connection is blocked, this will most probably be due to an internal firewall at the researcher's institution. To use these programs successfully, researchers should make sure that access is authorised for biota.osc.ox.ac.uk, cask.ndph.ox.ac.uk and chest.ndph.ox.ac.uk on ports 80 and 443.
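Before contacting an institutional IT team, it can help to check which of these hosts and ports are reachable from the machine in question. The following is a minimal Python sketch using only the standard library. The function names can_connect and check_all are hypothetical, and a successful TCP connection shows only that the port is open from your network, not that ukbfetch or ukbgene will necessarily succeed.

```python
import socket

# Hosts and ports named in the FAQ entry above.
HOSTS = ["biota.osc.ox.ac.uk", "cask.ndph.ox.ac.uk", "chest.ndph.ox.ac.uk"]
PORTS = [80, 443]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_all(hosts=HOSTS, ports=PORTS, timeout: float = 5.0) -> dict:
    """Map each (host, port) pair to whether a TCP connection succeeded."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}
```

Calling check_all() and printing the pairs mapped to False gives a list of host and port combinations to raise with whoever manages the firewall.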