A large proportion of the data-fields within the UK Biobank repository are categorical. A data-coding is a mapping between the actual data and the values used to represent it within the database. For example, the value "44" may be stored to represent "Born in Great Britain", with "33" representing "Born in France".
A data-coding has one of two structures:
- Flat - the data-coding is a simple list of values with no ordering or relationship between them.
- Tree - the data-coding is a set of values which represent some sort of tree hierarchy, allowing an answer to be specified to whatever level of detail is known.
Apart from its use in interpreting the values of a data-field, a data-coding also shows the range of answers/alternatives that were available when an item of data was entered. The data-coding may include alternatives that do not appear in the final dataset because they were not applicable to any participants in UK Biobank.
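One way to see which alternatives were offered but never used is to compare the values defined by a data-coding with the values actually present in a data-field. The sketch below assumes the coding is held as a dictionary and the data-field as a list of stored values. The function name `unused_alternatives` and the sample data are hypothetical.

```python
def unused_alternatives(coding, observed_values):
    """Return the coded values that never occur in the observed data, sorted."""
    return sorted(set(coding) - set(observed_values))


# Hypothetical coding and a hypothetical column of stored values.
coding = {"44": "Born in Great Britain", "33": "Born in France"}
observed = ["44", "44", "44"]

print(unused_alternatives(coding, observed))  # ['33']
```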