Data providers and dates of data availability
This page lists the data providers for linked death, hospital inpatient, cancer and primary care (GP) records, and for COVID-19 test results and vaccination data. It also gives the period for which each type of data is available: the earliest date for which we hold data and, where applicable, an estimated censoring date for each type of data and data provider, as described in the next sections.
Censoring dates
The censoring date for each data provider is the date up to which UK Biobank estimates that the data received from that provider is mostly complete. Our general rule for estimating an appropriate censoring date is as follows:
The censoring date is the last day of the month for which the number of records is greater than 90% of the mean of the number of records for the previous three months, except where the data for that month is known to be incomplete in which case the censoring date is the last day of the previous month.
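As an illustration only, the rule can be sketched in Python as follows. The function name `estimate_censoring_date`, the `(year, month)` keyed input, and the choice to take the most recent month that passes the 90% test are assumptions made for this sketch, which also assumes the months are consecutive with no gaps. It is not UK Biobank's actual implementation.

```python
from calendar import monthrange
from datetime import date


def estimate_censoring_date(monthly_counts, known_incomplete=frozenset()):
    """Estimate a censoring date from monthly record counts.

    monthly_counts: dict mapping (year, month) -> number of records,
        assumed to cover consecutive months (hypothetical input format).
    known_incomplete: set of (year, month) known to hold incomplete data.
    Returns a datetime.date, or None if no month passes the test.
    """
    months = sorted(monthly_counts)
    # Walk backwards from the most recent month; the first three months
    # cannot be tested because they lack three predecessors.
    for i in range(len(months) - 1, 2, -1):
        previous_three = [monthly_counts[m] for m in months[i - 3:i]]
        threshold = 0.9 * sum(previous_three) / 3
        if monthly_counts[months[i]] > threshold:
            chosen = i
            # Exception in the rule: step back one month if this month
            # is known to be incomplete.
            if months[i] in known_incomplete:
                chosen = i - 1
            year, month = months[chosen]
            return date(year, month, monthrange(year, month)[1])
    return None
```

With made-up counts in which the final month drops sharply, the sketch returns the last day of the preceding month, since that is the most recent month exceeding 90% of the mean of its three predecessors.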
UK Biobank does not apply the censoring dates to the data made available to researchers. The released data always contains the latest records regardless of censoring dates, and may include incomplete data after the dates below. These dates are intended for guidance only. Once researchers have received their data, they should censor outcomes according to their own research protocol.
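A minimal sketch of how a researcher might censor a death outcome is shown here. The two dates are copied from the death data table on this page; the function name `censor_death`, the region labels used as dictionary keys, and the optional `study_end` parameter are hypothetical, and how each participant is assigned to a provider is left to the research protocol.

```python
from datetime import date

# Censoring dates copied from the death data table on this page.
# The keys are illustrative labels, not official field values.
DEATH_CENSORING = {
    "England & Wales": date(2023, 11, 30),
    "Scotland": date(2023, 12, 31),
}


def censor_death(death_date, region, study_end=None):
    """Return (event_observed, follow_up_end) for one participant.

    death_date: datetime.date of death, or None if no death is recorded.
    region: key into DEATH_CENSORING (assumed to be known per participant).
    study_end: optional earlier cut-off set by the research protocol.
    """
    cutoff = DEATH_CENSORING[region]
    if study_end is not None:
        cutoff = min(cutoff, study_end)
    # Deaths recorded after the cut-off are treated as unobserved,
    # because data beyond that date may be incomplete.
    if death_date is not None and death_date <= cutoff:
        return True, death_date
    return False, cutoff
```

A death recorded after the relevant cut-off is returned as censored at the cut-off rather than as an event.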
The Data Portal vs Showcase fields
Hospital inpatient data and death data are available to researchers in two formats: record-level fields on the Data Portal and Showcase fields, which in the case of inpatient data give summary-level information about the hospital episodes. In addition, there are Showcase fields such as the Algorithmically-defined outcomes (Category 42) and the First Occurrences (Category 1712) which combine information from multiple sources (such as self-report at Biobank assessment centres, inpatient, death, and GP records).
Data currently available
Death data
Death | Data provider | International Classification of Diseases (ICD9 / ICD10) | Period of data currently available | Censoring date
England & Wales | NHS England | 2006 onwards | April 2006 onwards | 30 November 2023
Scotland | NHS Central Register, National Records of Scotland | 2006 onwards | April 2006 onwards | 31 December 2023
Hospital inpatient data
Hospital admissions (inpatients) | Data provider | International Classification of Diseases (ICD9 / ICD10) | Classification of Interventions and Procedures (OPCS3 / OPCS4) | Period of data currently available | Censoring date
Hospital Episode Statistics for England (HES) | NHS England | 1997 onwards | 1997 onwards | 1997 onwards, with critical care data from 2011 | 31 October 2022
Scottish Morbidity Record (SMR) | Information and Statistics Division (ISD), Scotland | ICD9: 1981 - 1996; ICD10: 1996 onwards | OPCS3: 1977 - 1988; OPCS4: 1989 onwards | 1981 onwards | 31 August 2022 *
Patient Episode Database for Wales (PEDW) | Secure Anonymised Information Linkage (SAIL), Wales | 1999 onwards | 1999 onwards | 1991 onwards | 31 May 2022
* The Scottish hospital inpatient data does not currently include maternity admissions. The Showcase summary fields contain Scottish data for the period August to September 2021 (as well as a very small number of other isolated records) that do not yet appear on the Data Portal or the Research Analysis Platform (RAP); these records will be updated on the Data Portal and RAP at the next Showcase release.
Notes on the English & Scottish inpatient data:
- We have held back a very small proportion of English inpatient data for April 2017 onwards (approximately 0.25%, or around 600 episodes per year) due to an incomplete linkage match. After they have been scrutinised further, some of these records may be released at a future date.
- During the financial year from April 2020 to April 2021, UK Biobank received monthly extracts of hospital inpatient data for England, with each file containing the whole financial year to that point. Although we removed exact duplicates of previous episodes, we found cases where a second record differs slightly from an earlier one despite clearly describing the same hospital episode. Currently, both versions of such an episode appear in our released data.
- Similarly, each extract currently received from Scotland consists of all records from 1 April 2019 onwards. Again, exact duplicates are removed, but each extract contains small numbers of records going back over a year, and we have not yet attempted to determine whether these are genuinely new admissions or alterations to previous admission records.
- A separate censoring date is not provided for Scottish Psychiatry data (SMR04), due to the small number of episodes recorded each month. The psychiatry data is received alongside the main Scottish admissions data (SMR01) and is expected to be complete up to the same date.
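Researchers who want to review the near-duplicate episodes described in these notes could start from a sketch like the one below. It groups records that share a participant and episode dates and returns the groups with more than one record for manual inspection. The function name and the column names `eid`, `epistart` and `epiend` are assumptions for illustration, and matching on identical dates is only one possible definition of "the same episode"; it will also group genuinely distinct episodes that share dates.

```python
from collections import defaultdict


def flag_candidate_duplicates(records, key_fields=("eid", "epistart", "epiend")):
    """Group records that share the given key fields.

    records: iterable of dicts, one per hospital episode (hypothetical layout).
    key_fields: fields that must match for two records to be grouped.
    Returns a dict mapping each key to its records, keeping only keys
    that occur more than once.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record)
    # Only groups with several records are candidates for review.
    return {key: group for key, group in groups.items() if len(group) > 1}
```

Each returned group still needs to be checked by hand or against protocol-specific rules before any record is dropped.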
Primary care (GP) data - for all research
Note that the GP data available for all research covers a total of approximately 45% of the UK Biobank cohort. The coverage dates are based on the value of the field event_dt (event date) in the gp_clinical table.
GP dataset | Data provider | Participant coverage (approx.) | Coding systems | Period of data currently available | Censoring date |
England | TPP | 165,000 | See Resource 591 | 1938 onwards * | 31 May 2016
England | Vision | 18,000 | See Resource 591 | 1940 onwards * | 31 May 2017
Scotland | Vision/EMIS | 27,000 | See Resource 591 | 1939 onwards * | 31 March 2017
Wales | Vision/EMIS | 21,000 | See Resource 591 | 1948 onwards * | 31 August 2017
* For each provider the number of records per year is very low initially and gradually increases.
Primary care (GP) data - for COVID-19 research
GP data for COVID-19 research were made available to approved researchers between 2020 and 2021. They covered the majority of the UK Biobank cohort (~450,000 participants) and the period from 1938 to 31 August 2021 (see Resource 3151). These data have not been available since the withdrawal of the UK Government COPI (Control of Patient Information) notice on 1 July 2022.
Cancer data
Cancer | Data provider | ICD9 | ICD10 | Period of data currently available | Censoring date
England | NHS England | 1979 - 1994 | 1995 onwards | 1971 onwards | 31 December 2020
Wales | NHS England | 1979 - 1994 | 1995 onwards | 1971 onwards | 31 December 2016 *
Scotland | National Records of Scotland, NHS Central Register | 1980 - 1996 | 1997 onwards | 1957 onwards | 30 November 2021
* Welsh cancer registry data was originally provided by NHS England. However, in mid-2023 it was discovered that Welsh data had ceased to be included in NHS England cancer registry extracts from 2017 onwards. The censoring date for Welsh cancer records was therefore revised back to 31 December 2016, and we are currently investigating alternative sources for this data.
COVID-19 test results
COVID-19 test | Data provider | Coding systems | Period of data currently available |
England | Public Health England | See data dictionary | Early 2020 - September 2022 *
Scotland | Public Health Scotland | See data dictionary | Early 2020 - November 2022 *
Wales | SAIL | See data dictionary | Early 2020 - December 2022 *
* Given changes in testing levels, the standard definition for censoring dates is no longer applicable for COVID-19 testing data. The dates given refer to the full range of data available, and should not be used to infer completeness.
COVID-19 vaccination data
COVID-19 vaccinations | Data provider | Coding systems | Period of data currently available |
England | NHS England | See Resource 2910 | May 2020 - June 2023 *
* The sporadic nature of vaccination appointments leads to date clustering at specific times of year. This means that the standard definition for censoring dates is not applicable for COVID-19 vaccination data. The dates given refer to the full range of data available, and should not be used to infer completeness.