Appendix B: Quarterly progress report on improvements to health datasets
Updated 3 September 2021
Background
The Minister for Equalities’ second quarterly report on COVID-19 health disparities recommended that NHS England and NHS Improvement (NHSEI) – working with the Department of Health and Social Care (DHSC), Public Health England (PHE) and others – provide quarterly progress updates outlining improvements to health datasets.
This document represents the first of these updates, and focuses on a new method of assigning ethnicity using Hospital Episode Statistics (HES). The new method, developed by PHE, is based on the NHS Digital HES ethnicity index with a few modifications.
The next update on improvements to health datasets will appear with the fourth quarterly report.
Assigning ethnicity from HES
Analyses split by ethnic groups are needed to assess overall inequalities in the population, and more recently for measuring inequalities in health outcomes due to COVID-19. However, ethnicity is not recorded in many health-related datasets.
In order to generate breakdowns in data by ethnic group, analysts within PHE:
- link datasets of interest to HES
- attach ethnicity from HES to their dataset if it does not already have ethnicity recorded
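The two steps above can be sketched in a few lines of Python. This is an illustration only, not PHE's implementation: the field names `person_id` and `ethnicity`, the function name `attach_ethnicity`, and the idea of a pre-built lookup of HES ethnicity per person are all assumptions made for the example.

```python
def attach_ethnicity(records, hes_ethnicity):
    """Return copies of `records` with ethnicity filled in from HES where missing.

    records: list of dicts with hypothetical keys "person_id" and "ethnicity".
    hes_ethnicity: hypothetical lookup of person_id -> ethnic code taken from HES.
    """
    result = []
    for record in records:
        updated = dict(record)  # copy so the caller's data is left untouched
        # Only use HES when the dataset has no ethnicity of its own.
        if not updated.get("ethnicity"):
            updated["ethnicity"] = hes_ethnicity.get(updated["person_id"])
        result.append(updated)
    return result
```

A record that already carries an ethnicity keeps it. A record with no ethnicity and no match in HES stays empty.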
Patients may report different ethnicities in different episodes of care – for example, as an inpatient, as an outpatient, or during a visit to A&E – so a method of choosing which ethnicity to take is required. During the COVID-19 pandemic, it has become evident that the original method of assigning ethnicity has overestimated the number of people in the ‘Other’ ethnic group, so alternative methods of assigning ethnicity from HES were investigated.
The alternative methods were discussed with stakeholders in PHE, as well as external stakeholders from:
- the Office for National Statistics
- the Race Disparity Unit
- NHS Digital
- The King’s Fund
- the Institute of Health Equity
An alternative method of assigning ethnicity was agreed. This document describes the original method, as well as the agreed alternative method.
Original method: using the most recent usable ethnic code
The original method used within PHE looked at the most recent usable ethnic code for an individual (see Appendix B.1 for further details about ethnic codes) available in these datasets in this order:
- HES APC (admitted patient care) – from 1997 to 1998 onwards
- HES OP (outpatients) – from 2003 to 2004 onwards
- HES AE (accident and emergency) – from 2003 to 2004 onwards
If no usable ethnic code was found in any of these datasets, the person was not assigned a usable ethnic code. Instead, their most recent unusable code (unknown or not stated) was recorded.
In recent PHE analyses during the COVID-19 pandemic, the Secondary Uses Service (SUS) dataset has also been used to assign ethnicity. SUS is given top priority, followed by the datasets listed above.
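One reading of the original method is sketched below. It assumes that the datasets are checked in priority order and that the most recent usable code in the first dataset that has one is taken. The tuple layout `(dataset, episode_date, code)`, the function name `assign_original` and the string form of the codes are hypothetical choices for the example. The set of unusable codes follows Appendix B.1.

```python
from datetime import date

# Codes marked "Not usable" in Appendix B.1 (both coding periods).
UNUSABLE = {"Z", "X", "99", "9"}

# Assumed priority order: SUS first, then the HES datasets as listed.
PRIORITY = ["SUS", "APC", "OP", "AE"]


def assign_original(episodes):
    """episodes: list of (dataset, episode_date, code) tuples for one person."""
    for dataset in PRIORITY:
        usable = [e for e in episodes
                  if e[0] == dataset and e[2] not in UNUSABLE]
        if usable:
            # Most recent usable code within the highest-priority dataset.
            return max(usable, key=lambda e: e[1])[2]
    if episodes:
        # No usable code anywhere: fall back to the most recent unusable code.
        return max(episodes, key=lambda e: e[1])[2]
    return None  # person has no episodes at all
```

Because the choice rests on a single record, one miscoded episode can decide a person's ethnicity under this approach.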
Appendix B.2 shows the age-standardised mortality rates for deaths from all causes, and deaths mentioning COVID-19 between 21 March and 1 May 2020, compared with baseline mortality rates (2014 to 2018), by ethnicity and sex for England.
These charts show that, even in the baseline period, the age-standardised mortality rates for the ‘Other’ ethnic group were implausibly high. For the baseline period, the ‘Other’ ethnic group had an age-standardised mortality rate of 2,792, whereas all other ethnic groups had rates below 1,000.
These results led PHE to consider alternative methods of assigning ethnicity. The method shown in this document has been agreed as the best option for PHE to take.
New method: using the most frequent ethnicity recorded
This method uses the most frequent ethnicity recorded across the 3 HES data sets used in the original method, excluding any unknown values. Outpatient (OP) data was not used from 2006/07 through to 2009/10 as no ethnic code entries were recorded in those years because of a technical issue. Admitted patient care (APC) data is restricted to 2003/04 onwards, as the quality and completeness of admitted patient care data was lower before then.
If there are multiple ethnicities in the data sets with the same frequency, the most recent is chosen.
If there are multiple ethnicities with the same frequency and latest date, precedence is given to the most recent value from the APC data set, followed by the accident and emergency (AE) data set and then the OP data set. AE is placed ahead of OP because checks completed by NHS Digital indicate that completeness was greater in the AE data set than in the OP data set.
If there are multiple ethnicities with the same frequency, latest date and source of data, the ethnicity that occurs more frequently in the general population of England and Wales, according to the 2011 Census, is selected (see Appendix B.3).
To put this into context, the 2011 Census indicated that 80.5% of the population were White British. If a person has multiple ethnicities recorded (such as White British and White Irish) with the same frequency, the same latest date and the same source, precedence would be given to White British, as more of the population are in that ethnic group than in the White Irish group. Such cases are very rare. This step was introduced so that the process can be automated and gives exactly the same result each time the analysis is run.
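The selection and tie-breaking rules described so far can be expressed as a single sort key. The sketch below is an illustration under stated assumptions, not the code used by PHE or NHS Digital. The tuple layout `(source, episode_date, code)` and the function name `most_frequent_ethnicity` are hypothetical, and the year restrictions on the OP and APC data are assumed to have been applied before the function is called. The census ranking is copied from the Order column in Appendix B.3.

```python
from collections import Counter
from datetime import date

UNUSABLE = {"Z", "X", "99"}                  # "Not usable" codes, Appendix B.1
SOURCE_RANK = {"APC": 0, "AE": 1, "OP": 2}   # lower value = higher precedence
CENSUS_ORDER = {"A": 1, "C": 2, "H": 3, "J": 4, "N": 5, "L": 6, "M": 7,
                "B": 8, "K": 9, "D": 10, "R": 11, "F": 12, "G": 13,
                "P": 14, "E": 15, "S": 16}   # Order column, Appendix B.3


def most_frequent_ethnicity(episodes):
    """episodes: list of (source, episode_date, code); source is APC, AE or OP."""
    usable = [e for e in episodes if e[2] not in UNUSABLE]
    if not usable:
        return None  # no known ethnicity in any data set
    counts = Counter(code for _, _, code in usable)

    def sort_key(code):
        own = [e for e in usable if e[2] == code]
        latest = max(d for _, d, _ in own)
        # Best-ranked source among this code's records on its latest date.
        best_source = min(SOURCE_RANK[s] for s, d, _ in own if d == latest)
        census = CENSUS_ORDER.get(code, len(CENSUS_ORDER) + 1)
        # Frequency first, then recency, then source, then census order.
        return (-counts[code], -latest.toordinal(), best_source, census)

    return min(counts, key=sort_key)
```

Putting every rule into one tuple makes the result deterministic: each later element is consulted only when all earlier elements are tied.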
A value of ethnicity unknown will only be present if there are no known ethnicities in any of the HES data sets.
To take into account the overrepresentation of the ‘Other’ ethnic group, if the most common ethnic group assigned by the method above is ‘Other’, the second most common usable ethnic group is assigned instead. If there are no other usable ethnic groups, the person is still assigned to the ‘Other’ ethnic group.
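The adjustment for the ‘Other’ ethnic group can be sketched as a final step applied to a person's usable codes once they have been ranked by the rules above. The example assumes that ‘Other’ corresponds to code S (Any other ethnic group) in Appendix B.1, which the text does not state explicitly. The function name `apply_other_rule` and the ranked-list input are hypothetical.

```python
OTHER = "S"  # assumed to be the code meant by the 'Other' ethnic group


def apply_other_rule(ranked_codes):
    """ranked_codes: a person's usable codes, most preferred first."""
    if not ranked_codes:
        return None  # no usable ethnic code at all
    if ranked_codes[0] == OTHER and len(ranked_codes) > 1:
        # 'Other' came first but another usable group exists: take that instead.
        return ranked_codes[1]
    return ranked_codes[0]
```

For example, a person ranked `["S", "A"]` is assigned A, while a person whose only usable code is S remains in the ‘Other’ group.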
It is perfectly valid for patients to decide not to state their ethnicity when this information is collected in hospital data. People may also decide to state their ethnicity on some occasions but not others. The original and alternative methods used for assigning ethnicity do not select ‘Not Stated’ records if there are alternative ethnic codes available. Only those who do not have a usable ethnic code and have repeatedly not stated their ethnicity will have the ethnicity ‘Not Stated’ recorded.
Appendix B.1: Ethnic codes
Code (2001/02 onwards) | Description (2001/02 onwards) | Code (1995/96 to 2000/01) | Description (1995/96 to 2000/01) | Usable or not usable ethnic code |
---|---|---|---|---|
A | British (White) | 0 | White | Usable |
B | Irish (White) | 0 | White | Usable |
C | Any other White background | 0 | White | Usable |
D | White and Black Caribbean (Mixed) | | | Usable |
E | White and Black African (Mixed) | | | Usable |
F | White and Asian (Mixed) | | | Usable |
G | Any other Mixed background | | | Usable |
H | Indian (Asian or Asian British) | 4 | Indian | Usable |
J | Pakistani (Asian or Asian British) | 5 | Pakistani | Usable |
K | Bangladeshi (Asian or Asian British) | 6 | Bangladeshi | Usable |
L | Any other Asian background | | | Usable |
M | Caribbean (Black or Black British) | 1 | Black - Caribbean | Usable |
N | African (Black or Black British) | 2 | Black - African | Usable |
P | Any other Black background | 3 | Black - Other | Usable |
R | Chinese (Other ethnic group) | 7 | Chinese | Usable |
S | Any other ethnic group | 8 | Any other ethnic group | Usable |
Z | Not stated | 9 | Not given | Not usable |
X | Not known (prior to 2013) | 99 | Not known | Not usable |
99 | Not known (2013 onwards) | 99 | Not known | Not usable |
Appendix B.2: Age-standardised mortality rates
Figure 1: Age-standardised mortality rates among men for all deaths, and deaths mentioning COVID-19 (21 March to 1 May 2020), compared with baseline mortality rates (2014 to 2018), by ethnicity (England)
Figure 2: Age-standardised mortality rates among women for all deaths, and deaths mentioning COVID-19 (21 March to 1 May 2020), compared with baseline mortality rates (2014 to 2018), by ethnicity (England)
Appendix B.3: Population of England and Wales, by ethnicity (Census 2011)
Ethnicity | Ethnic code | Percentage | Order |
---|---|---|---|
White British | A | 80.5% | 1 |
White Other (including Gypsy and Traveller) | C | 4.5% | 2 |
Indian | H | 2.5% | 3 |
Pakistani | J | 2.0% | 4 |
Black African | N | 1.8% | 5 |
Asian Other | L | 1.5% | 6 |
Black Caribbean | M | 1.1% | 7 |
White Irish | B | 0.9% | 8 |
Bangladeshi | K | 0.8% | 9 |
Mixed White and Black Caribbean | D | 0.8% | 10 |
Chinese | R | 0.7% | 11 |
Mixed White and Asian | F | 0.6% | 12 |
Mixed Other | G | 0.5% | 13 |
Black Other | P | 0.5% | 14 |
Mixed White and Black African | E | 0.3% | 15 |
Other | S | 1.0% | 16 |