Official Statistics

Quality and methodology information (QMI) for Tuberculosis in England National reports

Updated 31 October 2024

Applies to England

About this report

This report outlines the quality and methodology information (QMI) relevant to the ‘Tuberculosis in England: national quarterly reports’ official statistics release published by the UK Health Security Agency (UKHSA). This QMI report supports users in understanding the strengths and limitations of these statistics, ensuring UKHSA is compliant with the quality standards stated in the Code of Practice for Statistics. The report covers the following areas:

  1. The strengths and limitations of the data used to produce the statistics.
  2. The methods used to produce the statistics.
  3. The quality of the statistical outputs.

About the statistics

Tuberculosis (TB) is an infectious disease caused by bacteria of the Mycobacterium tuberculosis (M. tuberculosis) complex. It is predominantly spread by the respiratory route; people with infection in their lungs breathe out infectious bacteria, which may then be inhaled by others. TB is treatable with a combination of antibiotics, normally taken for at least 6 months, and for up to 24 or 36 months in people with multidrug-resistant TB (MDR TB) or complex disease. TB is a notifiable disease, meaning that clinicians have a statutory duty to notify local authorities or a local UKHSA centre of suspected cases.

The national quarterly report of tuberculosis in England presents people with TB disease notified to TB surveillance systems in England.

The data in the statistics is provisional and is subject to revision.

Geographical coverage: England

Publication frequency: Quarterly

Changelog

27 July 2023: QMI report first published

Contact

Lead analyst: Sharon Cox

Contact information: TBSection@ukhsa.gov.uk

Suitable data sources

Statistics should be based on the most appropriate data to meet intended uses.

This section describes the data used to produce the statistics.

Data sources

The quarterly report uses data from the National Tuberculosis Surveillance System (NTBS), which is a live, user-entered database. It was launched in 2021 and replaced 2 historical surveillance systems: the Enhanced Tuberculosis Surveillance System (ETS) and the London TB Register (LTBR). Data sets from 2018 onwards were extracted from ETS and LTBR and migrated into NTBS between July and December 2021, with all users using NTBS by December 2021.

TB is a notifiable disease; therefore, any TB diagnosis must be entered onto the database. Clinical teams at the TB service level usually enter information on TB cases directly into the web-based system at the clinic. In Northern Ireland, case report forms are entered into the system by the health protection team. The data includes notification details, demographic information, social risk factors, and clinical and microbiological information.

All new TB cases that meet one of the 2 following case definitions must be notified:

  1. Culture confirmed cases due to M. tuberculosis complex (including M. tuberculosis, M. bovis, M. africanum or M. microti). In other words, a sample is taken from the patient, tested, and confirmed to be TB.
  2. In the absence of culture confirmation, cases that meet the following criteria:
  • a clinician’s judgement that the patient’s clinical and/or radiological signs and/or symptoms are compatible with tuberculosis
  • a clinician’s decision to treat the patient with a full course of anti-TB therapy

NTBS receives a feed of reference laboratory results including patient identifiers, which are matched daily to notifications using a probabilistic matching algorithm. Matches below the threshold are notified to users with appropriate permissions. Users can then manually match or un-match from the provided list of potential matches. Additional cleaning and review are periodically carried out to identify duplicate records, inconsistencies and missing data by the national team and regional field service teams who coordinate responses from the TB services responsible for entering the information.
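
As an illustration only, the sketch below shows one simplified way that threshold-based scoring in the style of probabilistic record linkage can work: agreement weights are summed across identifier fields, a score at or above a threshold is treated as a match, and lower-scoring candidates are listed for manual review. The field names, weights, threshold and function names are hypothetical; the algorithm actually used by NTBS is not described in this report.

```python
# Hypothetical agreement weights and threshold: illustrative values only,
# not the fields or parameters used by NTBS.
FIELD_WEIGHTS = {
    "nhs_number": 6.0,
    "date_of_birth": 3.0,
    "surname": 2.0,
    "postcode": 1.5,
}
AUTO_MATCH_THRESHOLD = 8.0


def match_score(lab_result, notification):
    """Sum the weights of identifier fields that are present and equal."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        lab_value = lab_result.get(field)
        if lab_value is not None and lab_value == notification.get(field):
            score += weight
    return score


def classify_matches(lab_result, notifications):
    """Return ("matched", [best]) or ("review", candidates by descending score)."""
    scored = [(match_score(lab_result, n), n) for n in notifications]
    scored = [(s, n) for s, n in scored if s > 0]  # drop non-candidates
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if scored and scored[0][0] >= AUTO_MATCH_THRESHOLD:
        return "matched", [scored[0][1]]
    # Below the threshold: hand the candidate list to a user for manual matching.
    return "review", [n for _, n in scored]
```

In this sketch, a full probabilistic approach would also apply disagreement penalties and field-specific comparison rules; these are omitted to keep the example short.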

NTBS also contains information on TB treatment. Users enter the date treatment was started and the date treatment was completed or other TB treatment outcomes including loss to follow-up, treatment stopped or if the person died. NTBS sends out automated reminders for users to enter treatment outcomes at 12, 24 and 36 months and outcomes are then grouped into these blocks of duration. Users can also record outcomes of still on treatment or not evaluated at these time points if none of the other outcomes were previously applicable.
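
The reminder schedule described above could be sketched as follows. This is a minimal illustration which assumes that the 12, 24 and 36 month time points are counted from the treatment start date; the report does not state the reference date, and the function names are hypothetical.

```python
import calendar
from datetime import date

OUTCOME_MONTHS = (12, 24, 36)  # time points named in this report


def add_months(start, months):
    """Return the date `months` after `start`, clamping to the month end."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def outcomes_due(treatment_start, today, recorded=()):
    """List the time points (in months) that have passed without a recorded outcome.

    Assumption: time points are counted from the treatment start date.
    """
    return [
        months
        for months in OUTCOME_MONTHS
        if add_months(treatment_start, months) <= today and months not in recorded
    ]
```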

Data quality

The data that we use to produce statistics must be fit for purpose. Poor quality data can cause errors and can hinder effective decision making.

We have assessed the quality of the source data against the data quality dimensions in the Government Data Quality Framework.

This assessment covers the quality of the data that was used to produce the statistics, not the quality of the final statistical outputs. The quality summary section below assesses the quality of the final statistical outputs.

Strengths and limitations of the NTBS data

The following strengths of the data have been identified:

  • reporting of TB cases is mandatory, so the NTBS data is a comprehensive record of TB notifications in England
  • NTBS is a live system and notifications are available to the TB team as soon as they are entered
  • cleaning and review are regularly carried out on the data, probabilistic matching helps ensure accuracy, and validation rules mean that essential fields must be completed properly
  • users cannot submit incomplete notifications, meaning that all the required information is collected for each notification

The following limitations of the data have been identified:

  • counts of people with TB by UKHSA centre are reported by UKHSA centre of residence – this can differ from where people are diagnosed and treated
  • NTBS does not include people diagnosed or managed with TB in Scotland, therefore some people who are normally resident in England, but diagnosed or managed in Scotland, will not appear in the data

NTBS is the most appropriate source of data for the statistics. TB is a notifiable disease, which means that NTBS holds a comprehensive record of TB cases. The design of NTBS helps ensure that the data is accurate and valid.

Accuracy

Accuracy is about the degree to which the data reflects the real world. This can mean that names and addresses are correct, or that the data is factual and up to date.

Notification of TB is required within 3 days of a suspected or confirmed TB diagnosis. In 2021, just over half of notifications (57.2%) were notified within 3 days of diagnosis and 75% within 7 days. These numbers have not notably changed since 2016.
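
A timeliness figure of this kind can be derived from pairs of diagnosis and notification dates. The sketch below is a minimal example under stated assumptions: the record layout and function name are hypothetical, and notifications made before the diagnosis date (for example, for suspected cases) are counted as being within the limit.

```python
from datetime import date


def percent_notified_within(records, max_days):
    """Percentage of (diagnosis_date, notification_date) pairs notified
    no more than `max_days` after diagnosis."""
    if not records:
        raise ValueError("no records supplied")
    delays = [(notified - diagnosed).days for diagnosed, notified in records]
    on_time = sum(1 for delay in delays if delay <= max_days)
    return 100.0 * on_time / len(delays)
```

For example, `percent_notified_within(records, 3)` and `percent_notified_within(records, 7)` would give the 3-day and 7-day figures for a set of records.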

Notifications cannot be deleted; instead, they can be de-notified when necessary. A case is usually de-notified if the diagnosis changes, or if the record has been created in error. This helps ensure that records in NTBS are accurate and up to date.

Provisional data (which we also refer to as live data) is all data entered into NTBS after the final extraction date of the current cleaned and validated analytical data set, which is used to generate the TB annual report and other outputs. Provisional data will not have had all checks and validation completed by the time of analysis for publication.

Completeness

Completeness describes the degree to which records are present.

For a data set to be complete, all records are included, and the most important data is present in those records. This means that the data set contains all the records that it should and all essential values in a record are populated.

Completeness is not the same as accuracy as a full data set may still have incorrect values.

NTBS contains a number of mandatory fields that must be completed in order to notify a case. The mandatory fields are personal details of patients such as date of birth, sex, ethnic group, postcode and birth country. It is also mandatory to enter at least one site of disease and date of diagnosis. This ensures that all of the necessary information is recorded for each notification.
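
A completeness rule of this kind can be expressed as a simple check that every mandatory field is populated. The sketch below is illustrative only: the field keys are hypothetical names for the mandatory items listed above, and it is not the NTBS implementation.

```python
# Hypothetical keys for the mandatory items listed in this report.
MANDATORY_FIELDS = (
    "date_of_birth",
    "sex",
    "ethnic_group",
    "postcode",
    "birth_country",
    "date_of_diagnosis",
)


def missing_mandatory_fields(notification):
    """Return the mandatory fields that are absent or empty."""
    missing = [
        field
        for field in MANDATORY_FIELDS
        if notification.get(field) in (None, "")
    ]
    # At least one site of disease must be recorded.
    if not notification.get("sites_of_disease"):
        missing.append("sites_of_disease")
    return missing
```

A notification would only be accepted when the returned list is empty.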

Where significant numbers of values are missing compared with what is due to be recorded at that point in time, this is explicitly stated in the report. For example, information on social risk factors is not always available to users, or people may legitimately refuse to answer certain questions.

Uniqueness

Uniqueness describes the degree to which there is no duplication in records. This means that the data contains only one record for each entity it represents, and each value is stored once.

Some fields, such as National Insurance number, should be unique. Some data is less likely to be unique, for example geographical data such as town of birth.

To create a new notification, NTBS users must first perform a search to check whether the notification has already been recorded. This helps reduce the number of duplicate records.

Consistency

Consistency describes the degree to which values in a data set do not contradict other values representing the same entity. For example, a mother’s date of birth should be before her child’s.

Data is consistent if it doesn’t contradict data in another data set. For example, if the date of birth recorded for the same person in 2 different data sets is the same.

The probabilistic matching algorithm matches laboratory results to NTBS notifications. Cases can only be matched if key fields are consistent across notifications and laboratory results.

The national and regional teams conduct routine checks of data fields for consistency to identify potential errors and return queries to the relevant case managers for resolution to ensure data consistency.
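
A cross-record consistency check can be sketched as a comparison of key fields between a notification and a laboratory result. The choice of key fields and the function name below are assumptions made for illustration; the report does not list the fields that NTBS compares.

```python
# Hypothetical key fields; the fields compared by NTBS are not listed here.
KEY_FIELDS = ("date_of_birth", "sex", "postcode")


def inconsistent_fields(notification, lab_result, fields=KEY_FIELDS):
    """Return the key fields recorded in both records with different values."""
    conflicts = []
    for field in fields:
        left = notification.get(field)
        right = lab_result.get(field)
        # Only compare values that are present in both records.
        if left is not None and right is not None and left != right:
            conflicts.append(field)
    return conflicts
```

Any field returned by a check like this could be raised as a query with the relevant case manager.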

Timeliness

Timeliness describes the degree to which the data is an accurate reflection of the period that it represents, and that the data and its values are up to date.

Some data, such as date of birth, may stay the same whereas some, such as income, may not.

Data is timely if the time lag between collection and availability is appropriate for the intended use.

NTBS is a live database that is managed by the TB team. Hence there is no delay between the data collection and availability.

There may be a considerable delay in the reporting of treatment events and the manual matching of laboratory results, as the case manager needs to return to NTBS and complete the fields. To mitigate this, NTBS presents users at each logon, as appropriate for their level of access, with automated reminders, lists of notifications with important missing values, and required actions such as reviewing laboratory matches and accepting transfers of people between services.

Validity

Validity describes the degree to which the data is in the range and format expected. For example, date of birth does not exceed the present day and is within a reasonable range.

Valid data is stored in a data set in the appropriate format for that type of data. For example, a date of birth is stored in a date format rather than in plain text.

NTBS prevents users from entering invalid data for most of the mandatory fields: for example, date of birth must be entered in a date format and ethnic group is selected from a drop-down menu. These rules ensure that the data is entered in the correct format.
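
Validation rules like these can be sketched as small functions that reject values in the wrong format or outside the expected range. In the example below, the ISO date format, the maximum plausible age and the drop-down options are placeholders chosen for illustration, not NTBS settings.

```python
from datetime import date, datetime

MAX_AGE_YEARS = 120  # assumed upper bound for a plausible age
ETHNIC_GROUP_OPTIONS = {"Option A", "Option B", "Not stated"}  # placeholder list


def validate_date_of_birth(text, today=None):
    """Parse a YYYY-MM-DD date of birth and check it is in a plausible range."""
    today = today or date.today()
    try:
        dob = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError("date of birth must be in YYYY-MM-DD format")
    if dob > today:
        raise ValueError("date of birth is in the future")
    if today.year - dob.year > MAX_AGE_YEARS:
        raise ValueError("date of birth is outside the plausible range")
    return dob


def validate_ethnic_group(value):
    """Accept only values offered by the (placeholder) drop-down list."""
    if value not in ETHNIC_GROUP_OPTIONS:
        raise ValueError("ethnic group must be chosen from the list")
    return value
```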

Sound methods

Statistical outputs should be made using the best available methods and recognised standards.

This section describes how the statistics were produced and quality assured.

Data set production

Data used in the quarterly report comes from 2 sources: data published in the TB annual report, and live data from NTBS for which data cleaning and validation have not been fully completed. For example, the report for quarter 2 of 2023 contains data used in the annual report for notifications in 2021 and live data from all 4 subsequent quarters in 2022.

Treatment outcomes for data that is sourced from the TB annual report are additionally updated with live NTBS data. The live data is extracted using a SQL stored procedure with variables coded to match those in the annual report.

Quality assurance

The quarterly report is produced using R. The production of the figures and the supplementary data tables has been automated. This reduces the risk of human error as users do not have to manually update figures or copy and paste between documents. Quality assurance is done after running the code.

The figures and tables are sense-checked and compared with figures from previous quarterly reports for irregularities by at least 2 members of the team. All of the automated outputs are manually checked in this way. If concerns are raised regarding one figure, further checks are conducted to assess possible errors in the data.
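
The comparison with previous quarterly reports is a manual check, but a simple automated screen could support it by listing the figures that have changed by more than a chosen tolerance. The sketch below is a hypothetical aid written in Python (the report itself is produced in R); the tolerance value and function name are assumptions.

```python
def flag_irregular_figures(current, previous, tolerance_pct=10.0):
    """Return the keys whose value changed by more than `tolerance_pct`
    relative to the previous report, or which are new in this report."""
    flagged = []
    for key, new_value in current.items():
        old_value = previous.get(key)
        if old_value is None:
            flagged.append(key)  # no earlier figure to compare against
        elif old_value == 0:
            if new_value != 0:
                flagged.append(key)  # any change from zero is worth a look
        elif abs(new_value - old_value) / old_value * 100 > tolerance_pct:
            flagged.append(key)
    return flagged
```

Flagged figures would still need to be reviewed by a person, since a large change can be genuine rather than an error.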

Confidentiality and disclosure control

Personal and confidential data is collected, processed, and used in accordance with the UKHSA Privacy Notice. All UKHSA staff with access to personal or confidential information must complete mandatory information governance training, which must be refreshed every year. Information is stored on computer systems that are kept up-to-date and regularly tested to make sure they are secure and protected from viruses and hacking. UKHSA staff do not store data on their own laptops or computers. Instead, data is stored centrally on UKHSA servers.

No personally identifiable information is included in published data. There are no specific disclosure control methods used, as aggregation of the published figures protects people’s personal data and tables presented cannot be cross tabulated to reveal sufficient information about individuals to pose a meaningful risk of secondary disclosure. The benefits of reporting small numbers in aggregated data are compared with the risk of secondary disclosure on a case-by-case basis. For example, there are relatively few notifications of children with TB but the implications for TB control and management differ by children’s age and therefore small numbers of children by age groups may be reported, but will not be published at a location level that would likely pose a risk of a child being identifiable by combining with other data sources.

Geography

The statistics in this report are published at 2 geographical levels: Country (England), and UKHSA centre.

UKHSA centre is based on an individual’s residential postcode. If the postcode is missing (for example, if a person has no fixed abode), the UKHSA centre in which treatment occurred is used.

Most UKHSA centres are consistent with regions. The only difference between regions and UKHSA centres is how they categorise Milton Keynes: Milton Keynes is part of the South East region, but is in the East of England UKHSA centre.
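
The assignment rules described above can be sketched as a short function. This is a minimal illustration which assumes that the region and local authority of residence have already been derived from the postcode, and that centre names otherwise equal region names; the function and parameter names are hypothetical.

```python
def assign_ukhsa_centre(residence_region, local_authority, treatment_centre):
    """Assign a UKHSA centre from place of residence, falling back to the
    centre of treatment when residence is unknown (postcode missing)."""
    if residence_region is None:
        return treatment_centre
    # Milton Keynes is in the South East region but the East of England centre.
    if local_authority == "Milton Keynes":
        return "East of England"
    return residence_region
```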

Quality summary

The Code of Practice for Statistics states that quality means that statistics:

  • fit their intended uses
  • are based on appropriate data and methods
  • are not materially misleading

Quality requires skilled professional judgment about collecting, preparing, analysing, and publishing statistics and data in ways that meet the needs of people who want to use the statistics.

This section assesses the statistics against the European Statistical System dimensions of quality.

Relevance

Relevance is the degree to which the statistics meet user needs in both coverage and content.

There is a clear need for timely TB statistics. They provide evidence on progress towards initiatives to control TB in England. In July 2021, UKHSA and NHS England jointly launched the TB action plan for England, 2021 to 2026 to improve the prevention, detection and control of TB in England by defining objectives to help meet the targets in the World Health Organization’s (WHO) end TB strategy and provide monitoring indicators to measure progress.

The statistics are published quarterly. England is a low incidence country for TB, and notifications have steadily fallen since 2011, so there is no pressing need for monthly data. On the other hand, annual data would be too infrequent to be used as a timely indicator. We do also publish an annual report for TB in England, which has more detail than the quarterly reports, but at the cost of timeliness.

The TB statistics are primarily used by people in clinical care and public health. These users report that they use the statistics for monitoring, strategy and resource allocation, and teaching.

We have continued to make changes to the publication to meet user needs. We now publish 3 products as part of the statistical release:

  1. The main statistics report.
  2. Supplementary data tables, first published in April 2023 as part of the quarter 1 of 2023 publication.
  3. This QMI report, first published in July 2023.

By providing this range of different outputs, we can better cater to the needs of different users from a range of backgrounds, in line with the Office for National Statistics user personas.

Accuracy and reliability

Accuracy is the proximity between an estimate and the unknown true value. Reliability is the closeness of early estimates to subsequent estimated values.

The accuracy of the statistics is largely dependent on the accuracy of the source data. We have assessed the source data to be accurate (see the data quality section) as the design of NTBS helps prevent data entry errors, and guidance given to users helps ensure the right information is collected in the proper format. The statistics report on TB notifications, which are mandatory through NTBS. The statistics therefore represent the whole population of TB notifications in England.

The statistics present provisional data. The data is revised and updated as additional verification, data cleaning, and recoding are completed.

Timeliness and punctuality

Timeliness refers to the time gap between publication and the reference period. Punctuality refers to the gap between planned and actual publication dates.

This report aims to provide timely and up-to-date figures of important epidemiological indicators to inform ongoing TB control efforts in England.

The statistics are published as soon as possible, allowing for production and quality assurance. Since February 2019, when the series was first published, we have published the statistics 3 to 4 weeks after the end of the quarter that the data refers to, with 2 exceptions: quarter 4 of 2021 (published 7 weeks after the end of the quarter) and quarter 4 of 2018 (published 5 weeks after the end of the quarter).

The quarterly reports are official statistics and are pre-announced at least 28 days in advance, in line with the Code of Practice for Statistics. Provisional publication dates for the year ahead are pre-announced online in December and can be found on the UKHSA release calendar.

Accessibility and clarity

Accessibility is the ease with which users can access the data, also reflecting the format in which the data is available and the availability of supporting information. Clarity refers to the quality and sufficiency of the metadata, illustrations and accompanying advice.

We currently publish 3 statistical products as part of this statistical release:

  1. The main statistics report.
  2. Supporting data tables.
  3. This QMI report.

From the quarter 3 of 2022 publication (published October 2022) we started publishing the main statistics report as an HTML web page. The switch to HTML has made the report easier to access across different devices, and the HTML report inherits the accessibility features mentioned in the GOV.UK accessibility statement.

The publication includes visualisations that help explain the data. These are designed to be colour-blind friendly. Each element in a visualisation has a different luminance value. This means that there is always sufficient contrast between elements for them to be distinguished.

We have simplified the commentary in the publication, focusing on plain English, and shortened the publication overall. We also now include main messages in the publication to help users understand the statistics.

The supplementary data tables are published in ODS format and follow accessibility guidelines. Each sheet contains only one table. We also do not use nested tables as these do not always work well with screen readers. We avoid using empty cells for the same reason. Each sheet has a descriptive heading, for example, “Number of TB notifications by place of birth and site of disease, England, quarter 2 2021 to quarter 1 2023”.

Coherence and comparability

Coherence is the degree to which data that is derived from different sources or methods, but refers to the same topic, is similar. Comparability is the degree to which data can be compared over time and domain.

Data included in these and other TB reports published on GOV.UK has been collected in a consistent manner over time using web-based databases. NTBS replaced the 2 former systems in 2021 and older data (2018 onwards) was verified and migrated into NTBS. Where there have been changes in specific variables over time, either through addition or changes in definition, these are detailed in the report.

TB notification numbers and rates in England rose by 7% in 2021 compared with 2020, after falling from 2011 to 2020. The COVID-19 pandemic has had a complex impact on healthcare access and delivery, migration, and social behaviours, all of which may have influenced TB transmission, diagnoses and notifications. As a result, further analysis is needed to understand how the COVID-19 pandemic has affected TB epidemiology.

Trade-off between timeliness and completeness

There is a trade-off between timeliness and completeness for the statistics. The national data set for the annual report undergoes extra data processing, but processing of the live data in the quarterly report extract is limited. We therefore label the data as provisional, as not all cleaning steps have been completed. If we were to wait for all cleaning steps to be completed before publishing, there would be a longer time gap between the publication date and the reference period for the statistics. This would be at odds with our users’ needs for timely data.

Uses and users

Users of statistics and data should be at the centre of statistical production, and statistics should meet user needs.

This section explains how the statistics are used, and how we understand user needs.

Appropriate use of the statistics

The statistics present TB notifications. Notifications occur either when someone is diagnosed with TB, or when they have started treatment for suspected TB. Some individuals with TB will not receive a diagnosis or start treatment, so their case will never be notified. Users therefore should not use these statistics as a measure of TB incidence.

There are seasonal trends in TB notifications, with a peak around late spring and early summer. The seasonality of TB presentation has been repeatedly reported in the UK and in other countries, but the underlying mechanism is not clear. Users should generally compare the same quarter year on year, rather than different quarters in the same year.
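
A same-quarter, year-on-year comparison can be sketched as follows. The data layout (counts keyed by year and quarter) and the function name are assumptions made for illustration.

```python
def year_on_year_change(counts, year, quarter):
    """Percentage change in notifications against the same quarter of the
    previous year. `counts` maps (year, quarter) to a notification count."""
    current = counts[(year, quarter)]
    previous = counts[(year - 1, quarter)]
    if previous == 0:
        raise ValueError("no notifications in the comparison quarter")
    return 100.0 * (current - previous) / previous
```

Comparing quarter 2 with quarter 2 of the previous year in this way avoids mistaking the seasonal peak for a change in the underlying trend.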

Known uses

We are aware that the statistics have been used in several different ways, including:

  • monitoring TB notifications and comparing different areas
  • strategy and resource allocation
  • awareness and teaching
  • research
  • clinical decision making
  • evidence on the TB Action Plan and the WHO’s end TB strategy

Known users

Known users of the statistics are primarily in clinical care and public health. We are also aware of users in the media, the charity sector, and academia and research.

User engagement

UKHSA has recently carried out a TB user survey. Users were asked to provide information about who they are and what they use the publication for. This has provided some new insights into our users, including how they use the publication, and what they would like to see in it. The survey includes some detail on the specific parts of the publication that users find most useful, as well as suggestions for improvements to the publication.

TB statistics by UKHSA

This release is part of a collection of TB statistics published by UKHSA.

The annual Tuberculosis in England report describes the incidence, treatment, and prevention of TB in England. This publication contains annual figures on TB notifications in England, as well as rates of TB notifications per 100,000 population.
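
A notification rate per 100,000 population is the number of notifications divided by the population, multiplied by 100,000. A minimal sketch of the calculation is shown below; the function name is hypothetical and the choice of population estimate is left to the caller.

```python
def rate_per_100000(notifications, population):
    """Notifications per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return notifications * 100_000 / population
```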

Reports of cases of TB to UK enhanced tuberculosis surveillance systems presents TB notifications for England, Wales, Northern Ireland, and Scotland. The statistics show that notifications in Scotland and Wales follow a similar trend to those in England: they have fallen steadily from 2010 to 2020, with a slight increase in 2021. The rate of notifications in Northern Ireland has decreased slightly from 2010 to 2021.

The Tuberculosis (TB): regional and devolved administration reports comprehensively describe the epidemiology of TB for each region in England. These reports include information at the local authority level, as well as information on the demographic characteristics of those with TB.

TB in other countries

Most health protection functions in the UK are devolved to the other UK nations’ public health teams. Public Health Scotland publishes the Tuberculosis Annual Report for Scotland, the Public Health Agency reports on TB in Northern Ireland, and Public Health Wales reports on TB in Wales.

The European Centre for Disease Prevention and Control publishes the Tuberculosis surveillance and monitoring in Europe report. This provides an overview of the latest TB epidemiological situation in Europe.

WHO publishes the Global Tuberculosis Report. This provides a comprehensive and up-to-date assessment of the TB epidemic, and of progress in prevention, diagnosis and treatment of the disease, at global, regional and country levels.

Please note that it is not advisable to compare TB notification rates across countries, as TB incidence is strongly influenced by social and economic development and health-related risk factors.