Official Statistics

Quality and methodology information: Legionellosis in residents of England and Wales: 2017 to 2023

Published 21 November 2024

Applies to England and Wales

About this report

This report explains the quality and methodology information (QMI) relevant to the ‘Legionellosis in residents of England and Wales: 2017 to 2023’ official statistics release published by the UK Health Security Agency (UKHSA).

This QMI report helps users understand the strengths and limitations of these statistics and supports UKHSA’s compliance with the quality standards set out in the Code of Practice for Statistics. The report explains:

  1. The strengths and limitations of the data used to produce the statistics
  2. The methods used to produce the statistics
  3. The quality of the statistical outputs

About the statistics

Background

Legionellosis is a spectrum of diseases caused by Legionella bacteria. Illness can range from mild through to Legionnaires’ disease, a form of atypical pneumonia that can be severe and potentially fatal. Legionella typically inhabits natural water systems such as streams, rivers, and lakes. However, Legionella bacteria can also survive in artificial water systems, for example cooling towers, evaporative condensers, spa pools, and hot and cold water systems. Such man-made water systems mimic the organism’s natural habitat, providing an ideal environment for growth.

In the UK, the principal route of infection is probably direct exposure to aerosols generated and dispersed from colonised man-made sources; however, in many cases the source of infection is not identified. Inhalation of these aerosols can result in legionellosis, particularly in people with risk factors such as older age, male sex, smoking, and immunosuppression. A colonised water system that is not appropriately managed has the potential to be a source of major outbreaks.

While anyone can be infected by the bacteria, some groups are more susceptible. Risk factors include being over 50 years old, smoking, immunosuppressive conditions, long-term respiratory disease, and liver or kidney disease.

Statistics

Legionellosis is a notifiable disease, meaning that registered medical practitioners must report confirmed and probable cases to the UK Health Security Agency in accordance with the Public Health (Control of Disease) Act 1984 and the Health Protection (Notification) Regulations 2010.

The National Enhanced Legionnaires’ Surveillance Scheme (NELSS) for residents of England and Wales was established in 1980 to collect enhanced surveillance data on all cases of Legionnaires’ disease. The scheme is managed by UKHSA’s Acute Respiratory Infections (ARI) team.

The primary objectives of NELSS are to:

  • understand the epidemiology of Legionnaires’ disease
  • monitor trends in incidence, clinical features, risk factors and mortality
  • detect clusters and outbreaks of Legionella infection
  • identify sources of infection to help colleagues apply control measures and prevent further cases
  • disseminate Legionella surveillance data and intelligence to stakeholders involved in the investigation and management of cases in the course of their duty to protect public health

The official statistics report on Legionellosis in residents of England and Wales presents findings from cases of Legionellosis reported to NELSS.

The data in this release is provisional and subject to revision.

Geographical coverage: England and Wales

Publication frequency: Annual

Contact

Lead analyst: David Howett

Contact information: legionella@ukhsa.gov.uk

Suitable data sources

Statistics should be based on the most appropriate data to meet intended uses.

This section describes the data used to produce the statistics.

Data sources

The data presented in this report is extracted from the NELSS database, which holds records of all reported cases of Legionellosis among residents of England and Wales. Cases are reported through the submission of a national surveillance scheme reporting form and entered into the NELSS database by the national ARI team.

The NELSS reporting forms capture a wide range of data, including case details, demographic information, social risk factors, clinical and microbiological data, as well as detailed information about the activities of each case in the 10 days prior to symptom onset. This includes information on potential environmental exposures. Once submitted, the reported data is assessed and verified to ensure that the case definition is met. A confirmed case is defined as one that has a clinical or radiological diagnosis of pneumonia with laboratory evidence of one or more of:

  • isolation (culture) of Legionella species from a clinical lower respiratory tract specimen
  • detection of Legionella pneumophila antigen in a urine specimen
  • detection of Legionella species nucleic acid, for example by polymerase chain reaction (PCR), in a lower respiratory tract specimen such as sputum or bronchoalveolar lavage (BAL)
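The confirmed case definition combines one clinical criterion with at least one of three laboratory criteria. The following minimal Python sketch expresses that rule for illustration only (the statistics themselves are produced in R); the record type and its field names are hypothetical and do not reflect the actual NELSS schema.

```python
from dataclasses import dataclass


@dataclass
class CaseReport:
    """Hypothetical, simplified case record; field names are illustrative only."""

    pneumonia: bool         # clinical or radiological diagnosis of pneumonia
    lrt_culture: bool       # Legionella spp. isolated from a lower respiratory tract specimen
    urinary_antigen: bool   # L. pneumophila antigen detected in a urine specimen
    lrt_nucleic_acid: bool  # Legionella spp. nucleic acid (e.g. PCR) in a lower respiratory tract specimen


def is_confirmed_case(report: CaseReport) -> bool:
    """Apply the confirmed case definition: pneumonia plus at least one laboratory criterion."""
    has_lab_evidence = (
        report.lrt_culture or report.urinary_antigen or report.lrt_nucleic_acid
    )
    return report.pneumonia and has_lab_evidence


# Example: pneumonia with a positive urinary antigen test meets the definition.
print(is_confirmed_case(CaseReport(pneumonia=True, lrt_culture=False,
                                   urinary_antigen=True, lrt_nucleic_acid=False)))  # prints True
```

A record with laboratory evidence but no pneumonia diagnosis returns False under this rule.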

The Legionellosis in residents of England and Wales report only includes confirmed cases; it does not include probable cases.   

Verified cases are then analysed against the national dataset for risk factors and potential links to previously reported cases. Additional information is provided by regional Health Protection Teams (HPTs), Field Services, and the Respiratory and Vaccine Preventable Bacteria Reference Unit (RVPBRU). The national ARI team also conducts further data cleaning and review processes to identify and address duplicate records, inconsistencies, and missing data.

This report covers cases with symptom onset between 1 January 2023 and 31 December 2023 among residents of England and Wales. Data from previous years (2017 to 2022) are also included for comparative purposes. Please note that some historical data may differ from earlier publications due to updates made following the receipt of new information.

All population estimates used in this report have been sourced from the Office for National Statistics (ONS) 2001 census.

Data quality

The data that we use to produce statistics must be fit for purpose. Poor quality data can cause errors and can hinder effective decision making.

We have assessed the quality of the source data against the data quality dimensions in the Government Data Quality Framework.

This assessment covers the quality of the data that was used to produce the statistics, not the quality of the final statistical outputs. The quality summary section below assesses the quality of the final statistical outputs.

Strengths and limitations of the data

The strengths of the data are that:

  • Reporting of cases is mandatory, ensuring that NELSS provides a comprehensive and timely record of Legionellosis cases in England and Wales
  • NELSS is a live system, meaning cases are available as soon as they are entered, enhancing the speed of public health responses
  • Regular data cleaning and review processes ensure data quality, with validation rules requiring essential fields to be completed
  • National surveillance scheme reporting forms and communication with health protection teams help ensure sufficient and accurate information is collected for each case

The limitations of the data are that:

  • Counts of people with Legionnaires’ disease by region are based on the cases’ residence, which can differ from where people are diagnosed and treated
  • The NELSS database does not include people diagnosed or managed with Legionnaires’ disease in Scotland, so some people who are normally resident in England but diagnosed or managed in Scotland will not appear in the data
  • Milder or asymptomatic cases may go untested or unnoticed, leading to potential underreporting as not all cases are diagnosed and reported
  • Legionella infections may be misdiagnosed as other respiratory conditions, particularly in settings where Legionella testing is not routinely conducted, leading to missed cases
  • Most cases are identified using Urinary Antigen Tests (UATs), which mainly detect L. pneumophila serogroup 1. This limits the detection of other Legionella serogroups or species, leading to potential underrepresentation of certain strains
  • Difficulty in obtaining lower respiratory samples for cultures reduces the number of cultures which can be performed. More samples sent to reference labs would improve the identification of novel strains and help trace environmental sources

NELSS is the most appropriate source of data for these statistics. Because Legionnaires’ disease is a notifiable disease, UKHSA holds a comprehensive record of cases.

Accuracy

Accuracy is about the degree to which the data reflects the real world. For example, names and addresses should be correct, and data should be factual and up to date.

Registered medical practitioners must report confirmed and probable cases to UKHSA within 3 days of a suspected or confirmed diagnosis. Local teams then report these cases to the national ARI team using the case report form. Data is reported to NELSS as soon as reasonably practical.

The NELSS database contains live data, and during the data input and cleaning stages any errors, inconsistencies or missing data are rectified. All data will have had all checks and validation completed by the time of analysis for publication.

Completeness

Completeness describes the degree to which records are present.

For a data set to be complete, all records are included, and the most important data is present in those records. This means that the data set contains all the records that it should and all essential values in a record are populated.

Completeness is not the same as accuracy as a full data set may still have incorrect values.

The national surveillance scheme reporting form contains mandatory fields that must be completed. In addition to personal and demographic details, mandatory fields include date of symptom onset, pneumonia status, whether the patient died, clinical history, risk factors, exposure-related information, and microbiology results. This ensures that the necessary information is recorded for each case.
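A completeness check of this kind can be sketched as a scan for mandatory fields that are absent or empty. In this illustrative Python sketch, the field names are hypothetical stand-ins for the form fields listed above (personal and demographic fields are omitted for brevity), and treating a blank string as missing is an assumption.

```python
from typing import Any

# Hypothetical names for the mandatory fields listed above; real form fields may differ.
MANDATORY_FIELDS = (
    "date_of_onset",
    "pneumonia_status",
    "died",
    "clinical_history",
    "risk_factors",
    "exposure_information",
    "microbiology_results",
)


def missing_mandatory_fields(record: dict[str, Any]) -> list[str]:
    """Return the mandatory fields that are absent or empty in one case record."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        # None and blank strings count as missing; False (e.g. died = False) is a valid answer.
        if value is None or (isinstance(value, str) and value.strip() == ""):
            missing.append(field)
    return missing


def completeness_rate(records: list[dict[str, Any]]) -> float:
    """Proportion of records in which every mandatory field is populated."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if not missing_mandatory_fields(r))
    return complete / len(records)
```

Distinguishing False from missing matters for yes/no fields such as whether the patient died.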

Uniqueness

Uniqueness describes the degree to which there is no duplication in records. This means that the data contains only one record for each entity it represents, and each value is stored once.

Some fields, such as National Insurance number, should be unique. Some data is less likely to be unique, for example geographical data such as town of birth.

To create a new case, NELSS users must first perform a search to check whether the case has already been recorded. Data in this report was further inspected for duplicates before publication.
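The search-before-create step and the pre-publication duplicate inspection can both be sketched as comparing records on a normalised matching key. This Python sketch is illustrative only: using surname, date of birth and postcode as the key is an assumption, as are the field names, since the fields NELSS actually searches on are not described here. Real deduplication often also uses fuzzy matching.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


def match_key(record: dict) -> tuple:
    """Hypothetical matching key: normalised surname, date of birth and postcode."""
    surname = record["surname"].strip().upper()
    postcode = record["postcode"].replace(" ", "").upper()
    return (surname, record["date_of_birth"], postcode)


def find_existing_case(new_record: dict, existing: list[dict]) -> Optional[dict]:
    """Search step before creating a case: return a matching record, if any."""
    key = match_key(new_record)
    for record in existing:
        if match_key(record) == key:
            return record
    return None


def duplicate_groups(records: list[dict]) -> list[list[str]]:
    """Pre-publication check: group case IDs that share a matching key."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["case_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records differing only in capitalisation and spacing.
cases = [
    {"case_id": "A1", "surname": "Smith", "date_of_birth": date(1960, 5, 1), "postcode": "AB1 2CD"},
    {"case_id": "A2", "surname": "smith ", "date_of_birth": date(1960, 5, 1), "postcode": "ab12cd"},
]
print(duplicate_groups(cases))  # prints [['A1', 'A2']]
```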

Consistency

Consistency describes the degree to which values in a data set do not contradict other values representing the same entity. For example, a mother’s date of birth should be before her child’s.

Data is consistent if it doesn’t contradict data in another data set. For example, the date of birth recorded for the same person in 2 different data sets should be the same.

The NELSS team routinely checks data fields for consistency to identify potential errors, and returns queries to the relevant case managers for resolution.

Thorough quality checks are routinely conducted as data from national surveillance scheme reporting forms is entered into NELSS, to detect and interrogate inconsistencies across fields and any data anomalies. These are rectified by the ARI team, who manage the NELSS database.
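Cross-field consistency checks can be expressed as simple rules that generate queries for case managers. The rules and field names in this illustrative Python sketch are hypothetical examples (for instance, symptom onset should not precede date of birth), not the checks NELSS actually runs.

```python
from datetime import date


def consistency_queries(record: dict) -> list[str]:
    """Return queries for cross-field contradictions in a single case record.

    The rules and field names are illustrative assumptions.
    """
    queries = []
    dob = record.get("date_of_birth")
    onset = record.get("date_of_onset")
    entered = record.get("date_entered")
    if dob is not None and onset is not None and onset < dob:
        queries.append("Symptom onset is before date of birth")
    if onset is not None and entered is not None and entered < onset:
        queries.append("Record was entered before symptom onset")
    if record.get("died") is False and record.get("date_of_death") is not None:
        queries.append("Date of death recorded but patient not marked as died")
    return queries


# Hypothetical record with a mistyped birth year (2055 instead of 1955).
record = {"date_of_birth": date(2055, 3, 2), "date_of_onset": date(2023, 8, 1),
          "date_entered": date(2023, 8, 12), "died": False}
print(consistency_queries(record))  # prints ['Symptom onset is before date of birth']
```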

Laboratory microbiology testing results are linked to case information using key ID fields and will only link when these fields match.

Geographical data is linked to case information using postcode information and will only link if the postcode is correctly entered.
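The exact-match linkage described above can be sketched as a keyed join in which laboratory results that match no case are reported rather than silently discarded. In this illustrative Python sketch, the key field name `case_id` and the record layout are hypothetical.

```python
from collections import defaultdict


def link_lab_results(cases, lab_results, key_fields=("case_id",)):
    """Attach laboratory results to cases only where all key ID fields match exactly.

    `key_fields` is a hypothetical choice; the real linkage keys are not specified here.
    Returns the linked cases and the keys of any results that matched no case.
    """
    def key(record):
        return tuple(record[f] for f in key_fields)

    results_by_key = defaultdict(list)
    for result in lab_results:
        results_by_key[key(result)].append(result)

    case_keys = {key(case) for case in cases}
    # .get() does not create entries, so unmatched cases simply receive an empty list.
    linked = [{**case, "lab_results": results_by_key.get(key(case), [])} for case in cases]
    unmatched = [k for k in results_by_key if k not in case_keys]
    return linked, unmatched


cases = [{"case_id": "C1"}, {"case_id": "C2"}]
results = [{"case_id": "C1", "test": "UAT"}, {"case_id": "C9", "test": "culture"}]
linked, unmatched = link_lab_results(cases, results)
print(unmatched)  # prints [('C9',)]
```

The same exact-match principle applies to postcode linkage: a mistyped key produces no link rather than a wrong one.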

Timeliness

Timeliness describes the degree to which the data is an accurate reflection of the period that it represents, and that the data and its values are up to date.

Some data, such as date of birth, may stay the same whereas some, such as income, may not.

Data is timely if the time lag between collection and availability is appropriate for the intended use.

NELSS is a live database that is managed by the ARI team. Data lag is unlikely to affect this report, as there is minimal delay between a case being reported and it being entered into the NELSS database. In 2023, the median delay from onset of symptoms to entry into the database was 10 days. Over 75% of cases are entered into the database within 2 weeks of symptom onset.
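The delay figures above are a median and a proportion calculated over per-case delays. A minimal illustrative Python sketch of that calculation follows; the field names and example records are hypothetical, and treating "within 2 weeks" as a delay of 14 days or fewer is an assumption.

```python
from datetime import date
from statistics import median


def entry_delays(records: list[dict]) -> list[int]:
    """Days from symptom onset to entry into the database, per case."""
    return [(r["date_entered"] - r["date_of_onset"]).days for r in records]


def summarise_delays(records: list[dict], threshold_days: int = 14) -> dict:
    """Median delay and the proportion of cases entered within `threshold_days`."""
    delays = entry_delays(records)
    if not delays:
        raise ValueError("no records to summarise")
    within = sum(1 for d in delays if d <= threshold_days)
    return {"median_days": median(delays), "prop_within": within / len(delays)}


# Hypothetical records, not report data.
records = [
    {"date_of_onset": date(2023, 7, 1), "date_entered": date(2023, 7, 6)},   # 5 days
    {"date_of_onset": date(2023, 7, 3), "date_entered": date(2023, 7, 13)},  # 10 days
    {"date_of_onset": date(2023, 7, 5), "date_entered": date(2023, 7, 25)},  # 20 days
]
print(summarise_delays(records))
```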

Validity

Validity describes the degree to which the data is in the range and format expected. For example, date of birth does not exceed the present day and is within a reasonable range.

Valid data is stored in a data set in the appropriate format for that type of data. For example, a date of birth is stored in a date format rather than in plain text.

NELSS prespecifies and restricts the format of user entries, limiting the probability of invalid data being entered. For example, date of birth must be entered in a date format. The risk of invalid data is further mitigated by the regular checks and cleaning steps required to produce these statistics.
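Format and range rules can be sketched as a parse-then-check step. In this illustrative Python sketch, the DD/MM/YYYY input format and the 120-year upper age limit are assumptions chosen for the example, not documented NELSS rules.

```python
from datetime import date, datetime
from typing import Optional


def validate_date_of_birth(
    text: str, today: date, max_age_years: int = 120
) -> tuple[Optional[date], list[str]]:
    """Parse a DD/MM/YYYY date of birth and check that it falls in a plausible range.

    Returns the parsed date (or None if it could not be parsed) and a list of problems.
    """
    try:
        dob = datetime.strptime(text.strip(), "%d/%m/%Y").date()
    except ValueError:
        # Wrong format or impossible date, e.g. 31/02/1950.
        return None, ["not a valid DD/MM/YYYY date"]

    problems = []
    if dob > today:
        problems.append("date of birth is in the future")
    elif today.year - dob.year > max_age_years:  # approximate age in whole years
        problems.append("date of birth implies an implausible age")
    return dob, problems


print(validate_date_of_birth("31/02/1950", date(2024, 1, 1)))  # prints (None, ['not a valid DD/MM/YYYY date'])
```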

Sound methods

Statistical outputs should be made using the best available methods and recognised standards.

This section describes how the statistics were produced and quality assured.

Dataset production

Data used in this report comes from the NELSS database and the microbiology reference laboratory. This data was enriched with geographical data based on a patient’s residential postcode. Data cleaning and validation were performed.

Quality assurance

This report is produced using R. Data cleaning and the production of the figures and supplementary data tables have been automated. This reduces the risk of human error, as analysts do not have to manually update figures or copy and paste between documents. Quality assurance is carried out on both the code and the report it produces.

The figures and tables are sense-checked and compared with figures from previous reports for irregularities. All automated outputs are manually checked in this way. If concerns are raised about any figure, further checks are conducted to assess possible errors in the data.
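The comparison with previous reports is a manual check. The illustrative Python sketch below shows one way such a comparison could be pre-screened: it flags figures present in both releases whose change is both large in relative terms and non-trivial in absolute terms. Both thresholds are arbitrary assumptions, and flagged items would still need analyst review, since revisions to earlier years are expected as new information arrives.

```python
def flag_changes(previous: dict[str, int], current: dict[str, int],
                 rel_threshold: float = 0.25,
                 min_abs_change: int = 5) -> list[tuple[str, str]]:
    """Flag figures that appear in both releases and differ markedly between them."""
    flags = []
    for key in sorted(set(previous) & set(current)):
        old, new = previous[key], current[key]
        change = abs(new - old)
        # Require a non-trivial absolute change so that small counts are not over-flagged.
        if change >= min_abs_change and (old == 0 or change / old > rel_threshold):
            flags.append((key, f"changed from {old} to {new}"))
    return flags


# Hypothetical annual counts from two releases, not published figures.
previous = {"2021": 50, "2022": 100}
current = {"2021": 52, "2022": 130, "2023": 80}
print(flag_changes(previous, current))  # prints [('2022', 'changed from 100 to 130')]
```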

Confidentiality and disclosure control

Personal and confidential data is collected, processed, and used in accordance with the UKHSA Privacy Notice. All UKHSA staff with access to personal or confidential information must complete mandatory information governance training, which must be refreshed every year. Information is stored on computer systems that are kept up-to-date and regularly tested to make sure they are secure and protected from viruses and hacking. UKHSA staff do not store data on their own laptops or computers. Instead, data is stored centrally on UKHSA servers.

No personally identifiable information (PII) is included in published data. No specific disclosure control methods were used, as aggregation of the published figures protects people’s personal data and tables presented cannot be cross tabulated to reveal sufficient information about individuals to pose a meaningful risk of secondary disclosure.

The benefits of reporting small numbers in aggregated data are weighed against the risk of secondary disclosure on a case-by-case basis. For example, there are relatively few L. bozemanii cases, but the risk of identification is low because these counts are not published in conjunction with other potentially identifying information (for example, age category).

Geography

The statistics in this report are published at 2 geographical levels: Country (England), and UKHSA Region. UKHSA Region is based on an individual’s residential postcode. If the postcode is missing, the UKHSA Region of the Health Protection Team is used. All postcodes were complete.
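The region assignment rule can be sketched as a lookup with a fallback. In this illustrative Python sketch, the lookup tables, field names and postcode normalisation (removing spaces and upper-casing) are hypothetical placeholders.

```python
from typing import Optional


def assign_region(case: dict, postcode_to_region: dict[str, str],
                  hpt_to_region: dict[str, str]) -> Optional[str]:
    """Return the UKHSA Region for a case.

    Uses the residential postcode when present; if the postcode is missing,
    falls back to the region of the case's Health Protection Team (HPT).
    Returns None if a recorded postcode is not found in the lookup.
    """
    postcode = (case.get("postcode") or "").replace(" ", "").upper()
    if postcode:
        return postcode_to_region.get(postcode)
    return hpt_to_region.get(case.get("hpt"))


# Hypothetical lookup tables.
postcode_lookup = {"AB12CD": "Region X"}
hpt_lookup = {"HPT 1": "Region Y"}
print(assign_region({"postcode": "ab1 2cd", "hpt": "HPT 1"}, postcode_lookup, hpt_lookup))  # prints Region X
print(assign_region({"postcode": None, "hpt": "HPT 1"}, postcode_lookup, hpt_lookup))       # prints Region Y
```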

Quality summary

The Code of Practice for Statistics states that quality means that statistics fit their intended uses, are based on appropriate data and methods, and are not materially misleading.

Quality requires skilled professional judgement about collecting, preparing, analysing, and publishing statistics and data in ways that meet the needs of people who want to use the statistics.

This section assesses the statistics against the European Statistical System dimensions of quality.

Relevance

Relevance is the degree to which the statistics meet user needs in both coverage and content.

There is a clear need for timely Legionnaires’ disease statistics. This data provides critical insights into the prevention and control of Legionella outbreaks in England and Wales. UKHSA monitors the incidence of Legionnaires’ disease through routine surveillance, contributing to national and international efforts to reduce cases by identifying risk factors and supporting outbreak investigations.

Legionnaires’ disease is a relatively rare but serious illness. Most cases are sporadic, with some occurring in clusters or as part of wider outbreaks. Given the potential public health risk, annual reporting provides a timely overview of trends and helps detect any emerging patterns. The data are essential for public health officials and healthcare professionals to assess progress in controlling the disease.

The Legionnaires’ disease statistics are used by a variety of stakeholders, including public health professionals, policymakers, and environmental health officers.

We have published a survey to gather user feedback on the report to inform further improvements.

We have expanded our reporting formats to better serve users’ needs and now publish:

  • The main statistics report
  • Supplementary data tables, providing more granular insights
  • This QMI report

This variety of outputs ensures that the data are accessible and useful to a broad range of users, helping to inform prevention strategies and improve public health outcomes. By providing this range of different outputs, we can better cater to the needs of different users from a range of backgrounds, in line with the Office for National Statistics user personas.

Accuracy and reliability

Accuracy is the proximity between an estimate and the unknown true value. Reliability is the closeness of early estimates to subsequent estimated values.

The accuracy of the statistics is largely dependent on the accuracy of the source data. We have assessed the source data to be accurate (see the data quality section), as the design of NELSS helps prevent data entry errors, and guidance given to users helps ensure the right information is collected in the proper format. The statistics therefore represent all known cases of Legionnaires’ disease in residents of England and Wales reported to UKHSA.

Where outputs are the result of a calculation, such as a rolling-period average or an incidence rate per 100,000 population, a 95% confidence interval is presented.
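The confidence interval method is not stated here. As one common option for counts, the illustrative Python sketch below assumes Byar's approximation to the exact Poisson interval, applied to a crude incidence rate per 100,000 population. The method choice is an assumption, and the sketch does not cover intervals for rolling averages. For an observed count O, Byar's limits are O(1 − 1/(9O) − z/(3√O))³ and (O + 1)(1 − 1/(9(O + 1)) + z/(3√(O + 1)))³, which are then scaled by 100,000 divided by the population.

```python
import math


def byar_poisson_ci(count: int, z: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% confidence limits for a Poisson count using Byar's method."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        lower = 0.0  # the lower-limit formula is undefined at zero; the limit is 0
    else:
        lower = count * (1 - 1 / (9 * count) - z / (3 * math.sqrt(count))) ** 3
    o1 = count + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return lower, upper


def rate_per_100k(count: int, population: int,
                  z: float = 1.959964) -> tuple[float, float, float]:
    """Crude incidence rate per 100,000 population with Byar confidence limits."""
    if population <= 0:
        raise ValueError("population must be positive")
    lower, upper = byar_poisson_ci(count, z)
    scale = 100_000 / population
    return count * 100_000 / population, lower * scale, upper * scale


# Illustrative numbers only: 100 cases in a population of 1,000,000.
rate, low, high = rate_per_100k(100, 1_000_000)
print(f"{rate:.2f} ({low:.2f} to {high:.2f})")
```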

The statistics present provisional data. The data is revised and updated as additional verification, data cleaning, and recoding are completed.

Timeliness and punctuality

Timeliness refers to the time gap between publication and the reference period. Punctuality refers to the gap between planned and actual publication dates.

Annual and monthly surveillance reports on Legionnaires’ disease were previously published by Public Health England (PHE). However, this statistical series was put on hold to redirect resource towards the COVID-19 pandemic, with the last monthly report published in February 2020.

This resumed report aims to provide timely and up-to-date figures from Legionella epidemiological surveillance in England and Wales.

The annual reports are official statistics and are pre-announced at least 28 days in advance, in line with the Code of Practice for Statistics. Provisional publication dates for the year ahead are pre-announced online in December and can be found on the UKHSA release calendar.

Accessibility and clarity

Accessibility is the ease with which users can access the data, also reflecting the format in which the data are available and the availability of supporting information. Clarity refers to the quality and sufficiency of the metadata, illustrations and accompanying advice.

We currently publish 3 statistical products as part of this statistical release: The main statistics report, supporting data tables and this QMI report.

The main statistics report is published as an HTML web page. This makes the report accessible across different devices, and it inherits the accessibility features described in the GOV.UK accessibility statement.

The publication includes visualisations that help explain the data. These are designed to be colour-blind friendly. Each element in a visualisation has a different luminance value. This means that there is always sufficient contrast between elements for them to be distinguished.

We have simplified the commentary in the publication, focusing on plain English. We also now include main messages in the publication to help users understand the key findings from the statistics.

The supplementary data tables are published in ODS format and follow accessibility guidelines. Each sheet contains only one table. We also do not use nested tables as these do not always work well with screen readers. We avoid using empty cells for the same reason. Each sheet has a descriptive heading.

Coherence and comparability

Coherence is the degree to which data that are derived from different sources or methods, but refer to the same topic, are similar. Comparability is the degree to which data can be compared over time and domain.

Data included in these reports have been collected in a consistent manner over time primarily using national surveillance scheme reporting forms. We continue to modernise NELSS and the methods of data collection. Where there have been changes in specific variables over time, either through addition or changes in definition, these are detailed in the report.

It’s important to exercise caution when comparing these statistics directly with statistics from Public Health Scotland or the European Legionnaires’ Disease Surveillance Network (ELDSNet). Differences in data collection methods, processes, reporting criteria, and timelines can make direct comparisons unreliable.

Uses and users

Users of statistics and data should be at the centre of statistical production, and statistics should meet user needs.

This section explains how the statistics are used, and how we understand user needs.

Appropriate use of the statistics

The statistics present Legionnaires’ disease cases. A case report is produced when someone has a suspected or confirmed diagnosis of Legionellosis. Some individuals will not receive a diagnosis or start treatment, so their case will never be notified. Users should therefore not treat these statistics as a definitive measure of Legionellosis incidence.

There are seasonal trends in Legionellosis cases, with a peak around summer. The seasonality of Legionnaires’ disease presentation has been repeatedly reported in the UK. Users should generally compare the same quarter year on year, rather than different quarters in the same year.
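Comparing the same quarter across years holds the season constant, so a difference is less likely to reflect the summer peak alone. A minimal illustrative Python sketch follows, assuming cases are grouped into calendar quarters by symptom onset date (the grouping variable and example dates are assumptions).

```python
from collections import Counter
from datetime import date


def quarter_of(d: date) -> int:
    """Calendar quarter (1 to 4) of a date."""
    return (d.month - 1) // 3 + 1


def counts_by_year_quarter(onset_dates: list[date]) -> Counter:
    """Count cases by (year, quarter) of symptom onset."""
    return Counter((d.year, quarter_of(d)) for d in onset_dates)


def same_quarter_change(counts: Counter, year: int, quarter: int) -> int:
    """Change in cases compared with the same quarter of the previous year."""
    # Counter returns 0 for missing keys without inserting them.
    return counts[(year, quarter)] - counts[(year - 1, quarter)]


# Hypothetical onset dates, not report data.
counts = counts_by_year_quarter([date(2022, 8, 1), date(2023, 7, 15),
                                 date(2023, 9, 30), date(2023, 1, 5)])
print(same_quarter_change(counts, 2023, 3))  # prints 1
```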

Known uses

The Legionnaires’ disease statistics are used by a variety of stakeholders, including public health professionals and policymakers. They use the data to understand the epidemiology of Legionellosis and to support research. The data are essential for public health officials and healthcare professionals to assess progress in controlling the disease.

We are conducting a user feedback survey with the release of this statistic to gain a better understanding of user needs to make future improvements.

User engagement

We are currently reviewing NELSS outputs to align them with the needs of our stakeholders. Because of the long gap between annual reports, we have not been able to survey users to adjust outputs to better meet their needs.

The acute respiratory infections team at UKHSA has published a user survey alongside the official statistics report. Users are asked to provide information about who they are and what they use the publication for. This will provide new insights into our users, including how they use the publication, and what they would like to see in it. The survey includes some detail on the specific parts of the publication that users find most useful, as well as suggestions for improvements.

This feedback survey will be undertaken regularly now that annual reports have resumed publication.

For feedback please contact legionella@ukhsa.gov.uk.

Most health protection functions in the UK are devolved to the other UK nations’ public health teams. Public Health Scotland publishes the annual Legionnaires’ disease Report Scotland. It’s important to exercise caution when comparing these statistics with those from other countries. Differences in data collection methods, reporting processes, criteria, and timelines can make direct comparisons unreliable.

The European Centre for Disease Prevention and Control publishes reports on Legionnaires’ disease surveillance and monitoring in Europe providing an overview of Legionnaires’ disease in Europe.

The World Health Organization publishes guidance on prevention, diagnosis, and treatment of the disease, at global, regional and country levels.