Reported road casualties in Great Britain, provisional estimates: methodology note
Published 25 November 2021
Introduction
This note outlines the method used to produce the provisional estimates of road casualties. The method takes account of the fact that, at the time these estimates are produced, not all of the required data is typically available.
Provisional estimates are produced twice a year: mid-year estimates for the year ending June (usually published in the following November) and calendar-year estimates (usually published in the following June). The same broad approach, outlined below, is used in each case, though the data available for the mid-year estimates is generally less complete.
Background
In-year estimates of road casualty statistics with imputation used to account for missing data have been published for many years, previously quarterly and more recently at mid-year and end of year only. The current imputation method dates back to 2020.
The imputation method used in the mid-year provisional 2021 release (published November 2021) builds on the method used in the annual 2020 provisional release (published June 2021). 2020 brought new challenges to data completeness: the COVID-19 pandemic changed casualty trends, producing patterns that older imputation methods struggled to handle.
This new method provides greater flexibility in setting thresholds for marking data as complete, and also allows for record-level imputation. This was particularly important because the road safety team decided to implement the severity adjustment in this publication, which requires record-level data. As a result, the mid-year statistics can be shown across a time series whilst accounting for the changes in severity reporting systems.
Method
The method broadly works as follows:
- stage 1: assess completeness of monthly data for each police force
- stage 2: calculate scaling factors for different record types from those forces with consistent data
- stage 3: apply scaling factors to past data for forces with missing months
Stage 1. Monthly data availability
The method looks at data provision by police force and month, flagging cases where there is incomplete data. This follows the operational data pipeline where police forces provide data to DfT covering their respective area, and most forces follow a monthly data supply schedule.
The flags for missing forces and months are created by comparing the incomplete 2021 data for each force with the complete 2019 statistics. A force and month is flagged as missing when the 2021 data falls below a threshold percentage of the complete 2019 data. This threshold is set manually and can be changed from year to year.
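The flagging step can be illustrated with a minimal sketch. The function name, the data layout (counts keyed by force and month) and the 70% threshold are all assumptions made for illustration; the note does not state the threshold value actually used.

```python
from typing import Dict, Tuple

ForceMonth = Tuple[str, int]  # (police force name, month number)


def flag_incomplete(
    baseline_counts: Dict[ForceMonth, int],
    latest_counts: Dict[ForceMonth, int],
    threshold: float = 0.7,  # assumed value; the real threshold is set manually
) -> Dict[ForceMonth, bool]:
    """Flag each force-month whose latest count is below threshold * baseline."""
    flags = {}
    for key, baseline in baseline_counts.items():
        supplied = latest_counts.get(key, 0)  # nothing supplied counts as zero
        flags[key] = supplied < threshold * baseline
    return flags


# Illustrative counts only, not real data.
baseline_2019 = {("Force A", 1): 200, ("Force A", 2): 180, ("Force B", 1): 90}
latest_2021 = {("Force A", 1): 170, ("Force A", 2): 40}

flags = flag_incomplete(baseline_2019, latest_2021, threshold=0.7)
for key, is_missing in sorted(flags.items()):
    print(key, "missing" if is_missing else "complete")
```

Because the threshold is an argument, it can be changed from year to year without altering the rest of the logic.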
Stage 2. Scaling factors
A scaling factor is created to capture how the composition of road casualties has changed in comparison to a baseline period. The baseline is the most recent period in which road casualty numbers are considered to be broadly stable. For 2021 data, this was chosen to be 2017 to 2019, as the 2020 trends were affected by COVID-19 and were therefore atypical. The scaling factor is later applied to the forces flagged as having missing data. To be included in the comparison between the latest year’s data and the baseline, a police force needs to have fully supplied the latest year’s data from January to June (that is, not be flagged as described above), and must not have changed its severity reporting system over the 2017 to 2021 timeframe, as casualty severity is used as a key field.
For the data that has been fully supplied, excluding forces that have changed their severity reporting system, the scaling factor is broken down by the following key fields:
- month
- casualty sex
- casualty severity
- age group
- casualty road user type
- first road class
- speed limit
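One plausible reading of the scaling factor is sketched below. It assumes the factor for each combination of key fields is the latest year's record count divided by the average annual count over the baseline years. The exact formula is not given in this note, and the field names, helper functions and example records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical field names standing in for the seven key fields listed above.
KEY_FIELDS = (
    "month", "sex", "severity", "age_group",
    "road_user_type", "first_road_class", "speed_limit",
)


def group_key(record: Mapping[str, object]) -> Tuple[object, ...]:
    """Build the grouping key for one casualty record."""
    return tuple(record[field] for field in KEY_FIELDS)


def scaling_factors(
    baseline_records: Iterable[Mapping[str, object]],
    latest_records: Iterable[Mapping[str, object]],
    baseline_years: int = 3,  # 2017 to 2019
) -> Dict[Tuple[object, ...], float]:
    """Assumed formula: latest count / average annual baseline count, per group.

    Both inputs should contain only forces with complete latest-year data
    and an unchanged severity reporting system.
    """
    baseline = Counter(group_key(r) for r in baseline_records)
    latest = Counter(group_key(r) for r in latest_records)
    return {
        key: latest.get(key, 0) / (count / baseline_years)
        for key, count in baseline.items()
    }


def make_record(month: int, severity: str) -> dict:
    """Create an illustrative record; all values are made up."""
    return {
        "month": month, "sex": "male", "severity": severity,
        "age_group": "25-34", "road_user_type": "car occupant",
        "first_road_class": "A", "speed_limit": 30,
    }


baseline = [make_record(1, "slight")] * 6 + [make_record(1, "serious")] * 3
latest = [make_record(1, "slight")]
factors = scaling_factors(baseline, latest)
```

In this made-up example, the "slight" group averages 2 records a year in the baseline and has 1 in the latest year, giving a factor of 0.5.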
Stage 3. Application of scaling factors
The imputation method is designed to scale only those records whose police force and month have been flagged as missing. Police forces that have supplied data for a given month are not scaled. If a police force has partially supplied data and it is above a given threshold, the supplied data is used and no imputation is applied to uplift this count.
Where data for a given police force and month has been flagged as missing, a flag is appended to the complete 2017 to 2019 data for the equivalent month and force. These flagged records form the base for the record-level scaling, in which the scaling factors calculated as described above are applied. Imputed and non-imputed records are then combined, and this output is joined to the severity adjustment and multiplied by the severity probabilities (see the guide to severity adjustments). The result is used to produce the estimates in the mid-year publication.
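The record-level scaling and the combination of imputed and non-imputed records might look like the sketch below. It rests on several assumptions not stated in this note: each baseline record for a flagged force and month receives a weight equal to its scaling factor divided by the number of baseline years, groups without a factor default to 1.0, and any partial data supplied for a flagged force and month is set aside. The function name, field names and shortened key are hypothetical, and the severity adjustment step is omitted.

```python
from typing import Dict, List, Mapping, Tuple

# Shortened key for the sketch; the method described above uses seven key fields.
KEY_FIELDS = ("month", "severity")


def combine_with_imputation(
    baseline_records: List[Mapping[str, object]],
    supplied_records: List[Mapping[str, object]],
    flags: Dict[Tuple[str, int], bool],
    factors: Dict[Tuple[object, ...], float],
    baseline_years: int = 3,  # 2017 to 2019
) -> List[dict]:
    """Return supplied records (weight 1) plus weighted baseline records
    for every force-month flagged as missing."""
    combined = []
    for record in supplied_records:
        # Supplied data is kept as-is when its force-month is not flagged.
        if not flags.get((record["force"], record["month"]), False):
            combined.append({**record, "weight": 1.0, "imputed": False})
    for record in baseline_records:
        # Baseline records stand in only for flagged force-months.
        if flags.get((record["force"], record["month"]), False):
            key = tuple(record[field] for field in KEY_FIELDS)
            factor = factors.get(key, 1.0)  # assumed default when no factor exists
            combined.append(
                {**record, "weight": factor / baseline_years, "imputed": True}
            )
    return combined


# Illustrative inputs only, not real data.
flags = {("Force A", 1): False, ("Force B", 1): True}
supplied = [{"force": "Force A", "month": 1, "severity": "slight"}] * 2
baseline = [
    {"force": "Force B", "month": 1, "severity": "slight", "year": year}
    for year in (2017, 2018, 2019)
] + [{"force": "Force A", "month": 1, "severity": "slight", "year": 2019}]
factors = {(1, "slight"): 0.5}

combined = combine_with_imputation(baseline, supplied, flags, factors)
estimated_total = sum(record["weight"] for record in combined)
```

Keeping one weighted row per record, rather than a single scaled total, is what allows the output to be joined to record-level severity probabilities afterwards.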
Strengths and weaknesses
The above method allows estimation of data for one or more months for police forces that failed to supply this in time to inform the provisional statistics. This allows high-level figures to be produced to a set timescale, rather than waiting for complete data.
Any provisional figures are subject to change, and figures are rounded to the nearest 10 to indicate their provisional nature. Typically, the amount of data estimated is relatively low (of the order of a few percent of the total) so that broad trends and patterns are unlikely to be materially affected.
However, the impact on individual police forces, or categories with small numbers, can be larger and where force level figures have been imputed these should be interpreted with caution.
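The rounding of provisional figures to the nearest 10 can be sketched as follows. The treatment of values ending exactly in 5 is not stated in this note; rounding them up is an assumption here, and the function name is hypothetical. Python's built-in `round` is avoided because it rounds ties to the nearest even multiple.

```python
def round_to_nearest_10(value: float) -> int:
    """Round to the nearest 10, with ties rounded up (an assumed tie rule)."""
    return int((value + 5) // 10) * 10


# Illustrative values only.
for estimate in (1234.6, 1235, 1244.9):
    print(estimate, "->", round_to_nearest_10(estimate))
```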
At mid-year, records for those forces where the data supplied has been considered complete are provided – without severity adjustment – in the open data made available on data.gov.uk alongside the statistics.
Once all records are fully validated, these are published separately in the annual results in September, with the complete underlying dataset made available at this time. Further information on the estimation method can be obtained from the road safety statistics team and feedback on the approach used is welcome.
Contact details
Road safety statistics
Email roadacc.stats@dft.gov.uk