Quality and methodology information for government grants data and statistics
Updated 12 April 2023
1. General details
Title of output: Government grants data and statistics
Designation: Official Statistics
Coverage: UK exchequer funded grant spending during the financial year 2021 to 2022
Contact details: grants.data@cabinetoffice.gov.uk
2. Summary
Government grants data and statistics (GGDS) is an annual publication that describes the grant spending of the UK government. Data is sourced from departments using the Government Grants Information System (GGIS); see the ‘How the output is created’ section below.
This document provides an overview of the statistics and the process by which the annual Official Statistics bulletin and associated data tables are brought together. It includes sections on:
- Output quality
- About the Output
- How the output is created
- Validation and quality assurance
- Concepts and definitions
3. Output quality
This section describes the quality of the data and statistics and notes any points that users should bear in mind when using GGDS. The Office for National Statistics has developed guidelines for measuring statistical quality based upon the five European Statistical System (ESS) quality dimensions. We address these quality dimensions here, along with other important quality characteristics, including:
- Relevance
- Timeliness and punctuality
- Coherence and comparability
- Accuracy
4. About the Output
4.1 Relevance
(The degree to which statistical outputs meet users’ needs)
The primary purpose of these statistics is to provide the public with transparency on government grant spending. The publication meets this purpose by offering a comprehensive dataset of grants across government. The accepted government standard for publishing grants data is managed by 360Giving. This publication meets this standard, enabling 360Giving to upload the data onto their GrantNav tool. Having the data on GrantNav allows government grants to be compared and aggregated with grants awarded by any other organisation that publishes to the 360Giving standard.
We offer users the ability to provide feedback on the publication via a feedback form. We will use any feedback received to inform future improvements to this publication.
4.2 Timeliness and punctuality
(Timeliness refers to the lapse of time between publication and the period to which the data refers. Punctuality refers to the gap between planned and actual publication dates.)
The data is published 1 year after the end of the financial year covered by the publication. This is to allow enough time for the data to be collected, and to ensure it is complete and accurate.
Table 1 - The end dates of the sections of the data collection process
Process | Date |
---|---|
Scheme data collection | 31st August 2022 |
Awards data collection | 30th November 2022 |
Data quality assurance | 13th January 2023 |
Data sign-off | 31st January 2023 |
Publication | 30th March 2023 |
Since being designated as Official Statistics, this publication has been published on the publication date initially stated on the government’s statistics calendar. In line with the Code of Practice for Statistics, the publication month is declared one year in advance and the publication date is declared one month in advance.
4.3 Coherence and comparability
(Coherence is the degree to which data that are derived from different sources or methods, but refer to the same topic, are similar. Comparability is the degree to which data can be compared over time and domain – for example, geographic level)
We make every effort to ensure coherence and comparability between the data provided by departments. Owing to the different ways that this data is held on departmental finance and grants management systems, we are aware of a number of areas of variability between departmental datasets. These are outlined below:
- Some departments provide grant scheme and award values using the amount initially awarded in the grant agreements (i.e. budgeted values). Other departments extract grant scheme and award values from finance systems, which only contain the actual financial values transferred (i.e. actual values). We provide departments with the ability to differentiate between budgeted and actual values for grant schemes and awards. In the 2021 to 2022 data, 74% of the value of schemes and 93% of the value of awards are actual values.
- In most cases total grant spending values align with other financial statistics from departments (most notably their Annual Report and Accounts). In some cases they do not. The main reason for non-comparability between published accounts and this publication is that the grants statistics in the accounts have a different scope to those here. For example, the accounts values include “grant in aid” whereas this publication does not.
- In most cases the total value of the awards in a scheme matches the scheme value. For some schemes this is not the case, because the scheme value is a budgeted amount which has not necessarily been completely spent within the financial year covered by the publication.
- Some data in this dataset is redacted for a number of reasons, including national security, personal information and commercial sensitivities.
- Some of the statistics calculated in the bulletin do not include grants to individuals (these statistics have footnotes in the bulletin explaining this). We do not gather award-level data on grants provided to individuals. Instead, these are recorded as a single award record stating the number of individuals that were awarded grants and the combined amount.
The GGIS allows us to automatically validate our recipient data against external databases. These databases are: Companies House, Charity Commission, UK Register of Learning Providers, and the Public Sector Classification Guide. This has allowed us to create a grant awards dataset that aligns with these databases. It has also allowed us to deduplicate our recipient data so that it is more consistent within itself.
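As an illustration of how a value-weighted share of actual values (such as the 74% and 93% figures quoted above) could be derived, the following is a minimal sketch. The field names `value` and `value_type`, the function name and the sample figures are assumptions made for this example, not the GGIS schema or real data.

```python
def actual_value_share(records):
    """Return the share of total value that is recorded as 'actual'.

    Each record is a dict with hypothetical keys:
      value      - the monetary value of the scheme or award
      value_type - "actual" or "budgeted"
    """
    total = sum(r["value"] for r in records)
    if total == 0:
        return 0.0  # avoid dividing by zero when there is no spend
    actual = sum(r["value"] for r in records if r["value_type"] == "actual")
    return actual / total


# Illustrative figures only, not taken from the publication.
illustrative_schemes = [
    {"value": 750.0, "value_type": "actual"},
    {"value": 250.0, "value_type": "budgeted"},
]
share = actual_value_share(illustrative_schemes)
print(f"{share:.0%} of scheme value is actual")
```

The share is weighted by value rather than by the number of records, which matches the wording "of the value of schemes" used above.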
Data from 2018 to 2019 onwards has been in a consistent format, and has been published with supporting statistical summaries. Up until 2017 only ‘scheme level’ data was published. In autumn 2017, grants data at ‘award level’ for two departments (Ministry of Justice and Department for Transport) was published on GOV.UK for the first time. In the time series in this publication, we have therefore only included statistics from 2018 to 2019 onwards, to ensure comparability.
4.4 Accuracy
(The degree of closeness between an estimate and the true value)
We expect the data we publish to be close to the actual values. We ask departments to check any discrepancies between the dataset and the independently published accounts data. If there were any inaccuracies, this process should highlight them. Data processing errors may result in inaccurate values, but the risk of mistakes is mitigated as far as possible by the extensive quality assurance and sign-off processes described below.
5. How the output is created
5.1 Data collection
Data is collected by the Cabinet Office from all departments, using the GGIS. The GGIS database holds all data on grant schemes, awards and recipients. The GGIS is a bespoke system that allows departmental users to upload their grants data and requires that users provide data that contains the fields required for this publication.
For the 2021 to 2022 publication, this data was commissioned from departments in July 2022. Departments were given until the end of November 2022 to provide this data. Some extensions to this deadline were negotiated with departments on a case-by-case basis. Once data was received, the team in the Cabinet Office checked the data (as described in the validation process below) and queried any issues with departments. Once all identified issues had been resolved, departments were provided with a final copy of their data, in the format it was to be published in. Departments were asked to acquire sign-off from an appropriate authority within their organisation for this finalised data. Departments were also asked to:
- Ensure any personal identifiable information or other sensitive information was redacted.
- Compare their data to any other published data from their department and explain any differences.
- Check and verify the enrichment and deduplication of recipient data described below.
- Provide a statement to be published alongside the data if required.
As explained above, the GGIS allows the validation of grant recipient data with external databases. The enrichment process uses the data that departments have provided and attempts to find a match in any of the external datasets. The following fields are used to match the data to these external databases:
- Recipient name
- Recipient postcode
- Recipient Companies House number
- Recipient charity number
- Recipient UK Provider Reference Number (UKPRN)
Where exact matches are not found, approximate string matching is used to try to identify the organisation in the external databases.
Where matches are found, data on the GGIS is enriched and/or overwritten with the information on the external database.
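The approximate matching step might look something like the following minimal sketch. The source does not say which matching algorithm or threshold the GGIS uses, so the `difflib` similarity ratio, the 0.85 cut-off, the `normalise` and `best_match` helpers and the sample organisation names are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name and replace punctuation and extra spaces with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate name.

    Returns None when no candidate reaches the (assumed) similarity threshold.
    """
    target = normalise(name)
    scored = [
        (candidate, SequenceMatcher(None, target, normalise(candidate)).ratio())
        for candidate in candidates
    ]
    if not scored:
        return None
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None


# Hypothetical names standing in for an external register.
register = ["Example Community Trust Limited", "Sample Housing Association"]
match = best_match("Example Comunity Trust Ltd", register)
print(match)
```

A threshold set too low risks enriching a record with details of the wrong organisation, which is why matched records would still need the departmental verification described in this section.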
This process ensures that recipient data is more consistent. This data then goes through a deduplication process which ensures that, where possible, only one version of each organisation is included on the GGIS. The deduplication process uses subsets of the fields in the database to identify duplicates. The fields used include:
- Recipient name
- Recipient postcode
- Recipient Companies House number
- Recipient charity number
- Recipient UKPRN
Any identified duplicates are then merged into a single recipient record within the GGIS database.
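One way to picture deduplication on subsets of fields is sketched below. The particular field subsets in `MATCH_KEYS`, the dictionary keys and the grouping approach (a small union-find) are assumptions for illustration; the source does not state which subsets the GGIS uses or how merges are resolved.

```python
from collections import defaultdict

# Hypothetical field subsets: records sharing every value in any one
# subset are treated as candidate duplicates.
MATCH_KEYS = [
    ("companies_house_number",),
    ("charity_number",),
    ("ukprn",),
    ("name", "postcode"),
]


def find_duplicate_groups(records):
    """Return lists of record indexes that appear to be the same organisation."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for fields in MATCH_KEYS:
        seen = {}
        for index, record in enumerate(records):
            values = tuple((record.get(f) or "").strip().lower() for f in fields)
            if not all(values):
                continue  # ignore a key if any part of it is missing
            if values in seen:
                parent[find(index)] = find(seen[values])
            else:
                seen[values] = index

    groups = defaultdict(list)
    for index in range(len(records)):
        groups[find(index)].append(index)
    return sorted(group for group in groups.values() if len(group) > 1)


records = [
    {"name": "Example Trust", "postcode": "AB1 2CD", "charity_number": "123456"},
    {"name": "Example Trust Ltd", "postcode": "AB1 2CD", "charity_number": "123456"},
    {"name": "Other Org", "postcode": "ZZ9 9ZZ", "ukprn": "10000001"},
    {"name": "example trust", "postcode": "ab1 2cd"},
]
duplicate_groups = find_duplicate_groups(records)
print(duplicate_groups)
```

Here the first two records share a charity number and the first and fourth share a name and postcode, so all three end up in one group even though no single key links them all.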
5.2 Statistics generation
The 3 files that comprise this publication are all generated using a single Python script. This Python script performs the following steps:
- Reads the scheme, awards and recipient data from the GGIS.
- Filters the data to the appropriate financial year and excludes any data that is marked as sensitive.
- Ensures every field in the data is formatted correctly.
- Generates the full grants data register file.
- Aggregates the data to produce the tables in the statistical tables file.
- Generates the statistical tables file, including all the titles, formatting and notes.
- Produces a text file containing the full text of the bulletin in Govspeak format.
- Produces the graphs that are contained in the bulletin as images.
The text file and graphs (from the last two steps above) are then uploaded to the Whitehall Publisher website. Whitehall Publisher automatically converts these files into an HTML file, ready for publication.
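The filtering and aggregation steps listed above can be sketched roughly as follows. This is not the Cabinet Office script: the record layout (`financial_year`, `sensitive`, `department`, `amount`), the function names and the sample rows are hypothetical and stand in for data read from the GGIS.

```python
from collections import defaultdict


def filter_awards(awards, financial_year):
    """Keep awards for the requested financial year that are not marked sensitive."""
    return [
        award
        for award in awards
        if award["financial_year"] == financial_year and not award["sensitive"]
    ]


def total_by_department(awards):
    """Aggregate award amounts by department, as one statistical table might."""
    totals = defaultdict(float)
    for award in awards:
        totals[award["department"]] += award["amount"]
    return dict(sorted(totals.items()))


# Illustrative rows only.
awards = [
    {"department": "Department A", "amount": 1000.0, "financial_year": "2021 to 2022", "sensitive": False},
    {"department": "Department A", "amount": 500.0, "financial_year": "2021 to 2022", "sensitive": False},
    {"department": "Department B", "amount": 750.0, "financial_year": "2021 to 2022", "sensitive": True},
    {"department": "Department B", "amount": 200.0, "financial_year": "2022 to 2023", "sensitive": False},
]
published = filter_awards(awards, "2021 to 2022")
totals = total_by_department(published)
print(totals)
```

Generating every output from the same filtered data in one script helps keep the register, the tables and the bulletin text consistent with each other.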
6. Validation and quality assurance
6.1 Data validation
While the underlying data quality is ultimately the responsibility of the departments, the team in the Cabinet Office performs a number of data validations to provide additional assurance.
Data from each department is validated at 2 points: once when data is initially received on the GGIS and again when the data is complete.
The following checks are performed by the GGIS automatically when the data is initially received:
- All data provided matches the government grants data standard for that field.
- All award data is associated with a scheme that was active in the same financial year as the award.
- Dates are in the correct order (e.g. the end date is not before the start date).
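Two of the automatic checks above (date order, and an award belonging to a scheme active in the same financial year) could be expressed along these lines. The record fields, the `validate_award` helper and the rule that an award's financial year is taken from its start date are assumptions for this sketch; the source does not describe how the GGIS implements them. The UK government financial year runs from 1 April to 31 March.

```python
from datetime import date


def financial_year(d):
    """Return the calendar year in which the financial year containing d starts."""
    return d.year if d.month >= 4 else d.year - 1


def validate_award(award, schemes):
    """Return a list of validation errors for one award (empty if it passes)."""
    errors = []
    if award["end_date"] < award["start_date"]:
        errors.append("end date is before start date")

    scheme = schemes.get(award["scheme_id"])
    if scheme is None:
        errors.append("award is not linked to a known scheme")
    else:
        award_fy = financial_year(award["start_date"])
        first_fy = financial_year(scheme["start_date"])
        last_fy = financial_year(scheme["end_date"])
        if not first_fy <= award_fy <= last_fy:
            errors.append("scheme was not active in the award's financial year")
    return errors


# Hypothetical scheme and awards.
schemes = {"S1": {"start_date": date(2021, 4, 1), "end_date": date(2022, 3, 31)}}
good_award = {"scheme_id": "S1", "start_date": date(2021, 5, 1), "end_date": date(2021, 12, 31)}
bad_award = {"scheme_id": "S1", "start_date": date(2022, 6, 1), "end_date": date(2022, 5, 1)}

print(validate_award(good_award, schemes))
print(validate_award(bad_award, schemes))
```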
Once all the data from a department has been successfully submitted to the GGIS, a number of checks are performed by the Cabinet Office team. Any issues that are found are queried with the department providing the data. The following issues are checked for:
- Schemes that have no awards or where the total award value is different to the scheme value.
- Total scheme values that are different from annual accounts figures, where these are published by the department.
- Multiple schemes that have very similar names (as they are possible duplicates).
- Uninformative award or scheme descriptions.
- Any personal identifiable information in the data.
- Schemes or awards that have a very low value (less than £1,000 for schemes, or less than £100 for awards).
- Recipient identifiers that are not in the correct format for the category of recipient.
- Awards that have an invalid Authority Act.
- Awards that have a mismatched recipient category and country, for example a recipient with category ‘UK company’, but with foreign address details.
- Awards that have been categorised incorrectly based on terms in the recipient name (e.g. ‘Ltd’).
- Recipient identifiers that do not appear on the relevant external database.
- Recipient names that differ from the name of the recipient on the external database, based on the identifier provided.
- Any data missing from mandatory fields.
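A few of the checks in the list above lend themselves to simple rules. The sketch below flags low-value records, schemes with no awards or mismatched award totals, and near-identical scheme names. The low-value thresholds come from the list above; the field names, the 0.9 name-similarity cut-off and the sample records are assumptions, and in practice each flag would be queried with the department rather than treated as an error.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

SCHEME_MIN_VALUE = 1000  # schemes below this value are queried
AWARD_MIN_VALUE = 100    # awards below this value are queried


def flag_issues(schemes, awards, name_similarity=0.9):
    """Return (record_id, message) pairs for records that should be queried."""
    issues = []
    award_totals = defaultdict(float)
    award_counts = defaultdict(int)

    for award in awards:
        award_totals[award["scheme_id"]] += award["amount"]
        award_counts[award["scheme_id"]] += 1
        if award["amount"] < AWARD_MIN_VALUE:
            issues.append((award["award_id"], "award value below 100 pounds"))

    for scheme in schemes:
        scheme_id = scheme["scheme_id"]
        if scheme["value"] < SCHEME_MIN_VALUE:
            issues.append((scheme_id, "scheme value below 1,000 pounds"))
        if award_counts[scheme_id] == 0:
            issues.append((scheme_id, "scheme has no awards"))
        elif abs(award_totals[scheme_id] - scheme["value"]) > 0.005:
            issues.append((scheme_id, "award total differs from scheme value"))

    # Compare every pair of scheme names; the cut-off is an assumed value.
    for first, second in combinations(schemes, 2):
        ratio = SequenceMatcher(None, first["name"].lower(), second["name"].lower()).ratio()
        if ratio >= name_similarity:
            issues.append((first["scheme_id"], f"name very similar to {second['scheme_id']}"))
    return issues


# Hypothetical records.
schemes = [
    {"scheme_id": "S1", "name": "Community Energy Fund", "value": 5000.0},
    {"scheme_id": "S2", "name": "Community Energy Fund 2", "value": 500.0},
    {"scheme_id": "S3", "name": "Rural Transport Grant", "value": 20000.0},
]
awards = [
    {"award_id": "A1", "scheme_id": "S1", "amount": 3000.0},
    {"award_id": "A2", "scheme_id": "S1", "amount": 2000.0},
    {"award_id": "A3", "scheme_id": "S3", "amount": 50.0},
]
issues = flag_issues(schemes, awards)
for record_id, message in issues:
    print(record_id, message)
```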
6.2 Quality assurance process
Once all data from departments is complete, the statistical documents are generated as described above. These final outputs then undergo further quality assurance. A list of final document quality assurance checks is below:
The statistical bulletin
- All figures in the text can be replicated from the raw GGIS data.
- All graphs can be replicated from the raw GGIS data.
- Top 10 tables can be replicated from the raw GGIS data.
- All departmental statements are included.
The statistical tables
- All tables can be correctly replicated from the raw GGIS data.
- All tables match the figures where they are referenced in the statistical bulletin.
- Any common figures between tables match each other.
The grants register data
- Full awards dataset volume and value can be replicated from the raw GGIS data.
- Full scheme dataset volume and value can be replicated from the raw GGIS data.
- All data that has been enriched with external databases has been enriched with data from the correct organisation.
- All date fields are dates.
- All numeric fields are numbers.
- All categorical fields have only the categories that are specified in the government grants data standard.
- All award IDs are unique.
- All data that should be redacted has been.
- The data passes the 360Giving data quality tool.
- Any negative values are explained in the departmental statement.
- All departmental data matches the data as it was at the point it was signed off by the department.
- Data in individual department tabs matches the full awards tab (minus individual and formula awards).
- No personally identifiable information can be found in the data.
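Several of the register checks above (unique award IDs, date and numeric field types, permitted categories) are mechanical and could be automated along the following lines. The row layout and the `check_register` helper are hypothetical, and the allowed category set is an illustrative subset: only 'UK company' is named in this document, and the full list is defined by the government grants data standard.

```python
from collections import Counter
from datetime import date
from numbers import Real

# Illustrative subset; the real categories come from the grants data standard.
ALLOWED_RECIPIENT_CATEGORIES = {"UK company", "Charity", "Local authority"}


def check_register(rows):
    """Return a list of human-readable failures for the register rows."""
    failures = []

    id_counts = Counter(row["award_id"] for row in rows)
    for award_id, count in id_counts.items():
        if count > 1:
            failures.append(f"{award_id}: award ID appears {count} times")

    for position, row in enumerate(rows):
        if not isinstance(row["award_date"], date):
            failures.append(f"row {position}: award_date is not a date")
        amount = row["amount"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, Real):
            failures.append(f"row {position}: amount is not a number")
        if row["recipient_category"] not in ALLOWED_RECIPIENT_CATEGORIES:
            failures.append(f"row {position}: unexpected recipient category")
    return failures


# Hypothetical rows: the second one fails every check.
rows = [
    {"award_id": "A1", "award_date": date(2021, 5, 1), "amount": 1200.0, "recipient_category": "UK company"},
    {"award_id": "A1", "award_date": "2021-05-01", "amount": "1,200", "recipient_category": "Unknown"},
]
failures = check_register(rows)
for failure in failures:
    print(failure)
```

Checks that depend on judgement, such as confirming that redactions are complete or that no personally identifiable information remains, are not covered by a sketch like this.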
7. Concepts and definitions
Definitions of the published fields can be found in the Government Grants Register file.