Quality and methodology information for government grants data and statistics
Published 31 March 2022
1. General details
Title of output: Government grants data and statistics
Designation: Official Statistics
Coverage: UK exchequer funded grant spending during the financial year 2020 to 2021
Contact details: grants.data@cabinetoffice.gov.uk
2. Summary
Government grants data and statistics (GGDS) is an annual publication that describes the grant spending of the UK government. Data is sourced from departments using the Government Grants Information System (GGIS); see the ‘How the output is created’ section below.
This document provides an overview of the statistics and the process by which the annual Official Statistics bulletin and associated data tables are brought together. It includes sections on:
- Output quality
- About the Output
- How the output is created
- Validation and quality assurance
- Concepts and definitions
3. Output quality
This section describes the quality of the data and statistics and notes any points that users should bear in mind when using GGDS. The Office for National Statistics (ONS) has developed guidelines for measuring statistical quality based on the five European Statistical System (ESS) quality dimensions. We address these quality dimensions here, along with other important quality characteristics, including:
- Relevance
- Timeliness and punctuality
- Coherence and comparability
- Accuracy
4. About the Output
4.1 Relevance
(The degree to which statistical outputs meet users’ needs)
The primary purpose of these statistics is to provide the public with transparency on government grant spending. By offering a comprehensive data set of grants across government, it meets this purpose. The accepted government standard for publishing grants data is managed by 360Giving. This publication meets this standard, enabling 360Giving to upload the data onto their GrantNav tool. The data being on GrantNav allows government grants to be compared and aggregated with grants awarded by any other organisation that publishes to the 360Giving standard.
We offer users the ability to provide feedback on the publication via a feedback form. We will use any feedback received to inform future improvements to this publication.
4.2 Timeliness and punctuality
(Timeliness refers to the lapse of time between publication and the period to which the data refer. Punctuality refers to the gap between planned and actual publication dates.)
The data is published 1 year after the end of the financial year covered by the publication. This is to allow enough time for the data to be collected, and to ensure it is complete and accurate.
Table 1 - The end dates of the sections of the collection process during the 2021/22 data collection
Process | Date |
---|---|
Scheme data collection | 31st August 2021 |
Awards data collection | 30th November 2021 |
Data quality assurance | 14th January 2022 |
Data sign-off | 31st January 2022 |
Publication | 31st March 2022 |
Since being designated as Official Statistics, this publication has been published on the publication date initially stated on the government’s statistics calendar. In line with the Statistics Code of Practice, the publication month is declared one year in advance and the publication date is declared one month in advance.
4.3 Coherence and comparability
(Coherence is the degree to which data that are derived from different sources or methods, but refer to the same topic, are similar. Comparability is the degree to which data can be compared over time and domain – for example, geographic level)
We make every effort to ensure coherence and comparability between the data provided by departments. Owing to the different ways that this data is held on departmental finance and grants management systems, we are aware of a number of areas of variability between departmental datasets. These are outlined below:
We ask that departments provide us with the value that was awarded to recipients where possible. Where this data is not available, the actual value that was transferred to the recipient may be provided instead. A number of departments extract this data from finance systems, which only contain the actual financial value transferred. Details of the individual department approaches can be found in the departmental statements at the end of the statistics bulletin.
In most cases, total grant spending values align with other financial statistics from departments (most notably their Annual Report and Accounts), but in some cases they do not. The main reason for non-comparability between published accounts and this publication is that the grants statistics in the accounts have a different scope to those here, for example where the accounts values include “grant in aid” whereas this publication does not. In most cases the total value of the awards in a scheme adds up to the scheme value. For some schemes this is not the case, because the scheme value is a budgeted amount which has not necessarily been completely awarded within the financial year covered by the publication.
Some data in this dataset is redacted because of national security, personal information and commercial sensitivities.
Some of the statistics calculated in the bulletin do not include grants to individuals (these statistics have footnotes, explaining this, in the bulletin). We do not gather award level data on grants provided to individuals.
The new version of GGIS allows us to automatically validate our recipient data against external databases. These databases are: Companies House, the Charity Commission, the UK Register of Learning Providers, and the Public Sector Classification Guide. This has allowed us to create a grant awards dataset that aligns with these databases. It has also allowed us to deduplicate our recipient data so that it is more internally consistent.
Data from the financial year 2018/19 onwards has been in a consistent format, and has been published with supporting statistical summaries. Up until 2017 only ‘scheme level’ data was published. In autumn 2017, grants data at ‘award level’ for two departments (Ministry of Justice and Department for Transport) was published on GOV.UK for the first time. In the time series in this publication, we have therefore only included statistics from 2018/19 onwards, to ensure comparability.
4.4 Accuracy
(The degree of closeness between an estimate and the true value)
We expect the data we publish to be close to the actual value. We ask departments to check any discrepancies between the data set and the independently published accounts data. If there are any inaccuracies, this process should highlight them. Data processing errors may result in inaccurate values, but the risk of mistakes is mitigated by the extensive quality assurance and sign-off processes described below.
5. How the output is created
5.1 Data collection
Data is collected by the Cabinet Office from all departments, using GGIS. The GGIS database holds all data on grant schemes, awards and recipients. GGIS is a bespoke system that allows departmental users to upload their grants data and requires that users provide data that contains the fields required for this publication.
For the 2020 to 2021 publication, this data was commissioned from departments in July 2021. Departments were given until the end of November 2021 to provide this data. Some extensions to this deadline were negotiated with departments on a case-by-case basis. Once data was received, the team in Cabinet Office checked the data (as described in the validation process below) and queried any issues with departments. Once all identified issues had been resolved, departments were provided with a final copy of their data, in the format it was to be published in. Departments were asked to acquire sign-off from an appropriate authority within their organisation for this finalised data. Departments were also asked to:
- Ensure any personal identifiable information or other sensitive information was redacted.
- Compare their data to any other published data from their department and explain any differences.
- Check and verify the enrichment and deduplication of recipient data described below.
- Provide a statement to be published alongside the data if required.
As explained above, GGIS allows the validation of grant recipient data with external databases. The enrichment process uses the data that departments have provided and attempts to find a match in any of the external datasets. The following fields are used to match the data to these external databases:
- Recipient name
- Recipient postcode
- Recipient Companies House number
- Recipient charity number
Where exact matches are not found, approximate string matching is used to try to identify the organisation in the external databases.
Where matches are found, data on GGIS is enriched and/or overwritten with the information on the external database.
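The matching algorithm used by GGIS is not described in this document. As a rough illustration of approximate string matching only, the sketch below uses `SequenceMatcher` from Python's standard `difflib` module. The function names, the example organisation names and the 0.85 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest candidate, or None if none reaches the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    target = normalise(name)
    scored = [
        (candidate, SequenceMatcher(None, target, normalise(candidate)).ratio())
        for candidate in candidates
    ]
    if not scored:
        return None
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None


# Made-up names standing in for an external register.
register = ["Example Community Trust", "Sample Housing Association Limited"]
match = best_match("Example Comunity Trust", register)
no_match = best_match("Totally Different Org", register)
```

In a sketch like this, a lower threshold finds more candidates but raises the risk of matching the wrong organisation, which is why matched records are also checked by departments and in quality assurance.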
This process ensures that recipient data is more consistent. This data then goes through a deduplication process which ensures that, where possible, only one version of each organisation is included on GGIS. The deduplication process utilises subsets of the fields in the database to identify duplicates. The fields used include:
- Recipient name
- Recipient postcode
- Recipient Companies House number
- Recipient charity number
- UK Provider Reference Number
Any identified duplicates were then merged into a single recipient record within the GGIS database.
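One way to picture deduplication on subsets of fields is sketched below. It is not the GGIS implementation: the field names, the choice of subsets and the sample records are assumptions for illustration. Records that share a non-empty value for any subset are linked, and linked records are grouped together so that they could be merged.

```python
from collections import defaultdict

# Hypothetical field names and subsets; each tuple is one way of identifying a recipient.
KEY_FIELDS = [
    ("companies_house_number",),
    ("charity_number",),
    ("ukprn",),
    ("name", "postcode"),
]


def duplicate_groups(recipients):
    """Return sorted lists of record indexes that appear to be the same organisation."""
    parent = list(range(len(recipients)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for fields in KEY_FIELDS:
        first_seen = {}
        for idx, rec in enumerate(recipients):
            key = tuple((rec.get(f) or "").strip().lower() for f in fields)
            if not all(key):
                continue  # ignore records with a missing value in this subset
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(recipients)):
        groups[find(idx)].append(idx)
    return sorted(g for g in groups.values() if len(g) > 1)


# Made-up records for illustration.
recipients = [
    {"name": "Example Trust", "postcode": "AB1 2CD", "charity_number": "123456"},
    {"name": "Example Trust Ltd", "postcode": "AB1 2CD", "charity_number": "123456"},
    {"name": "example trust", "postcode": "ab1 2cd"},
    {"name": "Other Org", "postcode": "ZZ9 9ZZ"},
]
groups = duplicate_groups(recipients)
```

Here the first two records share a charity number and the first and third share a name and postcode, so all three fall into one group.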
5.2 Statistics generation
The 3 files that comprise this publication are all generated using a single Python script. This script performs the following steps:
1. Reads the scheme, awards and recipient data from GGIS.
2. Filters the data to the appropriate financial year and excludes any data that is marked as sensitive.
3. Ensures every field in the data is formatted correctly.
4. Generates the full grants data register file.
5. Aggregates the data to produce the tables in the statistical tables file.
6. Generates the statistical tables file, including all the titles, formatting and notes.
7. Produces a text file containing the full text of the bulletin in Govspeak format.
8. Produces the graphs that are contained in the bulletin as images.
The text file and graphs (from steps 7 and 8) are then uploaded to the Whitehall Publisher website. Whitehall Publisher automatically converts these files into an HTML file, ready for publication.
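The production script itself is not published here. The filtering and aggregation steps could look something like the minimal sketch below, which assumes that awards are held as dictionaries, that an award is assigned to a financial year by its award date, and that a `sensitive` flag marks data to exclude. The field names, department names and values are hypothetical. The UK government financial year runs from 1 April to 31 March.

```python
from collections import defaultdict
from datetime import date


def in_financial_year(day: date, start_year: int) -> bool:
    """True if `day` falls in the financial year starting on 1 April of `start_year`."""
    return date(start_year, 4, 1) <= day <= date(start_year + 1, 3, 31)


def publishable_awards(awards, start_year):
    """Keep awards in the financial year that are not marked as sensitive."""
    return [
        a for a in awards
        if in_financial_year(a["award_date"], start_year) and not a["sensitive"]
    ]


def total_by_department(awards):
    """Sum award values (in pounds) for each department."""
    totals = defaultdict(int)
    for a in awards:
        totals[a["department"]] += a["value"]
    return dict(totals)


# Illustrative records only; these are not real grants.
awards = [
    {"department": "Department A", "award_date": date(2020, 4, 1), "value": 1000, "sensitive": False},
    {"department": "Department A", "award_date": date(2021, 3, 31), "value": 250, "sensitive": False},
    {"department": "Department B", "award_date": date(2021, 4, 1), "value": 500, "sensitive": False},
    {"department": "Department B", "award_date": date(2020, 9, 15), "value": 700, "sensitive": True},
]
kept = publishable_awards(awards, 2020)
totals = total_by_department(kept)
```

With these sample records, only the two Department A awards are kept for the 2020 to 2021 financial year: one Department B award falls in the following year and the other is marked as sensitive.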
6. Validation and quality assurance
6.1 Data validation
While the underlying data quality is ultimately the responsibility of the departments, the team in Cabinet Office performs a number of data validations to provide additional assurance.
Data from each department is validated at 2 points: once when data is initially received on GGIS and again when the data is complete.
The following checks are performed by GGIS automatically when the data is initially received:
- All data provided matches the government grants data standard for that field.
- All award data is associated with a scheme that was active in the same financial year as the award.
- Dates are in the correct order (e.g. the end date is not before the start date).
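The date-order and scheme-association checks above can be sketched as follows. This is an illustration rather than the GGIS validation code: the record layout, the field names and the idea of storing a set of active financial years against each scheme are assumptions.

```python
from datetime import date


def financial_year(day: date) -> int:
    """Return the calendar year in which the financial year containing `day` starts."""
    return day.year if day.month >= 4 else day.year - 1


def validate_award(award, schemes):
    """Return a list of validation errors for one award (an empty list means it passed)."""
    errors = []
    if award["end_date"] < award["start_date"]:
        errors.append("end date is before start date")
    scheme = schemes.get(award["scheme_id"])
    if scheme is None:
        errors.append("award is not associated with a known scheme")
    elif financial_year(award["award_date"]) not in scheme["active_years"]:
        errors.append("scheme not active in the award's financial year")
    return errors


# Hypothetical scheme and award records.
schemes = {"S1": {"active_years": {2020}}}
good_award = {
    "scheme_id": "S1",
    "award_date": date(2020, 6, 1),
    "start_date": date(2020, 6, 1),
    "end_date": date(2021, 5, 31),
}
bad_award = {
    "scheme_id": "S1",
    "award_date": date(2021, 4, 1),
    "start_date": date(2021, 6, 1),
    "end_date": date(2021, 5, 1),
}
good_errors = validate_award(good_award, schemes)
bad_errors = validate_award(bad_award, schemes)
```

Returning a list of errors, rather than stopping at the first failure, lets every problem with a record be reported to the submitting department at once.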
Once all the data from a department has been successfully submitted to GGIS, a number of checks are performed by the Cabinet Office team. Any issues that are found are queried with the department providing the data. The following issues are checked for:
- Schemes that have no awards or where the total award value is different to the scheme value.
- Total scheme values that are different from annual accounts figures, where these are published by the department.
- Multiple schemes that have very similar names (as they are possible duplicates).
- Uninformative award or scheme descriptions.
- Any personal identifiable information in the data.
- Schemes or awards that have a very low value (less than £1,000 or £100 respectively).
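Several of these checks lend themselves to simple automation. The sketch below flags schemes with no awards, award totals that differ from the scheme value, low-value schemes and awards (using the £1,000 and £100 thresholds given above) and pairs of schemes with very similar names. The field names, issue codes, sample data and the 0.9 name-similarity threshold are assumptions, and the checks on descriptions, personal information and accounts figures are not covered.

```python
from difflib import SequenceMatcher
from itertools import combinations

SCHEME_MIN = 1000  # pounds, threshold stated in the list above
AWARD_MIN = 100    # pounds, threshold stated in the list above


def flag_issues(schemes, awards, name_similarity=0.9):
    """Return (identifier, issue_code) pairs to query with the department."""
    issues = []
    totals = {}
    for award in awards:
        totals[award["scheme_id"]] = totals.get(award["scheme_id"], 0) + award["value"]
        if award["value"] < AWARD_MIN:
            issues.append((award["award_id"], "low_value_award"))
    for scheme in schemes:
        sid = scheme["scheme_id"]
        if sid not in totals:
            issues.append((sid, "no_awards"))
        elif totals[sid] != scheme["value"]:
            issues.append((sid, "award_total_differs"))
        if scheme["value"] < SCHEME_MIN:
            issues.append((sid, "low_value_scheme"))
    # Compare every pair of scheme names; the threshold is a hypothetical choice.
    for a, b in combinations(schemes, 2):
        ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if ratio >= name_similarity:
            issues.append(((a["scheme_id"], b["scheme_id"]), "similar_names"))
    return issues


# Made-up schemes and awards for illustration.
schemes = [
    {"scheme_id": "S1", "name": "Rural Skills Fund", "value": 5000},
    {"scheme_id": "S2", "name": "Rural Skills Fund 2", "value": 800},
    {"scheme_id": "S3", "name": "Coastal Research Grant", "value": 2000},
]
awards = [
    {"award_id": "A1", "scheme_id": "S1", "value": 3000},
    {"award_id": "A2", "scheme_id": "S1", "value": 2000},
    {"award_id": "A3", "scheme_id": "S3", "value": 50},
]
issues = flag_issues(schemes, awards)
```

A flagged item is a prompt for a query, not proof of an error: as noted under coherence and comparability, a scheme value can legitimately differ from its award total when the scheme value is a budgeted amount.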
6.2 Quality assurance process
Once all data from departments is complete, the statistical documents can be generated as described above. These final outputs then undergo further quality assurance. A list of final document quality assurance checks is below:
The statistical bulletin
- All figures in the text can be replicated from raw GGIS data.
- All graphs can be replicated from raw GGIS data.
- Top 10 tables can be replicated from raw GGIS data.
- All departmental statements are included.
The statistical tables
- All tables can be correctly replicated from the raw data.
- All tables match the figures where they are referenced in the statistical bulletin.
- Any common figures between tables match each other.
The grants register data
- Full awards dataset volume and value can be replicated from the raw GGIS data.
- Full scheme dataset volume and value can be replicated from the raw GGIS data.
- All data that has been enriched with external databases has been enriched with data from the correct organisation.
- All date fields are dates.
- All numeric fields are numbers.
- All categorical fields have only the categories that are specified in the government grants data standard.
- All award IDs are unique.
- All data that should be redacted has been.
- The data passes the 360Giving data quality tool.
- Any negative values are explained in the departmental statement.
- All departmental data matches the data as it was at the point it was signed off by the department.
- Data in individual department tabs matches the full awards tab (minus individual and formula awards).
- No personally identifiable information can be found in the data.
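The type, category and uniqueness checks on the grants register could be automated along the lines of the sketch below. It assumes rows are held as dictionaries; the field names and the category values are placeholders and are not taken from the government grants data standard.

```python
from collections import Counter
from datetime import date
from numbers import Real

# Placeholder field name and category values, for illustration only.
ALLOWED_CATEGORIES = {"recipient_type": {"Type A", "Type B"}}


def check_register(rows):
    """Return a list of human-readable problems found in the award rows."""
    problems = []
    id_counts = Counter(row["award_id"] for row in rows)
    for award_id, count in id_counts.items():
        if count > 1:
            problems.append(f"award ID {award_id} appears {count} times")
    for row in rows:
        label = row["award_id"]
        if not isinstance(row["award_date"], date):
            problems.append(f"{label}: award_date is not a date")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(row["value"], bool) or not isinstance(row["value"], Real):
            problems.append(f"{label}: value is not a number")
        for field, allowed in ALLOWED_CATEGORIES.items():
            if row[field] not in allowed:
                problems.append(f"{label}: unexpected {field} {row[field]!r}")
    return problems


# Made-up rows: the second has badly typed fields and the third reuses an award ID.
rows = [
    {"award_id": "A1", "award_date": date(2020, 5, 1), "value": 1200, "recipient_type": "Type A"},
    {"award_id": "A2", "award_date": "2020-05-01", "value": "1200", "recipient_type": "Type C"},
    {"award_id": "A1", "award_date": date(2020, 7, 1), "value": -50.0, "recipient_type": "Type B"},
]
problems = check_register(rows)
```

Negative values are deliberately not flagged as type errors in this sketch, since the checklist treats them as valid provided they are explained in the departmental statement.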
7. Concepts and definitions
Definitions of the published fields can be found in the Government Grants Register file.