Guidance

Implementation plan

Updated 16 July 2021

This implementation plan sets out the steps that will be taken to implement the recommendations in the ‘Joined up data in government: the future of data linking methods’ review.

Data linkage provides insight, informs policy change and helps answer society’s most important questions through increasing the utility of data. It is integral to government operations, decision making and statistics. We must ensure that the data linkage methods we are using are in keeping with best practice and evolving research trends across industries and identify skills or resource gaps preventing this.

The work detailed in this plan is a cross-government collaboration, with input from academia and third sector parties. There are four actions that will be taken forward to ensure implementation of the 10 recommendations in the review.

1. Set up a cross-government Data Linkage Champions Network

The review found that siloed working between departments can make it difficult to share research and good practice across government. To overcome this, the Government Data Quality Hub (DQHub), within the Office for National Statistics’ (ONS) Best Practice and Impact Division (BPI), will establish a Data Linkage Champions Network with representatives from across government. The network will be set up in a similar way to the existing BPI champions networks.

The DQHub will:

  • contact departments to promote the network and ask them to nominate champions
  • agree terms of reference and scope of the network
  • set up regular meetings and communications with champions

The DQHub will use the Data Linkage Champions Network to:

  • monitor the level of academic and international collaboration on data linkage across government and identify how it could be improved - for example, whether different departments are all engaging with the same academics, and whether knowledge could be shared rather than each department engaging separately with its own academic contacts
  • establish how departments can contribute to the 10 recommendations in the review - for example, by identifying departmental projects that can help deliver some of the recommendations
  • ensure data linkage champions remain up to date on the latest methods by sharing methods and discussion from the ONS Data Linkage Expert Group
  • promote the conclusions of Harron and Doidge’s quality assessment for data linkage paper
  • understand longitudinal linkage methods being used across government - if there is demand, a longitudinal data linkage working group could be established to embed these methods across government

Within ONS, a Data Linkage Hub has been created to provide a single planning and prioritisation function for data linking. It will also develop an internal data linking network and community to support business areas in building their own linking capability. Responding to the growing need to link and match increasing amounts of data, the Hub aims to share knowledge not only across ONS, but also more widely across government, academia and internationally. It will play an important role in the Data Linkage Champions Network.

2. Create data linkage courses, case studies and guidance

The review found that data linkage capability varies across government. Increasing the availability of training and guidance will help ensure departments across government are equipped with the skills to perform linkage effectively. The following actions will be taken to deliver improved training and guidance on data linkage across government:

  • the current data linking guidance pages on the GSS website will be reviewed and updated as well as being made available more widely across government - this will be achieved through collaboration with the Data Linkage Champions Network (action 1) to identify good practice case studies
  • the Methodology Division within ONS will produce e-learning data linking courses which cover types of data linkage, challenges in data linkage and methods for data linkage - resource has already been allocated to develop these courses
  • DQHub will work with the Data Linkage Champions and ONS Methodology to produce data linking quality guidance to build on Harron and Doidge’s quality assessment for data linkage paper - the guidance will include templates and metrics

3. Complete cutting edge research into data linkage methods

The review raised several areas where more research into linkage methods is needed to understand their utility and whether they can be applied to large-scale government data. This research will ensure that state-of-the-art methods are used across government for data linkage. Additional research will be done on the following topics.

3.1 Methods in Privacy-Preserving Record Linkage (PPRL)

Wherever possible, data linkage should be undertaken using variables in the clear. However, where this is not possible, departments should use PPRL methods for linking hashed data. The Methodology Division in ONS will undertake further research on the Derive and Conquer method, for which the next step is to test the method by linking Pay As You Earn (PAYE) data from HMRC to the Personal Demographics Service (PDS) data. Further research will also be undertaken by ONS Methodology on PPRL using Bloom filters.
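
To illustrate the general idea behind Bloom filter PPRL, the following is a minimal sketch in Python. It is not the ONS implementation or the Derive and Conquer method. It assumes each name is split into two-character tokens, each token is hashed several times with a key shared by the data owners, and encoded values are compared using the Dice coefficient. The filter size, number of hashes, key and function names are all hypothetical choices made for illustration.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a padded, lower-cased string into overlapping 2-character tokens."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, secret, size=256, num_hashes=4):
    """Return the set of bit positions switched on for `value`.

    Each token is hashed `num_hashes` times with a keyed hash (HMAC-SHA256),
    so the clear-text value is never shared, only the bit positions.
    """
    positions = set()
    for token in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(
                secret.encode("utf-8"),
                f"{k}|{token}".encode("utf-8"),
                hashlib.sha256,
            ).digest()
            positions.add(int.from_bytes(digest, "big") % size)
    return positions


def dice_similarity(a, b):
    """Dice coefficient of two sets of bit positions (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical shared key and made-up names, for illustration only.
SECRET = "key-agreed-by-the-data-owners"
same = dice_similarity(bloom_encode("Katherine", SECRET), bloom_encode("Katherine", SECRET))
close = dice_similarity(bloom_encode("Katherine", SECRET), bloom_encode("Catherine", SECRET))
far = dice_similarity(bloom_encode("Katherine", SECRET), bloom_encode("Mohammed", SECRET))
```

Because similar names share most of their tokens, their encodings share most of their bit positions. This is what allows approximate matching on encoded data, which exact hashing of whole values does not support.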

3.2 Integrated Data Programme for government

The Integrated Data Programme (IDP) will be used to test and enable different ways of matching and linking. It will be a platform on which the PPRL methods mentioned above will be tested. It will also enable matching and linking by indexing data against the Reference Data Management Framework (RDMF), which structures and maintains reference data through address, business, classification and geography indices. The RDMF will allow users of IDP to retrieve the unique match keys or index IDs associated with a specific index, which will enable simpler linking across datasets.
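
The idea of linking through a shared index can be sketched as follows. This is an illustration only and does not use any RDMF or IDP interface: the address index, the index IDs, the normalisation rule and the function names are all hypothetical, and the records are made up. Each dataset is first matched to the reference index to pick up an index ID, and the datasets are then joined on that ID rather than on the raw address text.

```python
def normalise(address):
    """Crude normalisation: upper-case, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.upper())
    return " ".join(cleaned.split())


# Hypothetical reference index: normalised address -> index ID.
address_index = {
    normalise("1 Example Street, Sampletown"): "ADDR-0001",
    normalise("22 Test Road, Sampletown"): "ADDR-0002",
}


def attach_index_id(records, index):
    """Return copies of the records with an 'index_id' field (None if unmatched)."""
    return [{**rec, "index_id": index.get(normalise(rec["address"]))} for rec in records]


def link_on_index_id(left, right):
    """Join two indexed datasets on their shared index ID."""
    right_by_id = {}
    for rec in right:
        if rec["index_id"] is not None:
            right_by_id.setdefault(rec["index_id"], []).append(rec)
    return [
        (l, r)
        for l in left
        if l["index_id"] is not None
        for r in right_by_id.get(l["index_id"], [])
    ]


dataset_a = attach_index_id(
    [
        {"id": "a1", "address": "1 example street sampletown"},
        {"id": "a2", "address": "5 Unknown Lane"},
    ],
    address_index,
)
dataset_b = attach_index_id(
    [{"id": "b1", "address": "1 Example Street,  SAMPLETOWN"}],
    address_index,
)
links = link_on_index_id(dataset_a, dataset_b)
```

In this sketch the matching effort is spent once, against the index. Any two datasets that have been indexed can then be linked with a simple join on the index ID.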

3.3 The Statistical Methods Library

The Statistical Methods Library (developed by ONS Methodology and deployed via IDP) will be used to facilitate the development and sharing of linkage methods across government and to drive adoption of best practice.

3.4 Longitudinal linkage methods

The DQHub will work with the ONS Data Science Campus for updates on their project involving linking the Inter-Departmental Business Register (IDBR) to trade in goods data. In addition, the DQHub will work with the Data Science Campus and the Economic Microdata Research team for updates on their Longitudinal Business Database (LBD) project. ONS Methodology will also undertake research on longitudinal linkage.

The Reference Data Management Framework (RDMF) team has been working with Methodology to capture longitudinal data across RDMF and has been engaging with the LBD team to ensure that the business index captures the longitudinal linkage requirements and best practices once it is operational. RDMF also works with Methodology on tracking and capturing changes in linkage across the RDMF indices over time.

3.5 Scaling Method

Resource from ONS Methodology has been allocated to complete further research into the Scaling Method.

3.6 Machine learning methods and their potential for government linkage

ONS Methodology will lead on this and will utilise the Data Linkage Champions Network to find out what other machine learning methods for data linkage are being used across government.

3.7 Graph databases for management of linked data sets

ONS Methodology will lead on this new area of research and will utilise the Data Linkage Champions Network to understand if other departments across government are undertaking any research in this area.
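
As an illustration of why a graph view of linked data can be useful, the sketch below treats records as nodes and accepted links as edges, then groups records into clusters that are connected directly or indirectly. It is written in plain Python using a union-find structure rather than any graph database, and the record identifiers and function name are hypothetical. It assumes every identifier that appears in a link is also present in the list of records.

```python
def cluster_links(record_ids, links):
    """Group records into clusters connected by pairwise links (union-find)."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), set()).add(rid)
    # Sort for a stable, repeatable order of clusters.
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Made-up identifiers from three hypothetical datasets A, B and C.
records = ["A:1", "B:7", "C:3", "B:9"]
links = [("A:1", "B:7"), ("B:7", "C:3")]
clusters = cluster_links(records, links)
```

Here A:1 and C:3 end up in the same cluster even though they were never compared directly, because both are linked to B:7. Resolving indirect links of this kind is one task that a graph representation makes straightforward.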

Much of the research described above needs test data (data that represents the population of interest, used for testing the effectiveness of processes or computer programs). Options for producing test data for government and academia to test linkage methods will be explored. The DQHub will work with Dr Katie Harron (UCL) to understand the progress of the Wellcome-funded linkage project, and will engage with the Data Linkage Champions Network to understand the need for test data across government.

Once complete, the findings from these pieces of research will be shared across government via the Data Linkage Champions Network and advice will be provided on whether these methods are recommended for use across government.

4. Research scalable software solutions for linking large datasets

The review found that there is a lack of commonly used open-source software tools for data linkage. Splink, the Ministry of Justice’s in-house open-source software solution, has the potential to be utilised across government for linking large data sets. However, further testing of Splink is needed, and other software solutions suitable for large-scale linkage both within and across government departments should also be considered.
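
Splink is based on the Fellegi-Sunter model of probabilistic record linkage. The sketch below illustrates that scoring idea in plain Python. It does not use Splink or reproduce its API. The field names, the m and u probabilities (the chance that a field agrees for a true match and for a non-match respectively) and the prior are assumed values chosen for illustration, not estimates from real data.

```python
import math

# Assumed illustrative parameters, not estimated from any real dataset.
# m: probability the field agrees given the records are a true match.
# u: probability the field agrees given the records are not a match.
PARAMS = {
    "surname": {"m": 0.95, "u": 0.01},
    "date_of_birth": {"m": 0.98, "u": 0.003},
    "postcode": {"m": 0.80, "u": 0.0005},
}


def match_weight(rec_a, rec_b, params=PARAMS):
    """Sum the log2 agreement or disagreement weights across fields."""
    total = 0.0
    for field, p in params.items():
        if rec_a.get(field) is None or rec_b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if rec_a[field] == rec_b[field]:
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


def match_probability(weight, prior=0.001):
    """Convert a match weight to a probability, given a prior match rate."""
    odds = (prior / (1 - prior)) * 2 ** weight
    return odds / (1 + odds)


# Made-up records for illustration.
a = {"surname": "SMITH", "date_of_birth": "1980-01-01", "postcode": "AB1 2CD"}
b = {"surname": "SMITH", "date_of_birth": "1980-01-01", "postcode": "AB1 2CD"}
c = {"surname": "JONES", "date_of_birth": "1975-06-30", "postcode": "ZZ9 9ZZ"}

weight_same = match_weight(a, b)
weight_diff = match_weight(a, c)
prob_same = match_probability(weight_same)
prob_diff = match_probability(weight_diff)
```

Agreement on a field that rarely agrees by chance (a low u value, such as postcode here) adds more weight than agreement on a common value. At scale, the cost comes mainly from the number of candidate pairs to be scored, which is why software for large-scale linkage needs to limit and distribute these comparisons.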

The DQHub will:

  • discuss with the Splink team at the Ministry of Justice any further testing plans for Splink
  • work with the Data Linkage Champions Network to test Splink across government
  • work with the Data Linkage Champions Network to understand which other software packages are used across government and whether they are suitable to be shared more widely

The RDMF team has been testing the Splink tool with a view to using it as part of the RDMF index building pipelines. The team will continue to engage with the MoJ team, Methodology and the Data Linkage Hub to enable partial automation of linkage processes in order to streamline the RDMF.

As part of the RDMF workstream, the team will work with Methodology and the Data Science Campus to develop a series of automated tools that will enable matching to the RDMF, such as an Address Index Matching Service, Business Index Matching Service, Classification Index Matching Service and Geography Index Matching Service.