Wilton Park: Data Cleaning Tool

A tool to enable compliance with the General Data Protection Regulation (GDPR) by identifying and automatically cleaning personal data from the Wilton Park customer database.

Tier 1 Information

1 - Name

Data Cleaning Tool

2 - Description

GDPR legislation requires Wilton Park to ensure that the personal data it holds is up to date, accurate and retained only if needed to deliver services. To make the management of personal data more efficient and to ensure that Wilton Park holds only the data it needs, the Data Cleaning Tool helps Wilton Park identify records that may be out of date or no longer required, so that Wilton Park can manage personal information responsibly and remove what is not required from its database.

3 - Website URL

N/A

4 - Contact email

enquiries@wiltonpark.org.uk

Tier 2 - Owner and Responsibility

1.1 - Organisation or department

Wilton Park Executive Agency

1.2 - Team

Technology Services Department

1.3 - Senior responsible owner

Head of Technology Services

1.4 - External supplier involvement

Yes

1.4.1 - External supplier

Red River Software Ltd

1.4.2 - Companies House Number

06483558

1.4.3 - External supplier role

The supplier built the entirety of the tool and is involved in maintaining and supporting it.

1.4.4 - Procurement procedure type

Under existing contract with supplier. The contract was awarded via open competition.

1.4.5 - Data access terms

The supplier has access as a requirement to fulfil the contractual data processing role. This access is ongoing as part of their obligation to support the service.

Tier 2 - Description and Rationale

2.1 - Detailed description

The Data Cleaning Tool is a rules-based system used to manage Wilton Park’s contact records. Contact records hold personal data about our customers, such as names and email addresses, which may become out of date if we have not been in touch with a customer for some time. GDPR legislation requires that we hold only accurate, up-to-date personal data. To comply with this legislation, the tool identifies any contact record that has not been modified for 24 months and triggers an email to the contact asking whether they want to stay on our database. If the contact requests that their data be removed, the tool automatically removes the personal data from the database. If no response is received, the tool continues to trigger an email every 12 months over a further 3-year period. If there is still no response at the end of this 3-year period, the tool automatically removes the personal data from the contact record.
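The timeline above can be sketched in Python to show when each email falls due for a contact who never replies. This is an illustrative sketch only, not Red River's implementation (the real tool is a set of rules configured in a .NET system); it assumes the thresholds are whole calendar years, takes the 21-day grace period from section 4.2.6, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Figures taken from this record; treating "years" as calendar years is an assumption.
FIRST_EMAIL_YEARS = 2              # first email after 24 months without modification
FINAL_EMAIL_YEARS = 5              # yearly emails continue until five years have passed
GRACE_PERIOD = timedelta(days=21)  # anonymisation follows the final email (section 4.2.6)


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def cleaning_schedule(last_interaction: date) -> Tuple[List[date], date]:
    """Email dates and anonymisation date for a contact who never replies or re-engages."""
    emails = [add_years(last_interaction, n)
              for n in range(FIRST_EMAIL_YEARS, FINAL_EMAIL_YEARS + 1)]
    return emails, emails[-1] + GRACE_PERIOD


emails, anonymise_on = cleaning_schedule(date(2020, 3, 1))
print([d.isoformat() for d in emails], anonymise_on.isoformat())
# ['2022-03-01', '2023-03-01', '2024-03-01', '2025-03-01'] 2025-03-22
```

Any reply or renewed engagement moves the starting date, so in practice the schedule is recomputed rather than fixed in advance.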

2.2 - Scope

The purpose of the tool is to ensure compliance with GDPR legislation by making sure that personal data in customer records is held in the Wilton Park database only while the record is actively used. The tool is used for this process only.

2.3 - Benefit

This tool helps Wilton Park actively manage individuals’ personal data and remove records when they are no longer required. It enables Wilton Park to provide assurance of legal compliance, and it makes managing Wilton Park data more efficient by helping staff with their data management responsibilities.

2.4 - Previous process

Before this tool was implemented, no tool automated or assisted with the management of customer data. As part of a database development project to modernise the hosting of customer data, the tool was proposed to the FCDO’s Data Protection Officer (DPO) responsible for Wilton Park. The proposal demonstrated how the tool could make data management more efficient: data had previously been managed manually, and Wilton Park could not provide timely assurance that out-of-date personal data was being removed. A decision from Wilton Park’s senior management team and the DPO was required before the tool was developed.

2.5 - Alternatives considered

This is not a standalone or off-the-shelf tool that can be purchased or installed; it consists of rules configured into an existing system. For this reason, no other tools were considered.

Tier 2 - Decision making Process

3.1 - Process integration

This tool automates the process of maintaining the customer database and keeping it up to date. The tool makes decisions based on the status of a record: for example, whether the customer record has reached the threshold for sending an email, whether the customer has replied, and what action follows a reply. The tool decides to send an email to the customer when they have not actively engaged with Wilton Park for over two years. The email asks the contact whether they wish their data to be removed. If the customer does not reply, the tool retains the data until the customer either engages with Wilton Park or their record reaches the threshold at which a further email needs to be sent. This process starts after two years of no contact and repeats each year until 5 years have passed, at which point the data is automatically anonymised.

Anonymisation ensures that all personal information is removed while a record of event participation is retained. No data could be used to re-identify an anonymised individual, as that information would be stored separately in different systems with different controls. No human interaction is required in the process.

Without this tool, Wilton Park staff would need to spend a large amount of time manually identifying customers to email routinely, asking whether they wish their data to be removed. Staff would then need to sift through the email responses for data removal requests and finally spend a large amount of time removing that data manually. The tool saves many hours of staff effort and makes it simple for both the data owner and the customer to keep data up to date.
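Anonymisation as described here clears personal fields but keeps the participation history. The Python sketch below illustrates that idea on a hypothetical, simplified contact record; the field names and the list of fields treated as personal data are assumptions, not the real Xen contact entity.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Contact:
    contact_id: int                    # internal key, kept so participation still links up
    name: Optional[str]
    email: Optional[str]
    phone: Optional[str]
    events_attended: Tuple[str, ...] = ()
    anonymised: bool = False


# Hypothetical list of fields treated as personal data in this sketch.
PERSONAL_FIELDS = ("name", "email", "phone")


def anonymise(contact: Contact) -> Contact:
    """Return a copy with every personal field cleared but event participation kept."""
    cleared = {field_name: None for field_name in PERSONAL_FIELDS}
    return replace(contact, anonymised=True, **cleared)


before = Contact(1, "Ada", "ada@example.org", "0123 456789", ("WP1234",))
after = anonymise(before)
print(after.name, after.events_attended)  # None ('WP1234',)
```

Returning a new record rather than mutating the original keeps the step easy to test and to log before it is committed.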

3.2 - Provided information

Before it cleans any data, the system provides a list of the records it proposes to clean. This list is presented as a dashboard in the system, which the data compliance officer can access and review at any time. No human decision is made to remove the data: the tool decides to take the cleaning action but, for monitoring purposes, provides information about the cleaning action it is about to take.
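A review list of this kind is essentially a read-only query over pending actions. The Python sketch below shows one way to build it; the `PendingClean` record, the reason labels and the 30-day horizon are hypothetical and do not describe the actual dashboard.

```python
from datetime import date, timedelta
from typing import List, NamedTuple


class PendingClean(NamedTuple):
    contact_id: int
    reason: str       # hypothetical labels, e.g. "removal requested", "no response"
    clean_on: date    # date the tool will take the cleaning action


def preview(pending: List[PendingClean], today: date,
            horizon_days: int = 30) -> List[PendingClean]:
    """Cleaning actions due within the horizon, earliest first.

    Nothing is deleted here: the list exists only so that a person can review it.
    """
    cutoff = today + timedelta(days=horizon_days)
    return sorted((p for p in pending if p.clean_on <= cutoff),
                  key=lambda p: p.clean_on)
```

Keeping the preview separate from the cleaning step means monitoring can never change what the tool does, which matches the statement that no human decision is made.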

3.3 - Frequency and scale of usage

The tool is in constant use and manages a database of approximately 50,000 records. Wilton Park expects to review and engage with around 20,000 contacts each year.

3.4 - Human decisions and review

The ultimate decision to delete customer data remains with the customer for the first two years of no interaction with Wilton Park. Wilton Park staff review the tool's performance monthly to check that it is working as it should. They review records that have not been modified in the past 24 months to ensure that customers have been sent a deletion request email as expected. The system records on the contact record whether the customer has asked to stay on the customer database.
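The monthly check that stale records have been emailed can be thought of as a reconciliation query. The Python sketch below is hypothetical: the input shape (contact id mapped to last-modified date and the date of the most recent reminder email) and the function names are assumptions, not the system's real schema.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def missing_reminders(contacts: Dict[int, Tuple[date, Optional[date]]],
                      today: date) -> List[int]:
    """Ids of contacts unmodified for 24 months with no reminder since that modification."""
    threshold = add_years(today, -2)
    flagged = []
    for contact_id, (last_modified, last_reminder) in contacts.items():
        stale = last_modified <= threshold
        # A reminder sent before the last modification belongs to an earlier cycle.
        reminded = last_reminder is not None and last_reminder >= last_modified
        if stale and not reminded:
            flagged.append(contact_id)
    return sorted(flagged)
```

An empty result is the expected outcome; any id returned is a record the review should look at.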

3.5 - Required training

Minimal training on the use of the tool is provided to Wilton Park staff, because staff have limited operational involvement with it: the tool is managed by a supplier. Wilton Park staff complete online GDPR compliance training annually via the FCDO’s staff training platform.

3.6 - Appeals and review

Wilton Park’s privacy policy is publicly available on the website (https://www.wiltonpark.org.uk/privacy-notice/). Any individual can request removal of their data by emailing the Wilton Park Data Protection Officer at dataprotectionofficer@wiltonpark.org.uk. The policy was designed and written to make the customer’s data rights clear. The privacy policy is shared with all new customers before they are added to Wilton Park’s database, and they are asked to consent to their information being stored in line with the policy. If a user receives an email asking whether they wish their data to be deleted, they only need to either ignore the email or engage with Wilton Park again within a two-year period.

Tier 2 - Tool Specification

4.1.1 - System architecture

The system is a .NET framework application backed by an Azure SQL database. Contact records are entities in our database that hold personal information about the people we work with to provide services to the FCDO and wider government. A rules-based formula is used to identify records that match the rules. The rules determine the last interaction date of a contact record by referencing a number of related entities: for example, whether the user has participated in an event within the last 2 years, whether the user has logged into the customer portal within the last 2 years, and whether the contact record has been updated within the last 2 years. These related entities include any personally identifiable information, the user's access to the Xen platform and any audit history. Further rules then determine, based on the customer's response, whether a record requires cleaning or whether its data is retained.
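The last-interaction rule amounts to taking the latest date across the record and its related entities, then comparing it with a two-year threshold. The Python sketch below illustrates this under stated assumptions: only three sources of activity are modelled, and the parameter and function names are hypothetical rather than the system's own.

```python
from datetime import date
from typing import Iterable


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def last_interaction(record_modified: date,
                     event_dates: Iterable[date],
                     portal_logins: Iterable[date]) -> date:
    """Latest date across the record itself and the (hypothetical) related entities."""
    return max([record_modified, *event_dates, *portal_logins])


def needs_first_email(last: date, today: date) -> bool:
    """True once the contact has gone at least 2 years without any interaction."""
    return last <= add_years(today, -2)


last = last_interaction(date(2021, 5, 1), [date(2022, 9, 10)], [date(2020, 1, 1)])
print(last, needs_first_email(last, date(2024, 9, 10)))  # 2022-09-10 True
```

Because any one recent activity moves the date forward, a contact who attends an event or logs in to the portal drops out of the cleaning cycle automatically.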

4.1.2 - Phase

Production

4.1.3 - Maintenance

The tool has been incorporated into a pre-existing continual improvement process, which internal users feed into based on their own experience of the system and the feedback they receive from customers. In addition, the internal system support team regularly checks the data cleaning tool, using existing data search and filter features to spot-check that the tool is identifying, and taking action against, the correct records. The software developers who maintain the whole system also monitor the operation of the tool as part of their existing support; for example, they check that emails are being issued and records are being cleaned.

4.1.4 - Models

This is a rules-based model. No learning is required, because the rules operate on data such as ‘last date modified’ that needs no interpretation. Data is not transformed or converted. When a rule is triggered, emails are sent from a pre-written template.
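Filling a pre-written template is a plain substitution step with no interpretation involved. The Python sketch below uses the standard library's `string.Template` to illustrate this; the wording and placeholder names are hypothetical, as the actual email template is not published here.

```python
from string import Template
from typing import Mapping

# Hypothetical wording and placeholder names; the real template text is not published.
REMINDER = Template(
    "Dear $first_name,\n"
    "We have not heard from you since $last_interaction. "
    "Would you like Wilton Park to keep your details? Please reply by $respond_by."
)


def render_reminder(fields: Mapping[str, str]) -> str:
    # substitute() raises KeyError if any placeholder has no value,
    # so an incomplete record cannot produce a half-filled email.
    return REMINDER.substitute(fields)


print(render_reminder({"first_name": "Ada",
                       "last_interaction": "1 March 2022",
                       "respond_by": "22 March 2025"}))
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice in this sketch: a missing value fails loudly instead of sending an email with a raw placeholder in it.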

Tier 2 - Model Specification

4.2.1 - Model name

Red River have labelled the functionality ‘Contact anonymisation check’.

4.2.2 - Model version

There is no specific version of the tool, as it is not separate from the wider system. The system version is 2024-09-05-104124.84c1195b-rc

4.2.3 - Model task

Identifies records, sends out emails, tracks responses and cleans the appropriate data.

4.2.4 - Model input

The last interaction date of the contact record, as described in more detail in section 4.1.1 (System architecture).

4.2.5 - Model output

Identified contacts that have passed the relevant threshold are sent an email. Responses are logged and records are cleaned.

4.2.6 - Model architecture

The rules for the data cleaning process are as follows: a contact that has had no interaction with Xen for 2 years is sent an email asking whether they would like their data to be kept or removed from the system.
After a further year, if the contact has not responded, another email is sent asking the same question. This process continues for a further 2 years, giving a total of 5 years with no interaction with Xen. After the final email, at the five-year point, the contact is anonymised automatically after 21 days unless the contact requests that their data be kept.

If the contact requests that their data be removed, removal takes place 21 days after the request; the contact can undo the request at any time until the 21-day period has passed.

If the contact chooses to keep their data, the contact record is effectively reset to the start of the 5-year contact cycle.
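The rules above behave like a small state machine per contact. The Python sketch below models them under stated assumptions: it is not the Red River implementation, the class and method names are hypothetical, it treats a "keep" reply during a pending removal as cancelling that removal, and it assumes the final email falls exactly five calendar years after the cycle starts.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

GRACE = timedelta(days=21)   # 21-day window from the rules above
CYCLE_YEARS = 5              # final email at five years without interaction


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:       # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


@dataclass
class ContactState:
    cycle_start: date                   # last interaction, or last "keep my data" reply
    removal_due: Optional[date] = None  # set when the contact asks to be removed
    anonymised: bool = False

    def request_keep(self, on: date) -> None:
        # Keeping data restarts the 5-year cycle (and, by assumption, cancels any removal).
        self.cycle_start = on
        self.removal_due = None

    def request_removal(self, on: date) -> None:
        self.removal_due = on + GRACE

    def undo_removal(self, on: date) -> bool:
        # A removal request can be withdrawn only before the 21 days have passed.
        if self.removal_due is not None and on < self.removal_due:
            self.removal_due = None
            return True
        return False

    def run_daily(self, today: date) -> None:
        if self.anonymised:
            return
        final_email = add_years(self.cycle_start, CYCLE_YEARS)
        if self.removal_due is not None and today >= self.removal_due:
            self.anonymised = True      # requested removal, grace period over
        elif today >= final_email + GRACE:
            self.anonymised = True      # no reply 21 days after the final email
```

Storing the cycle start rather than a countdown means a "keep" reply is a single assignment, which mirrors the "reset to the start of the cycle" rule.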

4.2.7 - Model performance

The rules have been tested on test data to ensure the correct records were identified and that the system ran the way we expected. Any errors were identified and fixed.

4.2.8 - Datasets

Test system data sets: a copy of our live data set was anonymised and loaded into the test system, so that the test data was as close to our live system as possible. All personal information was anonymised by replacing it with fake data, e.g. fake names. The relevant entities and metadata were then altered to test the rules. For example, the ‘last modified’ date on a record was changed to test whether the tool would identify records not modified in the last 24 months.
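The back-dating approach can be expressed as an automated check: push a known sample of records past the threshold and confirm that exactly those records are identified. The Python sketch below illustrates this on randomly generated dates; the sample size, date ranges and function names are hypothetical and do not describe Wilton Park's actual test set.

```python
import random
from datetime import date, timedelta
from typing import Dict, Set, Tuple


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def identify_stale(last_modified: Dict[int, date], today: date) -> Set[int]:
    """Records not modified in the last 24 months."""
    threshold = add_years(today, -2)
    return {cid for cid, modified in last_modified.items() if modified <= threshold}


def backdate_test(n: int, today: date, seed: int = 0) -> Tuple[Set[int], Set[int]]:
    """Return (records the rule found, records deliberately back-dated)."""
    rng = random.Random(seed)
    # Fake records, all modified within the last year.
    records = {cid: today - timedelta(days=rng.randint(0, 365)) for cid in range(n)}
    # Back-date a 10% sample beyond the 24-month threshold.
    backdated = set(rng.sample(range(n), k=n // 10))
    for cid in backdated:
        records[cid] = add_years(today, -2) - timedelta(days=rng.randint(0, 1000))
    return identify_stale(records, today), backdated


found, expected = backdate_test(1000, date(2025, 1, 1))
print(found == expected)  # True
```

Comparing the two sets, rather than just counting matches, catches both missed records and records flagged in error.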

4.2.9 - Dataset purposes

Development and testing

Tier 2 - Data Specification

4.3.1 - Source data name

Internally generated test data

4.3.2 - Data modality

Text

4.3.3 - Data description

Our test environment is populated with equivalent fake data, which is used to test the tool. The metadata attached to the records was changed to test whether the tool identifies the correct records; for example, the age of a record is changed to test that the tool identifies it.

4.3.4 - Data quantities

The test environment contains over 27,000 contact records for testing.

4.3.5 - Sensitive attributes

Very limited sensitive data is temporarily held within the data set, such as dietary requirements relating to medical conditions or religious beliefs, or medical requirements that need to be supported during the event. This data is manually removed within days after an event has taken place and is out of scope for the cleaning tool (i.e. it will already have been removed before the algorithm triggers the record to be cleaned).

4.3.6 - Data completeness and representativeness

Contact records have various levels of completeness: for example, some records have only a name and phone number, while others have full contact details (email address, multiple phone numbers). This level of information is dictated by how far along a data subject is in their enrolment for an event.

4.3.7 - Source data URL

Data set is not publicly available

4.3.8 - Data collection

The purpose of collection has remained relatively consistent over the life of Wilton Park as our mission has remained consistent. Data has always been collected for providing customers with Wilton Park events and information. The data set is constantly evolving as we expand our contact network and remove old connections.

4.3.9 - Data cleaning

No additional cleaning required as the tool is designed to work with the data set in its current state.

4.3.10 - Data sharing agreements

The supplier is a data processor.

4.3.11 - Data access and storage

Any Wilton Park employee involved in running events or supporting the database has access to the dataset. The level of access depends on the employee's function and is controlled via security roles. The supplier who maintains the system also has access. Access to the database requires user credentials and two-factor authentication (2FA).

Data is retained beyond 5 years only with the permission of the contact (the data subject). Special category / sensitive data is removed within a month after an event has taken place. Security roles control which users have access to special category / sensitive data, and access is based on job function.

Wilton Park’s Technology Dept and the third party supplier are responsible for the storage and maintenance of the database. Data is encrypted in transit and at rest.

Tier 2 - Risks, Mitigations and Impact Assessments

5.1 - Impact assessment

A Data Protection Impact Assessment (DPIA) was prepared, reviewed and signed off in November 2024. The DPIA covers the whole system, and this tool forms part of the impact management.

5.2 - Risks and mitigations

Risks:

  1. The tool fails to identify relevant records and/or actions are not triggered, with the impact that data is not cleaned, or is cleaned without the data owner being consulted.
  2. The tool fails to function, with the impact of non-compliance with GDPR legislation.

Mitigations:

  1. Testing and ongoing maintenance of the tool.
  2. Measuring and monitoring the records the tool is expected to capture.
  3. Individual data subjects retain the right to request that their data be removed at any time.
  4. Regular checks that each of the tool's functions is working correctly.

Updates to this page

Published 10 February 2025