Guidance

Data Ethics Framework: glossary and methodology

Updated 16 September 2020

Glossary

AI

AI can be defined as the use of digital technology to create systems capable of performing tasks commonly thought to require intelligence. AI is constantly evolving, but generally it:

  • involves machines using statistics to find patterns in large amounts of data
  • is the ability to perform repetitive tasks with data without the need for constant human guidance

(Source: GDS, OAI (2019) ‘A guide to using artificial intelligence in the public sector’).

Algorithm

A set of step-by-step instructions. Computer algorithms can be simple (if it’s 3pm, send a reminder) or complex (identify pedestrians).

(Source: Matthew Hutson (2017) ‘AI Glossary: Artificial Intelligence in so many words’).
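The simple case from the definition above can be sketched in a few lines of code (a hypothetical illustration; the function name and message are invented):

```python
from datetime import datetime

def reminder_check(now):
    """A simple algorithm: if it's 3pm, send a reminder."""
    if now.hour == 15:
        return "Reminder: it's 3pm."
    return None  # no reminder at any other hour

print(reminder_check(datetime(2020, 9, 16, 15, 0)))  # the reminder
print(reminder_check(datetime(2020, 9, 16, 9, 0)))   # None
```

The complex case (identify pedestrians) is also an algorithm in this sense, but one whose step-by-step instructions are typically learned from data rather than written by hand.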

Data

In general, data can be understood as discrete values and statistics collected together for reference or analysis.

When we refer to data, we mean both data about people generated through their interactions with services, and also data about systems and infrastructure such as businesses and public services. Data can be operational (collected in the process of running services or businesses), as well as analytical and statistical.

(Source: National Data Strategy 2020).

Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

(Source: ICO, GDPR).

Data ethics

Data ethics is an emerging branch of applied ethics that studies and evaluates moral problems and describes the value judgements related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (for example right conducts or right values). Data ethics encompasses a sound knowledge of data protection law and other relevant legislation, and the appropriate use of new technologies. It requires a holistic approach incorporating good practice in computing techniques, ethics and information assurance.

The aim of data ethics is to promote responsible and sustainable use of data for the benefit of people and society and ensure that knowledge obtained through data is not used against the legitimate interests of an individual or group while identifying and promoting standards, values and responsibilities that allow us to judge whether decisions or actions are appropriate, ‘right’, or ‘good’.

(Source: Luciano Floridi, Mariarosaria Taddeo (2016) ‘What is data ethics?’; Pernille Tranberg, Gry Hasselbalch, Birgitte Kofod Olsen & Catrine Søndergaard Byrne (2018) ‘Data Ethics. Principles and Guidelines for Companies, Authorities & Organisations’).

Data integrity

Data integrity is the overall accuracy, completeness, and consistency of data.

(Source: Digital Guardian).

Data quality

The state of qualitative or quantitative pieces of information. There are many definitions of data quality but data is generally considered high quality if it is ‘fit for [its] intended uses in operations, decision making and planning’.

(Source: Geospatial Commission).

DPIA

Data Protection Impact Assessment: a process to help identify and minimise the data protection risks of a project.

(Source: ICO).

Data science

Data science describes analysis using automated methods to extract knowledge from data. It covers a range of techniques, from finding patterns in data using traditional analytics to making predictions with machine learning.

(Source: Data Ethics Framework 2018).

Explainability

Explainability is the extent to which the workings of a machine learning algorithm can be explained in human terms. It goes beyond transparency about which variables are used, providing information on how the algorithm arrived at an output and how changing the inputs can change that output.

(Source: Government Data Science Community).
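One simple way to probe how changing the inputs changes the output is a sensitivity check. The sketch below uses a hypothetical linear scoring model (the weights and feature names are invented for illustration):

```python
# Hypothetical linear scoring model: the weights and features are invented.
WEIGHTS = {"income": 0.4, "age": 0.1, "tenure": 0.5}

def score(applicant):
    """Weighted sum of the applicant's features."""
    return sum(WEIGHTS[k] * v for k, v in applicant.items())

def sensitivity(applicant, feature, delta=1.0):
    """How much does the output change when one input changes by delta?"""
    perturbed = dict(applicant, **{feature: applicant[feature] + delta})
    return score(perturbed) - score(applicant)

applicant = {"income": 30.0, "age": 45.0, "tenure": 3.0}
print(sensitivity(applicant, "tenure"))  # ≈ 0.5, the weight of 'tenure'
```

For a linear model each input's sensitivity is simply its weight, which is why such models are considered easy to explain; for more complex models the same question requires dedicated explainability techniques.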

Machine learning

Machine learning is a subset of AI, and refers to the development of digital systems that improve their performance on a given task over time through experience. Machine learning is the most widely used form of AI, and has contributed to innovations like self-driving cars, speech recognition and machine translation.

(Source: GDS, OAI (2019) ‘A guide to using artificial intelligence in the public sector’).
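‘Improving through experience’ can be illustrated with a toy sketch (the data points and learning rate are invented, and real systems are far more elaborate): a single parameter is repeatedly nudged to fit observed points, rather than being programmed directly.

```python
# Toy learning example: observed points follow y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0  # initial guess for the slope
for _ in range(200):          # each pass over the data is more "experience"
    for x, y in data:
        error = w * x - y     # how wrong the current guess is
        w -= 0.05 * error * x # nudge w to reduce the error

print(round(w, 3))  # → 2.0: the system has learned the slope from data
```

No one told the program the answer was 2; its performance on the task improved purely through repeated exposure to examples.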

Non-practitioner

Anyone involved in a data project who does not work hands-on with the data, such as policy or operational colleagues.

(Source: Data Ethics Framework 2018).

Project cycle

The start of the project to the end of the project.

(Source: Data Ethics Framework 2018).

Public benefit

In general, public benefit refers to the positive impacts that a project will have for the wider public.

(Source: Data Ethics Framework 2018).

Reproducible Analytical Pipelines (RAP)

A methodology for automating the bulk of the steps involved in creating a statistical report. It is part of the GSS Quality Statistics in Government Guidance.

RAP is also a community of people who work with data using methods adapted from software development. The RAP community promotes the use of programming languages, version control, automated testing, peer review, and other tools and methods.

(Source: UK Government Data Science GitHub).
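A minimal sketch of a single automated pipeline step, assuming invented data and field names: raw records go in, the figures a report would quote come out, and because the step is code it can be version controlled, peer reviewed, and covered by automated tests.

```python
import csv
import io
import statistics

# Illustrative raw data; in practice this would be read from a file or database.
RAW = """region,value
North,10
North,14
South,8
"""

def summarise(csv_text):
    """Aggregate raw rows into the per-region means a report would quote."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(float(row["value"]))
    return {region: statistics.mean(vals) for region, vals in by_region.items()}

print(summarise(RAW))  # {'North': 12.0, 'South': 8.0}
```

Re-running the script regenerates the published figures exactly, which is the sense in which the pipeline is reproducible.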

User

The person who is directly or indirectly affected by the project.

(Source: Data Ethics Framework 2018).

User Need

User needs are the needs that a user has of a service, and which that service must satisfy for the user to get the right outcome for them.

(Source: GDS Service Manual 2017).

Methodology

The Data Ethics Framework was first published in 2016, with an iteration in 2018. The current version was updated in 2020 following the process outlined below.

The team initially engaged with data practitioners across government to understand their current use of the Data Ethics Framework, and barriers to uptake, through a survey with the following questions:

  • What are the top three things you think of when planning or assessing a data science project?
  • Have you used the Data Ethics Framework?
  • Is there any particular reason why you haven’t used the Data Ethics Framework?
  • Is there anything that would make you more likely to use the Data Ethics Framework?

Following the survey, the team set up a series of workshops with stakeholders from the wider public sector, academia, civil society, and industry. In each of the day-long workshops, participants were asked to apply the Data Ethics Framework to a fictional policy scenario to identify areas for improvement in practice. The workshops provided space to identify specific strengths and weaknesses of each principle of the framework, and asked for feedback on its relevance, design, and potential options for mandating it.

The key findings from the workshops included the need to introduce overarching principles that apply to every stage of working with data, to reorder the principles to reflect the project process, and to define some of the terms used in the framework. Following each session, workshop participants were invited to submit any further feedback through anonymous forms that were then processed by the team.

Data collected through the workshops and surveys was anonymised and coded. The team worked with social researchers to categorise the findings into common themes (for example: ‘user focus’, ‘legal requirements’, ‘accountability’, ‘data quality limitations’) and to evaluate what changes needed to be made in the corresponding sections of the framework, and in its structure in general.

The updated content was drafted based on the data analysis and tested through five focus groups with data scientists and data policy officials from across government. This phase of user testing identified the need to amend the order of some of the specific actions and to provide ready actions for each of the overarching principles. Additional edits were introduced following feedback from Professor Charles Raab, an expert on privacy, data protection, surveillance, security, and regulatory policy and practice. Once the updated content received ministerial approval, the team worked with the Design Council on the visual design of the framework.