Health and Social Care

14 Use Cases

Alan Turing Institute | Date Added: July 2021 | Proof of Concept | Differential Privacy |

A research paper by the Alan Turing Institute proposes using ‘health tokens’ to design COVID-19 immunity passports. These health-token-based certificates would be created using differential privacy methods, which randomise individual test results while still allowing aggregate-level estimates of risk to be calculated. The research suggests that health tokens could mitigate discrimination based on immunity and address concerns about the creation of an ‘immuno-privileged’ class, while still offering valuable information on the collective transmission risk posed by small groups.
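The idea of randomising individual results while keeping aggregate estimates usable can be illustrated with randomised response, a textbook differentially private mechanism. This is a minimal sketch, not the mechanism from the Turing paper: the function names, the 20% positive rate and the `p_truth` value are all assumptions chosen for illustration.

```python
import random

def randomise(result, p_truth, rng):
    """Report the true result with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return result
    return rng.random() < 0.5

def estimate_rate(reports, p_truth):
    """Invert the randomisation to estimate the true positive rate of the group."""
    observed = sum(reports) / len(reports)
    # E[observed] = p_truth * true_rate + (1 - p_truth) * 0.5
    return (observed - (1 - p_truth) * 0.5) / p_truth

rng = random.Random(0)
# Hypothetical population in which roughly 20% of test results are positive.
true_results = [rng.random() < 0.2 for _ in range(100_000)]
reports = [randomise(r, 0.5, rng) for r in true_results]

estimate = estimate_rate(reports, 0.5)
print(f"true rate: {sum(true_results) / len(true_results):.3f}, estimate: {estimate:.3f}")
```

Any single report is deniable, because it may be a coin flip, yet the estimate for the whole group stays close to the true rate.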

Turing website

Apple/Google contact tracing API | Date Added: July 2021 | Product | Federated Analytics, De-identification techniques |

In response to the COVID-19 pandemic, Apple and Google collaborated to develop a privacy-preserving architecture that facilitates digital contact tracing based on Bluetooth proximity information. Access to the system is only made available to health authorities, who develop their own apps leveraging the Apple/Google APIs (for example, the NHS app in England and Wales). Mobile phones with the app enabled exchange random identifiers (each phone’s identifier changes frequently) when in close proximity. Following a positive test, a user consents to upload details of their device’s identifiers from recent days to a central server managed by the health authority. All phones periodically download the list of identifiers corresponding to positive tests from the central server. If there is a match with the identifiers of close contacts stored on the device, the phone can alert its user that they may have been exposed to the virus and should self-isolate.
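The exchange-upload-match flow described above can be sketched in a few lines. This is a simplified illustration only: the `Phone` class and its methods are hypothetical, identifiers are plain random bytes, and the key derivation, Bluetooth layer and time windows of the real Apple/Google protocol are omitted.

```python
import secrets

class Phone:
    """Hypothetical model of a device taking part in contact tracing."""

    def __init__(self):
        self.own_ids = []      # identifiers this phone has broadcast
        self.seen_ids = set()  # identifiers heard from nearby phones

    def new_identifier(self):
        """Rotate to a fresh random identifier and remember it."""
        ident = secrets.token_bytes(16)
        self.own_ids.append(ident)
        return ident

    def hear(self, ident):
        """Store an identifier received from a phone in close proximity."""
        self.seen_ids.add(ident)

    def check_exposure(self, published_ids):
        """Matching happens on the device against the downloaded list."""
        return not self.seen_ids.isdisjoint(published_ids)

alice, bob, carol = Phone(), Phone(), Phone()

# Alice and Bob are in close proximity and exchange their current identifiers.
bob.hear(alice.new_identifier())
alice.hear(bob.new_identifier())
carol.new_identifier()  # Carol broadcasts but is never near the others.

# Alice tests positive and consents to upload her recent identifiers.
server_list = set(alice.own_ids)

bob_alerted = bob.check_exposure(server_list)
carol_alerted = carol.check_exposure(server_list)
print(bob_alerted, carol_alerted)
```

Because the comparison runs locally, the server in this sketch never learns who was near whom; it only holds identifiers volunteered after a positive test.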

Google

Apple

Data Sharing Coalition | Date Added: June 2023 | Product | Multi-party Computation |

In the Netherlands, a public-private coalition of organisations came together to build better insight into provision and need in the elderly care system. To make use of sensitive health and care data, privacy tech company Linksight worked with health insurer DSW and the local “Zorgkantoor” (municipal health and social care office) to create a data analysis platform using Multi-Party Computation. The Dutch elderly care system is facing rising care needs, and monitoring this demand has been limited by barriers including privacy risks, the cost of accessing data, and the fragmentation of data held by different parties. Because the parties involved view only aggregated data on the platform, the risks and costs of sharing and access are lower, and policy-makers can use the resulting insights to respond better to need. The platform is currently used in the Delft, Schieland and Westland region of the Netherlands, but the Data Sharing Coalition highlights an ambition to scale this up to the national level.

Data Sharing Coalition use case poster

Data Sharing Coalition article

Duality Technologies | Date Added: July 2021 | Proof of concept | Homomorphic Encryption |

Duality Technologies has developed a framework for genome-wide association studies (GWAS) that leverages homomorphic encryption to keep medical and genomic data secure. The framework has been applied to conduct a GWAS of age-related macular degeneration on a dataset of over 25,000 individuals. The system is ~30 times faster than state-of-the-art GWAS schemes based on multi-party computation.

Original paper

PR Newswire

Frontier Development Lab / Intel | Date Added: June 2023 | Pilot | Federated Learning |

In a study on the relationship between radiation exposure in space and cancer, researchers used federated learning methods to access tightly protected human data on astronauts. This collaborative public-private project was run by the Frontier Development Lab (FDL), using Intel’s Open Federated Learning (OpenFL) framework. Researchers were able to use data from institutions including NASA, Mayo Clinic and NASA’s GeneLab to train and combine models in situ, without requiring the data to be transferred. The main benefit was cost reduction, as the resources required to access and use such data would otherwise be a barrier to the research. For NASA and associated organisations, whose mission is to provide public and scientific value, this cost saving is an important outcome.
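Training and combining models in situ can be sketched with federated averaging. The example below is plain Python and does not use the OpenFL API. The site names, the toy data, the one-parameter linear model and the size-weighted averaging rule are all assumptions made for illustration.

```python
def local_update(w, data, lr=0.01, steps=20):
    """Run a few gradient steps on one site's private data for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(weights, sizes):
    """Combine locally trained weights, weighting each site by its sample count."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total

# Hypothetical private datasets; the raw (x, y) pairs never leave their site.
sites = {
    "site_a": [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)],
    "site_b": [(1.5, 3.0), (2.5, 5.1)],
    "site_c": [(0.5, 1.0), (4.0, 8.1), (3.5, 6.9), (2.0, 4.0)],
}

global_w = 0.0
for _ in range(30):
    # Each site starts from the current global model and trains locally.
    local_ws = [local_update(global_w, data) for data in sites.values()]
    # Only the trained weights are sent back and combined.
    global_w = federated_average(local_ws, [len(data) for data in sites.values()])

print(f"global weight after 30 rounds: {global_w:.3f}")
```

Only model weights cross institutional boundaries in this sketch, which is the property that removes the need to transfer the underlying records.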

MarketScreener article

YouTube video by Intel Software

Google | Date Added: July 2021 | Product | Differential Privacy |

A publicly available resource of statistics and visualisations has been built to show how the population’s mobility habits changed in response to COVID-19 interventions. The resource is based on location data from Google users who have opted in to location history tracking. Differential privacy is used to protect two metrics: the details of the location a user visited, and the number of visits the user made to each location.
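One common way to apply differential privacy to published counts is the Laplace mechanism, sketched below. This is an assumption for illustration rather than a description of Google's method, which is set out in the technical paper. The place categories, visit counts, `epsilon` and `sensitivity` values are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Add noise calibrated to sensitivity / epsilon before a count is published."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
# Hypothetical aggregated visit counts per place category.
visits = {"parks": 1200, "transit stations": 860, "workplaces": 2300}
noisy = {place: private_count(n, epsilon=0.5, rng=rng) for place, n in visits.items()}

for place, value in noisy.items():
    print(f"{place}: true={visits[place]}, published={value:.1f}")
```

A smaller `epsilon` gives stronger privacy and noisier counts; large aggregates remain useful because the noise is small relative to the totals.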

Google blog

Technical paper

IQVIA | Date Added: July 2021 | Product | Multi-party Computation, De-identification techniques |

IQVIA’s E360 Genomics uses a form of secure computation that combines tokenization of variants with cell-size rules on statistical outputs (multi-party computation is desired). The platform is being used by Genomics England.

Bio-IT World

IQVIA

Netherlands Organisation for Applied Scientific Research (TNO) | Date Added: July 2021 | Proof of concept | Homomorphic Encryption, Multi-party Computation, De-identification techniques |

As part of the BigMedilytics project, funded by the EU’s Horizon 2020 program, TNO developed a system to identify patients at risk of heart failure by confidentially combining data on potential indicators held by different organisations, leveraging a multi-party computation (MPC) protocol. The Erasmus MC hospital holds data on the lifestyle of patients, and insurance company Zilveren Kruis holds data on attributes such as hospitalization days and health care usage. The solution consists of two phases. First, a secure inner join protocol is used to identify which patients are present in both datasets. Both parties homomorphically encrypt attributes of their datasets and send them to a third party, which determines the intersection of the datasets (the third party cannot access the raw data directly, since it is homomorphically encrypted). Second, the encrypted intersection is split into 3 secret shares, which are distributed across the 3 parties, and an MPC protocol is used to train a regression model. Erasmus MC and Zilveren Kruis receive the coefficients of the regression, which they can then use to predict the risk of individual patients.
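Splitting a value into 3 secret shares can be illustrated with additive secret sharing. This is a minimal sketch under stated assumptions: the modulus, the function names and the example values are hypothetical, and TNO's actual protocol (including the homomorphic inner join and the regression training) is not reproduced here.

```python
import secrets

PRIME = 2**61 - 1  # illustrative modulus for share arithmetic

def share(value, n=3):
    """Split a value into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the value; this needs every share."""
    return sum(shares) % PRIME

# Two hypothetical attributes for one patient found in both datasets.
hospital_days = share(12)
care_visits = share(30)

# Each of the 3 parties adds the two shares it holds locally.
# No single party ever sees 12 or 30 in the clear.
sum_shares = [(a + b) % PRIME for a, b in zip(hospital_days, care_visits)]
total = reconstruct(sum_shares)
print(total)
```

Addition on shares is the building block here; MPC protocols for regression compose operations like this so that only the final coefficients are revealed.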

YouTube

Medium blog

NHS Digital/Privitar | Date Added: July 2021 | Product | Homomorphic Encryption, De-identification techniques |

The NHS has built a system for linking patient data held across different NHS domains. To protect patient confidentiality, identifiers (such as a patient’s NHS number) are pseudonymised through tokenisation. For additional security, the tokenisation differs between NHS domains. Linking data about a patient held in two domains would ordinarily require removing the tokenisation first, which would expose personal information. To avoid this, a partially homomorphic encryption scheme is used, which enables data to be linked without revealing the underlying raw identifiers.
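The effect of domain-specific tokenisation can be sketched with a keyed hash. This is an assumption for illustration: the source does not say how tokens are derived, and the keys, the placeholder identifier and the `tokenise` function are hypothetical. The partially homomorphic linkage step is not shown.

```python
import hashlib
import hmac

def tokenise(identifier, domain_key):
    """Derive a pseudonymous token from an identifier using a per-domain secret key."""
    return hmac.new(domain_key, identifier.encode(), hashlib.sha256).hexdigest()

# Hypothetical keys for two domains and a placeholder identifier.
key_a = b"hypothetical-key-domain-a"
key_b = b"hypothetical-key-domain-b"
patient_id = "0000000000"

token_a = tokenise(patient_id, key_a)
token_b = tokenise(patient_id, key_b)

print(token_a == tokenise(patient_id, key_a))  # stable within a domain
print(token_a == token_b)                      # differs across domains
```

Since the same patient maps to unrelated tokens in each domain, records cannot be joined by comparing tokens directly, which is why a separate privacy-preserving linkage step is needed.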

Royal Society report on PETs

NVIDIA and King’s College London | Date Added: July 2021 | Pilot | Federated Analytics, Differential Privacy |

In 2019, researchers from NVIDIA and King’s College London collaborated to train a neural network for brain tumour segmentation using a federated learning approach. They used a dataset from the BraTS Challenge 2018, containing MRI scans from 285 patients, using 242 patients as training data and 43 patients for testing. Training data was split into 13 shards, each representing a client in the federated setup. A data-centralised model was also trained for comparison. The data-centralised model converged in ~300 training epochs, with 205s per epoch. The federated model converged in ~600 training epochs, with ~65s per epoch (slowest client). Model performance is comparable between the two setups, although the federated model incurs a tradeoff between privacy and performance, determined by the parameters of the differential privacy setup.
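The reported figures allow a rough comparison of total training time. The calculation below assumes that the slowest client's epoch time bounds each federated epoch and ignores communication overhead; both are simplifications, and the epoch counts are approximate in the source.

```python
# Figures as reported above (epoch counts are approximate).
centralised_epochs, centralised_secs_per_epoch = 300, 205
federated_epochs, federated_secs_per_epoch = 600, 65  # slowest client

centralised_total = centralised_epochs * centralised_secs_per_epoch
federated_total = federated_epochs * federated_secs_per_epoch
ratio = federated_total / centralised_total

print(f"centralised: {centralised_total} s ({centralised_total / 3600:.1f} h)")
print(f"federated:   {federated_total} s ({federated_total / 3600:.1f} h)")
print(f"federated / centralised: {ratio:.2f}")
```

Under these assumptions the federated run needs twice as many epochs but about 63% of the wall-clock time (roughly 10.8 hours against 17.1 hours), because each client processes only its own shard.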

VentureBeat

OpenSAFELY | Date Added: July 2021 | Product | Federated Analytics, De-identification techniques |

OpenSAFELY is a secure analytics platform developed in response to the COVID-19 pandemic, which enables researchers to conduct analysis across millions of patients’ electronic health records (EHR). The platform works by leveraging federated analysis, where researchers’ analytic code is uploaded to the datacenter where EHR data is kept. The code is executed in the datacenter, with the data kept in situ - data is never moved from where it was originally kept. Researchers are thus unable to download data, mitigating a key risk. The platform provides researchers with dummy data (NOT synthetic data) to develop their code. Once developed, the code must pass a series of automated sanity checks before it is packaged and deployed to the EHR provider’s datacenter to execute the analysis. OpenSAFELY has enabled risk factors associated with COVID-19 to be identified, without exposing the personal information of individuals.

OpenSAFELY website

Nature paper

Owkin | Date Added: July 2021 | Product | Multi-party Computation |

French-American startup Owkin is using Federated Learning to build a score that predicts the severity of a patient’s COVID-19 prognosis. The AI-based scoring model is trained on CT scans of lungs (a routine procedure upon admission to hospital for COVID-19 patients). Its performance is reported to surpass that of all other published score benchmarks. These scores support hospitals in resource management and planning at the frontline.

Imaging Technology News

Nature Communications (model publication)

Replica Analytics | Date Added: June 2023 | Product | Synthetic Data |

In the Canadian province of Alberta, a collaboration of organisations worked together to create synthetic health records data for use in research. Researchers trained a model using 100,000 healthcare records from patients in the province, which they then used to generate synthetic data. Collaborators on this project include a non-profit funder (Health Cities), researchers from the University of Alberta, pharmaceutical company Merck Canada, synthetic data specialists Replica Analytics, and advisors from Alberta Innovates and the Office of the Information and Privacy Commissioner of Alberta. The synthetic data allows students and researchers to undertake projects that would normally be limited by privacy concerns about sharing sensitive health data. This supports the application of data science to (population) health sciences.

University of Alberta blog

Replica Analytics blog

Tsinghua University and Microsoft | Date Added: July 2021 | Pilot | Federated Analytics |

Medical Named Entity Recognition (NER) is an NLP task which aims to identify entities (e.g. drug names, symptoms) from unstructured medical texts (e.g. patient records, doctor’s notes). Microsoft collaborated with Tsinghua University to develop a federated system named FedNER to train a machine learning model to perform NER on a corpus of data held across a number of medical platforms. The model was decomposed into a private local model and a global shared model. Different medical platforms storing information in different formats are able to train the local model without having to wrangle their data into a defined format. This lowers the barrier to participation for any individual medical platform, maximising the amount of data used to train the system, thus enhancing its performance.
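The split between a private local model and a global shared model can be sketched as follows. This is an illustration under assumptions, not the FedNER implementation: the dictionary layout, the parameter values and the simple unweighted average are hypothetical, and which layers are private or shared is not specified here.

```python
def average_shared(platforms):
    """Average only the shared parameters; private parameters are never collected."""
    n = len(platforms)
    dim = len(platforms[0]["shared"])
    return [sum(p["shared"][i] for p in platforms) / n for i in range(dim)]

# Hypothetical platforms: shared parts have the same shape, private parts may differ.
platforms = [
    {"name": "platform_a", "shared": [0.2, 0.4], "private": [1.0]},
    {"name": "platform_b", "shared": [0.6, 0.0], "private": [5.0, 7.0, 9.0]},
]

new_shared = average_shared(platforms)
for p in platforms:
    p["shared"] = list(new_shared)  # every platform receives the updated shared model

print(new_shared)
print([p["private"] for p in platforms])
```

Because the private parts can differ in shape and are never aggregated, each platform in this sketch can keep a component suited to its own data format while still benefiting from the shared part.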

Research paper