NICE: Cochrane Randomised Controlled Trials (RCT) Classifier

Distinguishes randomised controlled trials from other types of study within research papers to help NICE staff narrow down the number of papers they need to read and review for evidence synthesis.

Tier 1 Information

Name

Cochrane Randomised Controlled Trials (RCT) Classifier

Description

A classification tool that distinguishes randomised controlled trials from other types of study within research papers to help NICE staff narrow down the number of papers they need to read and review for evidence synthesis. The tool outputs a score indicating the likelihood that a paper describes a randomised controlled trial, which staff use to help determine whether each research paper needs to be read.

Website URL

https://www.sciencedirect.com/science/article/pii/S0895435620311720

Contact email

DITSL@nice.org.uk

Tier 2 - Owner and Responsibility

1.1 - Organisation or department

National Institute for Health and Care Excellence

1.2 - Team

Professional Team

1.3 - Senior responsible owner

Associate Director

1.4 - External supplier involvement

Yes

1.4.1 - External supplier

EPPI Centre, University College London (UCL)

1.4.2 - Companies House Number

N/A

1.4.3 - External supplier role

The UCL team worked with NICE to implement the RCT classifier into EPPI R5 (an application for systematic reviewing). The underpinning algorithms for the RCT classifier are managed and maintained by UCL staff.

1.4.4 - Procurement procedure type

The services of UCL were not procured, but provided to NICE without charge in the spirit of collaboration.

Tier 2 - Description and Rationale

2.1 - Detailed description

A machine learning classifier for retrieving randomised controlled trials (RCTs) was developed (the “Cochrane RCT Classifier”), with the algorithm trained using a data set of title–abstract records from Embase, manually labelled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labelled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification.

2.2 - Scope

This tool has a deliberately limited scope: it identifies references to RCTs within papers to help narrow down the number of papers that need to be read and reviewed. The tool takes as input the titles and abstracts of research papers, processes that information, and outputs a score indicating whether or not each paper is likely to be describing a randomised trial.

2.3 - Benefit

The key benefit of this tool is the time saved through automation. Previously, staff had to read each research paper before knowing whether it was likely to contain references to randomised trials. This tool allows staff to focus only on the papers likely to describe RCTs.

2.4 - Previous process

There was no previous technological process to help staff find papers describing RCTs; staff were required to search papers entirely manually.

2.5 - Alternatives considered

A range of different classification options was investigated. The ensemble of two algorithms used here (the Cochrane Randomised Controlled Trials Classifier combined with priority screening), detailed in this paper: https://www.sciencedirect.com/science/article/pii/S0895435620311720, was found to be the most efficient in terms of accuracy and ease of deployment to a cloud infrastructure.

Tier 2 - Decision making Process

3.1 - Process integration

A key stage of conducting an evidence synthesis is to search for relevant studies on the topic area of interest. Common study types that NICE would be interested in for evidence syntheses of interventions would be randomised controlled trials (RCTs). The RCT classifier is available as a standalone option for classifying studies in EPPI R5, NICE’s evidence management software. This does not run automatically and it is a choice made by the reviewer to use the classifier. The RCT classifier is only used on search outputs specifically designed to identify randomised controlled trials. The classifier takes as input the titles and abstracts of research papers. It outputs a score indicating whether or not they are likely to be describing a randomised trial. This reduces the manual effort of identifying randomised trials from a search output.

3.2 - Provided information

This tool has limited scope. It takes as input the titles and abstracts of research papers and outputs a score between 0 and 1. A score close to 1 (e.g., 0.97) suggests that the paper likely describes a randomised controlled trial, while a score close to 0 (e.g., 0.10) indicates it is unlikely to describe one. This score is not presented to the user. When the user initiates the classification job and ticks the box labelled "Exclude non RCTs", papers deemed non-RCTs are set as excludes automatically (i.e. they do not appear in the list of papers to be screened manually); specifically, the tool automatically excludes papers whose score is less than 0.244. If a paper is thought to be an RCT, it is tagged as such and presented to the user as a likely RCT.
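The triage behaviour described above can be sketched in a few lines. The 0.244 cut-off is taken from this record; the function and variable names are purely illustrative and do not reflect EPPI R5's internal implementation:

```python
# Illustrative sketch of the thresholding step: papers scoring below the
# cut-off are set as automatic excludes, the rest are tagged as likely RCTs.
AUTO_EXCLUDE_THRESHOLD = 0.244  # cut-off stated in this record

def triage(scored_papers):
    """Split (title, score) pairs into auto-excluded and likely-RCT lists.

    `score` is the classifier's 0-1 probability that the paper
    describes a randomised controlled trial.
    """
    excluded, likely_rct = [], []
    for title, score in scored_papers:
        if score < AUTO_EXCLUDE_THRESHOLD:
            excluded.append(title)      # removed from the manual screening list
        else:
            likely_rct.append(title)    # presented to the reviewer as a likely RCT
    return excluded, likely_rct

scored = [("Trial of drug A", 0.97), ("Case report B", 0.10), ("Cohort study C", 0.30)]
excluded, likely = triage(scored)
```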

3.3 - Frequency and scale of usage

This particular tool is not used very often for evidence synthesis purposes. It can only be used on search outputs specifically designed to identify randomised trials only. These are conducted infrequently, as NICE's standard approach for evidence synthesis is to search for both randomised trials and systematic reviews, and it would be inappropriate to use the classifier on such a dataset.

3.4 - Human decisions and review

This tool has limited scope. It takes as input the titles and abstracts of research papers and outputs a score indicating whether or not they are likely to be describing a randomised trial, which a human can use to help judge the likelihood of each paper being relevant. Scores indicate the probability that papers describe randomised trials, but they do not replace human-in-the-loop consideration of the studies to determine if they are randomised trials. Studies deemed non-RCTs may be deprioritised and checked at a later date, although this is not always the case. There are also further checkpoints in place throughout the guideline development process, such as opportunities for committee members and stakeholders to flag studies that may have been missed; these studies would then be considered for inclusion in the evidence synthesis.

3.5 - Required training

No specific training is required as the tool is available as an option embedded into EPPI R5 and there are instructions in the user manual.

3.6 - Appeals and review

The decision made by the tool does not directly affect the public; it is used to improve the efficiency of our processes. All approaches to screening are accompanied by a 10% quality assurance check, in which another reviewer blind-screens 10% of the studies and the two sets of screening decisions are compared. Any discrepancies are discussed to agree whether the study should be included or excluded, with disagreements escalated to a third independent reviewer.
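The 10% check above amounts to sampling a subset for blind re-screening and comparing decisions. A minimal sketch, assuming studies are identified by simple IDs and decisions are "include"/"exclude" strings (this is not NICE's actual tooling, only an illustration of the comparison logic):

```python
import random

def qa_sample(study_ids, fraction=0.10, seed=0):
    """Select a random subset of studies for blind re-screening by a
    second reviewer (10% by default)."""
    rng = random.Random(seed)  # fixed seed only for reproducibility here
    k = max(1, round(fraction * len(study_ids)))
    return rng.sample(study_ids, k)

def discrepancies(first_decisions, second_decisions):
    """Return study IDs where the two reviewers' screening decisions
    disagree; these would be discussed or escalated to a third reviewer."""
    return [sid for sid, decision in second_decisions.items()
            if first_decisions.get(sid) != decision]
```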

Tier 2 - Tool Specification

4.1.1 - System architecture

See this paper for technical details on the model: https://www.sciencedirect.com/science/article/pii/S0895435620311720. The paper also contains links to other relevant documentation and GitHub repositories. The model is deployed and maintained by the EPPI Centre, UCL (external supplier).

NICE accesses the model via an API call. A tab-separated file containing the following data for each citation that needs scoring is sent to the model: Review item ID, Citation Title, Citation Abstract.

The model evaluates a probability score for each citation and returns all the information that was originally sent, along with a new probability score column, in tab-separated values (TSV) format.
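The request/response round trip described above can be sketched with the standard library. The three request columns and the appended probability column come from this record; the function names and the exact column order of the response are assumptions for illustration:

```python
import csv
import io

def build_request_tsv(citations):
    """Serialise citations into the tab-separated request format:
    one row per citation with Review item ID, title and abstract."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for item_id, title, abstract in citations:
        writer.writerow([item_id, title, abstract])
    return buf.getvalue()

def parse_response_tsv(tsv_text):
    """Parse the returned TSV: the original columns plus a new
    probability score column. Returns {item_id: probability}."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {item_id: float(score) for item_id, _title, _abstract, score in rows}
```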

4.1.2 - Phase

Production

4.1.3 - Maintenance

The model is deployed and maintained by the EPPI Centre (supplier). At the supplier end, the model is not routinely rebuilt, in order to provide stability of output. The validation dataset is part of the deployment of the tool, meaning that performance can be checked and reviewed at any time. NICE undertakes assessments of the deployed model and of how it is maintained externally.

4.1.4 - Models

Support vector machine; logistic regression

Tier 2 - Model Specification

4.2.1 - Model name

Cochrane RCT Classifier

4.2.2 - Model version

2

4.2.3 - Model task

Classify titles and abstracts of research papers as to whether or not they describe randomised trials.

4.2.4 - Model input

Titles and abstracts of research papers

4.2.5 - Model output

Scores indicating the probability that the papers describe randomised trials

4.2.6 - Model architecture

The model is described in detail here (with links to source code): https://www.sciencedirect.com/science/article/pii/S0895435620311720. It is an ensemble of a support vector machine classifier and a logistic regression classifier.
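A soft-voting ensemble of this kind can be illustrated in miniature. The sketch below assumes both base models reduce to linear scores over a shared feature vector and that their calibrated probabilities are simply averaged; the actual models, features and combination rule are documented in the linked paper:

```python
import math

def sigmoid(z):
    """Map a raw linear score to a 0-1 probability."""
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_probability(features, svm_weights, lr_weights):
    """Average the probabilities of two linear base classifiers
    (standing in for the SVM and logistic regression members)."""
    svm_prob = sigmoid(sum(w * x for w, x in zip(svm_weights, features)))
    lr_prob = sigmoid(sum(w * x for w, x in zip(lr_weights, features)))
    return (svm_prob + lr_prob) / 2.0
```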

4.2.7 - Model performance

The model is calibrated to achieve at least a 98.5% recall against all the randomised trials contained in Cochrane reviews at the time of publication.

Prior to implementing the classifier at NICE, we tested the RCT classifier in both surveillance and guidance development. In these tests, we used surveillance and guidance development topics that had already been completed, so we knew which studies had been identified and included in the reviews. We then ran the same search strategy through the RCT classifier and compared the studies it identified as RCTs against the list included in the original reviews. These tests compared the accuracy of the RCT classifier with standard approaches and informed the decision to adopt the tool.

4.2.8 - Datasets

Cochrane Crowd RCT dataset; Clinical Hedges dataset; all RCTs included in Cochrane Reviews. Detailed here: https://www.sciencedirect.com/science/article/pii/S0895435620311720

4.2.9 - Dataset purposes

Detailed in the paper. Cochrane Crowd data were used to build the model; Clinical Hedges data for calibration; and Cochrane RCTs for validation.

Tier 2 - Data Specification

4.3.1 - Source data name

Cochrane Crowd, Clinical Hedges, Cochrane included studies

4.3.2 - Data modality

Text

4.3.3 - Data description

The relevant data used for developing the models are the titles and abstracts from the datasets referred to below. Please see: https://www.sciencedirect.com/science/article/pii/S0895435620311720

4.3.4 - Data quantities

Approximate figures are: Cochrane Crowd: 280,000 records; Clinical Hedges: 49,000 records; Cochrane included studies: 92,000 records. The Cochrane Crowd records were used for training, Clinical Hedges for calibration and Cochrane included studies for validation.

4.3.5 - Sensitive attributes

None

4.3.6 - Data completeness and representativeness

The data are complete and are, as far as we can tell, a good representation of other titles and abstracts found in the healthcare domain.

4.3.8 - Data collection

Please see: https://www.sciencedirect.com/science/article/pii/S0895435620311720

The data set used to train the classifier comprises a corpus of 280,620 title–abstract records retrieved from Embase using a highly sensitive search for RCTs. Records retrieved between January 2014 and July 2016 inclusive were used to develop the classifier. Records that were obviously RCTs or obviously non-RCTs were excluded, so the training dataset consisted of records that were not obvious RCTs or non-RCTs. Each record in the training data set was then labelled by Cochrane Crowd members according to whether it reported an RCT (n = 20,454) or not (n = 260,166). Each record was labelled by multiple Crowd members, with the final Crowd decision determined by an agreement algorithm.

The Clinical Hedges data set was built during 2000 and 2001 for the purposes of testing and validating sensitive search filters for RCTs. It contained 49,028 title–abstract records manually identified and selected by information specialists using a combination of hand search and electronic search methods.

The validation data set ("Cochrane Included Studies") comprises title and abstract records of all study reports included in Cochrane Reviews in which eligible study designs are restricted to "RCTs only", published up to April 2017. The data set comprises 94,305 records of 58,283 included studies across 4,296 Cochrane Reviews.

4.3.9 - Data cleaning

Commonly used words appearing on the PubMed stop word list (such as than, that, the, use, used, using, with, within, without, would) were removed. The remaining words were converted into vector representations using a bag-of-words approach, a technique for representing text data as an unordered collection ("bag") of words based on their frequency.
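The cleaning step above can be sketched as stop-word removal followed by word counting. The stop-word set below contains only the sample of words quoted in this record, not the full PubMed list, and the tokenisation is deliberately simplistic:

```python
from collections import Counter

# Sample of PubMed stop words quoted in this record (not the full list).
STOP_WORDS = {"than", "that", "the", "use", "used", "using",
              "with", "within", "without", "would"}

def bag_of_words(text):
    """Lower-case, tokenise on whitespace, strip basic punctuation,
    drop stop words, and count the remaining word frequencies."""
    tokens = [word.strip(".,;:()").lower() for word in text.split()]
    return Counter(token for token in tokens
                   if token and token not in STOP_WORDS)
```

Each title–abstract record then becomes a sparse vector of word counts over the corpus vocabulary, which is the representation the classifier consumes.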

4.3.10 - Data sharing agreements

NICE data is sent to servers hosted by the supplier (UCL). NICE has a data sharing arrangement with UCL. In brief, this agreement attributes ownership of the data (input by NICE staff) to NICE. UCL does not store, copy, disclose or use NICE data except as necessary to deliver the EPPI-Reviewer services. The systems that hold NICE data comply with UCL security policy. In the case of corruption or loss of data, UCL will inform NICE immediately and propose remedial action.

4.3.11 - Data access and storage

Only titles and abstracts of research papers are transferred to UCL. There is no personal data involved. The data is not stored; it is discarded as soon as the model has provided the results.

Tier 2 - Risks, Mitigations and Impact Assessments

5.1 - Impact assessment

No formal impact assessments have been conducted.

5.2 - Risks and mitigations

There is a risk that a potentially relevant RCT could be missed. However, it should be noted that this could also be the case with manual screening. Mitigations to this include the fact that the evidence synthesis is discussed with committee members who can flag if they think a potentially relevant study has been missed. The evidence synthesis is also consulted on with stakeholders who can also flag potentially relevant evidence. In all cases, any evidence flagged through sources beyond the search and sift stage would be considered against the protocol for inclusion in the review.

Updates to this page

Published 27 February 2025