NICE: Cochrane Randomised Controlled Trials (RCT) Classifier
Distinguishes randomised controlled trials from other types of study within research papers to help NICE staff narrow down the number of papers they need to read and review for evidence synthesis.
Tier 1 Information
Name
Cochrane Randomised Controlled Trials (RCT) Classifier
Description
A classification tool that distinguishes randomised controlled trials from other types of study within research papers to help NICE staff narrow down the number of papers they need to read and review for evidence synthesis. The tool outputs a likelihood score indicating whether a paper describes a randomised controlled trial, which staff use to help determine whether each research paper needs to be read.
Website URL
https://www.sciencedirect.com/science/article/pii/S0895435620311720
Contact email
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
National Institute for Health and Care Excellence
1.2 - Team
Professional Team
1.3 - Senior responsible owner
Associate Director
1.4 - External supplier involvement
Yes
1.4.1 - External supplier
EPPI Centre, University College London (UCL)
1.4.2 - Companies House Number
N/A
1.4.3 - External supplier role
The UCL team worked with NICE to implement the RCT classifier into EPPI R5 (an application for systematic reviewing). The underpinning algorithms for the RCT classifier are managed and maintained by UCL staff.
1.4.4 - Procurement procedure type
The services of UCL were not procured, but provided to NICE without charge in the spirit of collaboration.
Tier 2 - Description and Rationale
2.1 - Detailed description
A machine learning classifier for retrieving randomised controlled trials (RCTs) was developed (the “Cochrane RCT Classifier”), with the algorithm trained using a data set of title–abstract records from Embase, manually labelled by the Cochrane Crowd. The classifier was then calibrated using a further data set of similar records manually labelled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification.
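As a rough illustration of the calibration step, the Python sketch below picks the highest decision threshold whose recall on a labelled calibration set meets a target. This is a minimal sketch under our own assumptions, not the published calibration procedure; the function and variable names are illustrative only.

import numpy as np

def calibrate_threshold(scores, labels, target_recall=0.99):
    # Pick the highest score threshold whose recall on a labelled
    # calibration set meets the target (illustrative sketch only).
    scores = np.asarray(scores)
    labels = np.asarray(labels, dtype=bool)
    # Candidate thresholds: the scores of the true RCTs, highest first.
    # Recall rises as the threshold falls, so the first hit is the answer.
    for threshold in np.sort(scores[labels])[::-1]:
        predicted_rct = scores >= threshold
        recall = (predicted_rct & labels).sum() / labels.sum()
        if recall >= target_recall:
            return float(threshold)
    return float(scores.min())

# Toy example: scores from the classifier, labels from manual screening.
calibrate_threshold(scores=[0.97, 0.80, 0.30, 0.10, 0.92],
                    labels=[1, 1, 1, 0, 0])  # returns 0.30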
2.2 - Scope
The tool has a deliberately limited scope: identifying references to RCTs within papers to help narrow down the number of papers read and reviewed. It takes as input the titles and abstracts of research papers, processes them, and outputs a score indicating whether or not each paper is likely to be describing a randomised trial.
2.3 - Benefit
The key benefit of this tool is the time saved through automation. Previously, staff had to read a research paper before knowing whether it was likely to describe a randomised trial. The tool allows staff to focus their reading on the papers most likely to contain RCTs.
2.4 - Previous process
There was no previous technological process that supported staff in finding papers containing RCTs; staff were required to screen papers entirely manually.
2.5 - Alternatives considered
A range of different classification options was investigated. The chosen approach, an ensemble of two algorithms (the Cochrane Randomised Controlled Trials Classifier) combined with priority screening, detailed in this paper: https://www.sciencedirect.com/science/article/pii/S0895435620311720, was found to be the most efficient in terms of accuracy and ease of deployment to a cloud infrastructure.
Tier 2 - Decision making Process
3.1 - Process integration
A key stage of conducting an evidence synthesis is to search for relevant studies on the topic area of interest. Common study types that NICE would be interested in for evidence syntheses of interventions are randomised controlled trials (RCTs). The RCT classifier is available as a standalone option for classifying studies in EPPI R5, NICE's evidence management software. It does not run automatically; using the classifier is a choice made by the reviewer. The RCT classifier is only used on search outputs specifically designed to identify randomised controlled trials. The classifier takes as input the titles and abstracts of research papers and outputs a score indicating whether or not they are likely to be describing a randomised trial. This reduces the manual effort of identifying randomised trials from a search output.
3.2 - Provided information
This tool has limited scope. It takes as input the titles and abstracts of research papers and outputs a score between 0 and 1. A score close to 1 (e.g., 0.97) suggests that the paper is likely to describe a randomised controlled trial, while a score close to 0 (e.g., 0.10) indicates that it is unlikely to describe one. This score is not presented to the user. If the user initiates the classification job and selects the "Exclude non RCTs" option, papers deemed non-RCTs are automatically set as excluded (i.e., they are removed from the list of papers to be screened manually). Papers thought to be RCTs are tagged as such and presented to the user as likely RCTs. The tool automatically excludes papers whose score is less than 0.244.
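The exclusion behaviour described above can be summarised with a short Python sketch. This is an illustration of the described behaviour only, not EPPI R5's actual implementation; the record structure and field names are assumptions.

EXCLUSION_THRESHOLD = 0.244  # scores below this are treated as non-RCTs

def triage(records, exclude_non_rcts=True):
    # Tag each scored record as a likely RCT, or auto-exclude it when the
    # user has ticked the "Exclude non RCTs" option (illustrative only).
    for record in records:
        if record["score"] < EXCLUSION_THRESHOLD:
            # Non-RCT: removed from the manual screening list if requested.
            record["status"] = "excluded" if exclude_non_rcts else "unscreened"
        else:
            record["status"] = "likely RCT"  # presented to the reviewer
    return records

papers = [{"id": 1, "score": 0.97}, {"id": 2, "score": 0.10}]
triage(papers)  # paper 1 -> "likely RCT", paper 2 -> "excluded"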
3.3 - Frequency and scale of usage
This particular tool is not used very often for evidence synthesis purposes. It can only be used on search outputs specifically designed to identify only randomised trials. Such searches are conducted infrequently, as the NICE approach for evidence synthesis is to search for both randomised trials and systematic reviews, and it would be inappropriate to use the classifier on that combined dataset.
3.4 - Human decisions and review
This tool has limited scope. It takes as input the titles and abstracts of research papers and outputs a score indicating whether or not they are likely to be describing a randomised trial, which a human can use to help determine the likelihood of each paper being relevant. The scores indicate the probability that papers describe randomised trials, but the tool does not replace human-in-the-loop consideration of the studies to determine whether they are randomised trials. Studies deemed non-RCTs may be deprioritised and checked at a later date, although this is not always the case. There are also further checkpoints in the guideline development process, such as opportunities for committee members and stakeholders to flag studies that may have been missed; these studies would then be considered for inclusion in the evidence synthesis.
3.5 - Required training
No specific training is required as the tool is available as an option embedded into EPPI R5 and there are instructions in the user manual.
3.6 - Appeals and review
The decision made by the tool does not directly affect the public; it is used to improve the efficiency of our processes. All approaches to screening are accompanied by a 10% quality assurance check, in which a second reviewer blind-screens 10% of the studies and the screening decisions are compared. Any discrepancies are discussed to agree whether the study should be included or excluded, with disagreements escalated to a third independent reviewer.
Tier 2 - Tool Specification
4.1.1 - System architecture
See this paper for technical details on the model: https://www.sciencedirect.com/science/article/pii/S0895435620311720 . The paper also contains links to other relevant documentation and GitHub repositories. The model is deployed and maintained by the EPPI Centre, UCL (external supplier).
NICE accesses the model via an API call. A tab-separated file containing the following data for each citation that needs scoring is sent to the model: Review item ID, Citation Title, Citation Abstract.
The model computes a probability score for each citation and returns all the information that was originally sent, along with a new probability score column, in tab-separated values (TSV) format.
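A minimal Python sketch of this exchange is shown below, assuming a hypothetical endpoint URL; the actual API contract is defined by the EPPI Centre, and only the column names above are taken from this record.

import csv
import io

import requests

API_URL = "https://eppi.example/rct-classifier"  # hypothetical endpoint

def score_citations(citations):
    # Serialise (review item ID, title, abstract) rows as TSV.
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t")
    writer.writerow(["ReviewItemID", "CitationTitle", "CitationAbstract"])
    writer.writerows(citations)

    # The service returns the same columns plus a probability score
    # column, also in TSV format.
    response = requests.post(API_URL, data=buffer.getvalue().encode("utf-8"))
    response.raise_for_status()
    return list(csv.DictReader(io.StringIO(response.text), delimiter="\t"))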
4.1.2 - Phase
Production
4.1.3 - Maintenance
The model is deployed and maintained by the EPPI Centre (supplier). At the supplier end, the model is not routinely rebuilt in order to provide stability of output. The validation dataset is part of the deployment of the tool, meaning that performance can be checked and reviewed any time. NICE undertakes assessments of the model deployed and how it is maintained externally.
4.1.4 - Models
Support vector machine; logistic regression
Tier 2 - Model Specification
4.2.1 - Model name
Cochrane RCT Classifier
4.2.2 - Model version
2
4.2.3 - Model task
Classify titles and abstracts of research papers as to whether or not they describe randomised trials.
4.2.4 - Model input
Titles and abstracts of research papers
4.2.5 - Model output
Scores indicating the probability that they describe randomised trials
4.2.6 - Model architecture
The model is described in detail here (with links to source code): https://www.sciencedirect.com/science/article/pii/S0895435620311720 . It is an ensemble of support vector machine and logistic regression classifiers.
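A simplified scikit-learn analogue of such an ensemble is sketched below. The published model's exact features, weighting and hyperparameters are described in the paper; treat this purely as an illustration of the architecture, with all names and settings being our own assumptions.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def ensemble_scores(train_texts, train_labels, test_texts):
    # Bag-of-words features feeding two linear classifiers whose
    # probability estimates are averaged (a simplified stand-in).
    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)

    svm = CalibratedClassifierCV(LinearSVC())  # gives the SVM predict_proba
    logreg = LogisticRegression(max_iter=1000)
    svm.fit(X_train, train_labels)
    logreg.fit(X_train, train_labels)

    # Average the two models' probabilities for the "is an RCT" class.
    return (svm.predict_proba(X_test)[:, 1]
            + logreg.predict_proba(X_test)[:, 1]) / 2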
4.2.7 - Model performance
The model is calibrated to achieve at least 98.5% recall against all the randomised trials contained in Cochrane reviews at the time of publication.
Prior to implementing the classifier at NICE, we tested the RCT classifier in both surveillance and guidance development. In these tests, we used surveillance and guidance development topics that had been completed, so we knew which studies had been identified and included in the reviews. We then used the same search strategy and ran it through the RCT classifier to compare the studies identified as RCTs by the classifier against the list included in the original reviews. These tests were used to compare the accuracy of the RCT classifier with standard approaches and informed the decision to adopt the tool.
4.2.8 - Datasets
Cochrane Crowd RCT dataset; Clinical Hedges dataset; all RCTs included in Cochrane Reviews. Detailed here: https://www.sciencedirect.com/science/article/pii/S0895435620311720
4.2.9 - Dataset purposes
Detailed in the paper. Cochrane Crowd data to build the model; clinical hedges data for calibration; and Cochrane RCTs for validation.
Tier 2 - Data Specification
4.3.1 - Source data name
Cochrane Crowd, Clinical Hedges, Cochrane included studies
4.3.2 - Data modality
Text
4.3.3 - Data description
The relevant data used for developing the models are the titles and abstracts from the datasets referred to below. Please see: https://www.sciencedirect.com/science/article/pii/S0895435620311720
4.3.4 - Data quantities
Approximate figures are: Cochrane Crowd: 280,000 records; Clinical Hedges: 49,000 records; Cochrane included studies: 92,000 records. The Cochrane Crowd records were used for training, Clinical Hedges for calibration and Cochrane included studies for validation.
4.3.5 - Sensitive attributes
None
4.3.6 - Data completeness and representativeness
The data are complete and are, as far as we can tell, a good representation of other titles and abstracts found in the healthcare domain.
4.3.8 - Data collection
Please see: https://www.sciencedirect.com/science/article/pii/S0895435620311720
The data set used to train the classifier comprises a corpus of 280,620 title–abstract records retrieved from Embase using a highly sensitive search for RCTs. Records retrieved between January 2014 and July 2016 inclusive were used to develop the classifier. Records that were obvious RCTs or obvious non-RCTs were excluded, so the training data set consisted of records that were neither. Each record in the training data set was then labelled by Cochrane Crowd members according to whether it reported an RCT (n = 20,454) or not (n = 260,166). Each record was labelled by multiple Crowd members, with the final Crowd decision determined by an agreement algorithm (a simplified sketch follows below).
The Clinical Hedges data set was built during 2000 and 2001 for the purposes of testing and validating sensitive search filters for RCTs. It contains 49,028 title–abstract records manually identified and selected by information specialists using a combination of hand-search and electronic search methods.
The validation data set ("Cochrane Included Studies") comprises title and abstract records of all study reports included in Cochrane Reviews in which eligible study designs are restricted to "RCTs only", published up to April 2017. It contains 94,305 records of 58,283 included studies across 4,296 Cochrane Reviews.
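Since this record only names an agreement algorithm without describing it, the Python sketch below shows the simplest possible stand-in, a majority vote over the labels given to one record; the real Cochrane Crowd algorithm is more sophisticated, so this is purely conceptual.

from collections import Counter

def resolve_label(crowd_labels):
    # Resolve multiple crowd labels for one record by simple majority;
    # a tie returns None, standing in for escalation to further review.
    (top, top_count), *rest = Counter(crowd_labels).most_common()
    if rest and rest[0][1] == top_count:
        return None
    return top

resolve_label(["RCT", "RCT", "not RCT"])  # -> "RCT"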
4.3.9 - Data cleaning
The team were required to remove commonly used words that appear on the PubMed stop word list (such as than, that, the, use, used, using, with, within, without, would). The remaining words were converted into vector representations using a bag-of-words approach, a technique for representing text data as an unordered collection, or "bag", of words based on their frequency.
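This cleaning step can be sketched with scikit-learn's CountVectorizer, passing the stop words explicitly. The word list below is only the excerpt quoted above, not the full PubMed list, and the example abstracts are invented for illustration.

from sklearn.feature_extraction.text import CountVectorizer

# Excerpt of the PubMed stop word list quoted above; the real list is longer.
PUBMED_STOP_WORDS = ["than", "that", "the", "use", "used", "using",
                     "with", "within", "without", "would"]

# Bag-of-words: each title-abstract becomes a vector of word frequencies,
# ignoring word order, after stop words are removed.
vectorizer = CountVectorizer(stop_words=PUBMED_STOP_WORDS)
X = vectorizer.fit_transform([
    "A randomised controlled trial of treatment X versus placebo.",
    "A retrospective cohort study using registry data.",
])
print(vectorizer.get_feature_names_out())  # vocabulary after cleaning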
4.3.10 - Data sharing agreements
NICE data is sent to servers hosted by the supplier (UCL). NICE has a data sharing arrangement with UCL. In brief, this agreement attributes the ownership of data (input by NICE staff) to NICE. UCL does not store, copy, disclose or use NICE data except as necessary for delivering the EPPI-Reviewer services. The systems that hold NICE data comply with UCL security policy. In the case of corruption or loss of data, UCL will inform NICE immediately and propose remedial action.
4.3.11 - Data access and storage
Only the titles and abstracts of research papers are transferred to UCL. There is no personal data involved. The data is not stored and is discarded as soon as the model has provided the results.
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessment
No formal impact assessments have been conducted.
5.2 - Risks and mitigations
There is a risk that a potentially relevant RCT could be missed. However, it should be noted that this could also be the case with manual screening. Mitigations to this include the fact that the evidence synthesis is discussed with committee members who can flag if they think a potentially relevant study has been missed. The evidence synthesis is also consulted on with stakeholders who can also flag potentially relevant evidence. In all cases, any evidence flagged through sources beyond the search and sift stage would be considered against the protocol for inclusion in the review.