Research and analysis

AI-assisted vs human-only evidence review

An exercise to investigate the robustness and reliability of using generative AI to help produce evidence reviews, compared with reviews produced using human input only.

Documents

AI-Assisted vs human-only evidence review: results from a comparative study [Printable Version]

Request an accessible format.
If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email alt.formats@dsit.gov.uk. Please tell us what format you need. It will help us if you say what assistive technology you use.

The impact of technology diffusions on growth and productivity: findings from an AI-assisted rapid evidence review [Printable Version]

The impact of technology diffusions on growth and productivity: findings from a human-only rapid evidence review [Printable Version]

Details

The Department for Science, Innovation and Technology (DSIT) and the Department for Culture, Media and Sport (DCMS) commissioned the Behavioural Insights Team (BIT) in 2024 to run a comparative exercise investigating the robustness and reliability of using generative AI to help produce rapid evidence reviews, compared with reviews produced using human input only.

Two BIT researchers separately conducted reviews on the topic ‘How technology diffusion impacts UK growth and productivity’. Both received the same briefing and inclusion criteria, but one review was produced by ‘human-only’ means whilst the other was ‘AI-assisted’, using a mix of AI tools supplemented by manual checks and edits to the AI-written output.

The AI-assisted review was completed in 23% less time, with AI speeding up the analysis and synthesis of studies. However, the initial draft of the AI-assisted output was judged to be less fluent and required more revisions than the human-only version.

This study is effectively a case study, so its results are not generalisable: the ability of both humans and AI models to review literature varies substantially, including by topic. Nevertheless, the authors conclude that AI has the potential to enhance the process of conducting rapid evidence reviews, although at the time of the research AI still produced errors that required manual verification. They posit that, as AI improves, these issues may be reduced in future.

The authors therefore recommend further work to understand how and when AI can be implemented in evidence reviews. In this case study, Large Language Model (LLM) tools sped up the analysis of the selected literature: for that phase alone, the AI-assisted process took 56% less time. The tools also proved effective at synthesising credible overall summaries. However, researchers need to invest time in learning to use these tools effectively, as precise, detailed and explicit prompts were found to affect their efficacy. Further research is needed to clarify the benefits and limitations of the technology.

Updates to this page

Published 23 April 2025
