Research and analysis

Evaluation of AI trials in the asylum decision making process

Published 29 April 2025

This research note summarises the evaluation of the Asylum Case Summarisation (ACS) tool and Asylum Policy Search (APS) tool pilots.

Executive summary

Asylum decision-makers spend a substantial amount of time analysing asylum interview transcripts and finding country policy information. As part of a wider asylum system programme of change, the Home Office is trialling 2 tools to help speed up these processes. The Asylum Case Summarisation (ACS) tool uses artificial intelligence (AI) to summarise asylum interview transcripts. The Asylum Policy Search (APS) tool is an AI search assistant that finds and summarises country policy information. The tools were designed as an aid for decision-makers to improve efficiency but do not, and cannot, replace any part of the decision-making process. Small-scale pilots were conducted between May and December 2024 to explore the feasibility, accuracy and impact of the tools.

Key findings

  • the pilots suggest that the ACS tool could save 23 minutes per case when reviewing interview transcripts
  • decision-makers could save an average of 37 minutes per case when searching for country policy information using the APS tool
  • some ACS users reported limitations of the tool that would need to be addressed in any full rollout
  • the tools did not appear to impact negatively or positively on decision quality[footnote 1] during piloting
  • these findings reflect small-scale pilots and further evaluation would be required for a full rollout

1. Introduction

1.1 ACS

Decision-makers spend a substantial amount of time reading and analysing case documents and interview transcripts when making an asylum decision. Transcripts can extend to 50 pages or more, and the cognitive load on decision-makers is high.

The ACS tool aims to reduce cognitive load and speed up the (re)familiarisation of a case for decision-makers. The tool uses a Large Language Model to extract and summarise information from existing asylum interview transcript documents, providing decision-makers with a concise summary document. In line with the ‘human in the loop’ principle, ACS has been designed so that decision-makers cannot use the tool by itself to make a decision; instead, it acts as an aid within the usual decision-making process.
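For illustration only, the sketch below shows the general pattern such a summarisation aid follows: the transcript text is passed to a language model with a fixed instruction, and the resulting summary is returned for the decision-maker to read alongside the full transcript. The prompt wording, function names and model interface are hypothetical and are not taken from the ACS implementation, which is not described in this note.

```python
# Illustrative sketch only: the ACS pilot's actual model, prompts and
# safeguards are not described in this note. All names are hypothetical.

from typing import Callable

# Instruction asking the model to summarise without adding new information.
SUMMARY_PROMPT = (
    "Summarise the following asylum interview transcript for a decision-maker.\n"
    "Cover the basis of claim, key events and dates.\n"
    "Do not add information that is not in the transcript.\n\n"
    "Transcript:\n{transcript}\n\nSummary:"
)

def summarise_transcript(transcript: str, llm: Callable[[str], str]) -> str:
    """Return a concise summary of one interview transcript.

    `llm` is any callable that takes a prompt string and returns model text,
    keeping the sketch independent of a specific provider or API version.
    """
    return llm(SUMMARY_PROMPT.format(transcript=transcript))

if __name__ == "__main__":
    # Stand-in for a real model call; a deployment would route this to an LLM.
    fake_llm = lambda prompt: "Basis of claim: ... Key events: ... (placeholder)"
    print(summarise_transcript("Q: Why did you leave? A: ...", fake_llm))
```

The summary is an aid only: the decision-maker still reads the full transcript and no decision is taken from the summary itself.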

1.2 APS

Finding country policy information is a time-consuming part of making asylum decisions, in particular for new decision-makers.

The APS tool is an AI search assistant which aims to speed up the process of looking for relevant country policy information. It is a chat-based interface that finds and summarises Country Policy Information Notes (CPINs) and Country of Origin Information Requests (COIRs) directly relevant to the case, to provide the policy basis for decisions. In line with the ‘human in the loop’ principle, APS has been designed so that decision-makers cannot use the tool by itself to make a decision and must still access the full CPIN or COIR.
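For illustration only, the sketch below shows the general pattern of such a search assistant: a query is matched against passages from the policy documents, and the most relevant passages are returned together with a pointer to the source document. A toy keyword-overlap score stands in for the tool's actual retrieval method, which is not described in this note; all names are hypothetical.

```python
# Illustrative sketch only: APS's real retrieval and summarisation pipeline
# is not described in this note. This shows the general pattern of returning
# relevant CPIN/COIR passages with a reference to the source document.

from dataclasses import dataclass

@dataclass
class Passage:
    source: str   # e.g. the CPIN or COIR the passage came from
    text: str

def score(query: str, passage: Passage) -> int:
    """Count query words appearing in the passage (toy relevance measure)."""
    words = set(query.lower().split())
    return sum(1 for w in passage.text.lower().split() if w in words)

def search(query: str, passages: list[Passage], top_k: int = 3) -> list[Passage]:
    """Return the top_k most relevant passages, each tagged with its source."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:top_k]

if __name__ == "__main__":
    corpus = [
        Passage("CPIN: Country X, political opposition", "Treatment of opposition members ..."),
        Passage("COIR: Country X, healthcare", "Availability of treatment for ..."),
    ]
    for p in search("treatment of political opposition members", corpus):
        # The decision-maker is directed to the full source document,
        # in line with the 'human in the loop' principle.
        print(p.source, "->", p.text[:60])
```

Returning the source reference alongside each passage is what allows the decision-maker to check the full CPIN or COIR before relying on any information.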

2. Aim and scope of evaluations

Six key evaluation questions were identified:

  • has there been an impact on case-working productivity?
  • has there been an impact on the quality of decisions made?
  • are there any unintended consequences of using the tools?
  • do the tools provide the necessary information?
  • how do users feel about using the tools?
  • are the tools cost-saving?

3. Methodology

Both evaluations used a mixed-methods approach to collect primary data during and after the pilots.

The ACS pilot included a test group and a comparison group in 2 Decision Making Units (DMUs). The DMUs were selected based on their availability to take part in piloting. An initial 8-week pilot phase (phase 1) was undertaken from May to June 2024, with 60 decision-makers in the test group and 15 in the comparison group. A second 8-week phase (phase 2) was undertaken within the same DMUs from September to October 2024, with 45 decision-makers in the test group and 15 in the comparison group.

The APS pilot used a test group and a comparison group in 2 DMUs, different from those in the ACS pilot and also chosen based on their availability to take part in piloting. The pilot was undertaken between October and December 2024, with 50 decision-makers in the test group and 30 in the comparison group.

3.1 Case logging

All participants in the test and comparison groups were asked to log information for each case undertaken. For the ACS pilot, the logging exercise captured data on 334 cases in the test group and 95 cases in the comparison group. For the APS pilot, the logging exercise captured data on 270 cases in the test group and 214 cases in the comparison group.

3.2 End of pilot survey

A short survey was conducted with the test group post-pilot. For ACS, the survey after pilot phase 1 received 26 responses (43% response rate) and the survey after pilot phase 2 received 6 responses (13% response rate). As the number of responses was very low for phase 2, these were excluded from the analysis. For the APS tool, the post-pilot survey received 39 responses (78% response rate).

3.3 Semi-structured interviews

Semi-structured interviews were conducted with the test groups after the surveys to provide a deeper understanding of users’ experiences. Eleven interviews were conducted after phase 2 of the ACS pilot and 10 after the APS pilot.

4. Impact findings

4.1 ACS

The impact on productivity

On average, the test group reviewed transcripts 23 minutes quicker than the comparison group (a 32% time saving). Interview and survey results support the finding that the tool saved decision-makers’ time: users said the summary output acted as a refresher and orientated them to the basis of the claim. Some interviewees, however, reported minimal time savings because the summary did not provide source references.

The impact on decision quality

A dip-sample of decisions was reviewed using Calibre to check decision quality in both the test and comparison groups. Of those reviewed, the proportion of sustained decisions was similar across the two groups, suggesting that the tool neither positively nor negatively impacted the quality of decisions made. None of the reasons cited for non-sustained decisions were reported as being related to the tool.

Examining unintended consequences

It is important to consider unintended consequences that could occur alongside any benefits, such as biased outcomes for different groups or over-reliance on the tool in reaching decisions. There were no significant differences in outcomes between the test and comparison groups for the most common nationality. However, pilot numbers were too small to fully test for variation in outcomes across all characteristics, and this would require ongoing monitoring. The risk of users becoming over-reliant on the tool was explored through interviews. No interviewees felt that the summary output had influenced their decision-making, or that it had the potential to do so, as it does not replace any stage in the process.

4.2 APS

The impact on productivity

Those in the test group saved an average of 37 minutes per case researching country policy information compared with the comparison group. This breaks down into savings of approximately 12 minutes at the pre-interview stage and approximately 25 minutes at the decision-writing stage. Interviewees identified a range of time-saving benefits of the tool, such as helping them quickly identify information relevant to the specific case. The caseload distribution by nationality and reason for asylum claim was similar for the test and comparison groups, so it is unlikely that nationality or claim reason was a significant contributing factor to the time savings.

The impact on decision quality

As with the ACS tool, a dip-sample of cases was reviewed using Calibre to check the quality of the test and comparison group decisions. A similar proportion of cases was sustained in both groups, which suggests that the tool did not impact the quality of decision-making.

Examining unintended consequences

Outcomes across nationalities and reasons for claim were similar in both the test and comparison groups, and do not suggest that the tool impacts differently on these case characteristics. As with the ACS pilot, it is important to ensure that decision-makers do not become over-reliant on the tool. Initial interviews suggested that this was not happening in the APS pilot.

5. Process evaluation findings

5.1 ACS

The majority of survey respondents reported they would continue to use the tool beyond the pilot (65%) and felt the summary helped them to quickly understand the case (77%), although only 42% felt it gave them the right amount of information. This was echoed by some interviewees, who noted that the summary output did not reference where to find the information in the transcript. Omitting references was a deliberate design choice for the pilot to reduce technical complexity, but could be revisited in future.

Technical specialists reviewed all summaries for accuracy prior to use in the pilot. A small proportion of summaries (9%) were deemed inaccurate or to have missing information; these were removed from the pilot and the cases progressed in the business-as-usual way. Of the summaries that progressed in the pilot, 23% of users reported they were not fully confident in the summary information, which would warrant further exploration in any full rollout.

5.2 APS

For the majority (82%) of queries put to the tool, users reported that it provided sufficient information and references. Similarly, 79% of survey respondents said they were confident that the information provided by the tool was an accurate representation of the CPIN. A small minority (5%) of survey respondents were not confident in the tool’s accuracy, and some stated they did not see the benefit of using the tool over searching the CPINs directly.

Just over half (54%) of survey respondents reported they would continue using the tool. Three-quarters (75%), however, stated that it could be developed to add further benefit, such as increasing its functionality to integrate more sources. It should be noted that the limitations on the tool’s functionality were designed to ensure the autonomy of the decision-maker and to avoid over-reliance.

6. Benefits

To assess the value for money (VfM) case for each tool, a larger sample is needed to quantify the potential time savings, alongside the costs associated with full deployment and an understanding of the full business transformation required to deliver benefits across the asylum system.

7. Caveats and limitations

  • as these are small-scale pilots, the evaluation findings will only provide an indication of what a full rollout might look like
  • the DMUs in the pilots were not selected to be representative of all DMUs, therefore there may be location or unit-specific biases
  • all the evaluation methodologies rely on self-reporting, so may reflect inaccuracies in recall and biases associated with those who chose to respond

8. Equalities

Equality Impact Assessments for the tools have been completed which demonstrate compliance with the Public Sector Equality Duty. Future monitoring and evaluation should examine whether the tools have any equality impacts, for example by assessing a wider variety of case types to understand their impact on different cohorts.

9. Conclusion and recommendations

9.1 Conclusion

Overall, the pilots demonstrate there are potential time savings associated with the use of the ACS and APS tools.

For ACS, although there were reports of some limitations, the overall experience was positive and the majority of users would like to continue using the tool in the future. The tool did not appear to impact decision quality, and users felt that it did not reduce their autonomy or influence decision-making.

For APS, users highlighted a range of benefits, such as orientating them on a case and reducing the need to seek extra input. The tool did not appear to impact decision quality, and users felt that it did not influence decision-making.

For both ACS and APS, the findings suggest there is scope to increase functionality to improve user benefits; however, this needs to be balanced with ensuring the autonomy of decision-makers.

These findings reflect small-scale pilots and suggest there would be value in larger-scale evaluation of any future rollout, to check that the use and impact of the tools do not change as they evolve.

9.2 Recommendations

  • any limitations of the tools identified in the evaluation should be addressed before a full rollout
  • any further rollout should involve continuous monitoring and data capture in the early stages to ensure the accuracy, quality and use of the tools remain consistent, and to ensure impact does not differ across case characteristics
  • full evaluation after deployment is recommended to capture the impact of the tools outside of the pilot environment and on a larger scale

  1. Decision quality was assessed using Calibre. Calibre is the quality assurance tool used in Asylum Operations to record the outcome of quality assessments.