DSIT - Redbox
An AI assistant to civil servants in the UK, enabling them to use the best large language models (LLMs) to help make their daily tasks eaiser.
Tier 1 Information
1 - Name
Redbox
2 - Description
Redbox is an AI assistant to civil servants in the United Kingdom and currently being used by the Cabinet Office, DSIT and No 10 with plans to roll out more widely.
Civil servants can chat with or without their documents (up to and including OFFICIAL SENSITIVE) using the world’s most advanced LLMs and choose the LLM they would like to use. This helps them in their day to day work, completing tasks faster and more efficiently, including summarisation and re-drafting.
Created as a safe and secure way for civil servants to use the best large language models (LLMs) to help make their daily tasks easier. Redbox uses information provided by users and returns any information that is requested, whether that is a summary, redraft or translation. The scope of the tool is wide and far reaching and gives the user autonomy to use it the best way they see fit.
3 - Website URL
https://ai.gov.uk/projects/redbox/
https://github.com/i-dot-ai/redbox
FAQs: https://redbox.ai.cabinetoffice.gov.uk/faqs/
4 - Contact email
redbox-copilot@cabinetoffice.gov.uk
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
Department for Science Innovation and Technology
1.2 - Team
Incubator for AI
1.3 - Senior responsible owner
Head of AI Engineering
1.4 - External supplier involvement
No
1.4.1 - External supplier
N/A
1.4.2 - Companies House Number
N/A
1.4.3 - External supplier role
N/A
1.4.4 - Procurement procedure type
N/A
1.4.5 - Data access terms
N/A
Tier 2 - Description and Rationale
2.1 - Detailed description
Redbox assists Civil Servants by allowing users to select from a range of large language models (LLMs) and utilise them to complete tasks of their choice. In a similar way to ChatGPT or Microsoft CoPilot, users can add documents or information via a prompt or ‘chat’ and the chosen LLM then pulls through the most appropriate (likely) answer or response. Unlike these publicly available tools Redbox is trusted with information up to and including OFFICIAL SENSITIVE and users can also switch between models.
This has helped in tasks such as:
- Document summaristion
- Grouping of themes or ideas
- Redrafting content in a specific tone of voice or style
- Turning rough notes into coherent sentences
- Shortening speeches or other content
and many many others.
2.2 - Scope
Redbox is a general purpose Generative AI tool that gives Civil Servants access to LLMs to use as they see fit. It is currently being used by over 2000 people across the Cabinet Office, DSIT and No 10 with plans to roll out more widely.
Created as a safe and secure way for civil servants to use the best large language models (LLMs) to help make their daily tasks easier. Redbox uses information provided by users and returns any information that is requested, whether that is a summary, redraft or translation. The scope of the tool is wide and far reaching and gives the user autonomy to use it the best way they see fit.
Redbox is not a search engine as it does not search the internet for information. It is a tool that uses LLMs to help synethise, sumarise and redraft information provided to it by users.
As a tool it is designed to help and support humans and not to replace them. Redbox should free up Civil Servants from arduous, repetitive administration tasks to do the interesting, engaging and creative work that are more suited to human thinking. It should also support them in improving their communication and documentation quickly, reducing review points.
2.3 - Benefit
Improved efficiency by helping generate quick insights through summarisation. and information retrieval, reducing time and effort required.
Redbox is currently in Beta phase to determine whether it addresses Civil Servants critical need to effectively manage and leverage the vast amounts of information generated and consumed. With ever-increasing volumes and complexities of documents, policies, and reports, Redbox harnesses AI to summarise and synthesise information for users to do with as they wish.
Users have reported 2 primary benefits: Day-to-Day Tasks:
AI has significantly enhanced productivity in routine tasks. This includes drafting and summarisation of documents. It has helped in the initial preparation of reports, letters, or memos by redrafting the content itself or structure, allowing civil servants to focus on refining and customising. It has also helped to summarise complex and lengthy documentation into easily digestible themes and bullet points.
“Impossible” Tasks:
This tool has facilitated tasks that may seem unmanageable when performed manually, such as sifting through massive volumes of information across numerous documents. This capability is highly valuable in contexts where access to precise and relevant information is critical.
By efficiently searching and identifying relevant data, AI has helped users locate specific information or insights that might otherwise require considerable time and effort to find, thereby aiding in decision-making and policy development.
2.4 - Previous process
Previous to roll out of Redbox to Civil Servants there was no Large Language Model product that they could use up to the official sensitive security classification. In the Cabinet Office access to ChatGPT is banned. Redbox secured and sanctioned a way to use these models within government.
2.4 - Previous process
Departments don’t have access to CoPilot/ChatGPT products. So Redbox was created to provide a product bespoke to Civil Servants that allows them access to Microsoft/OpenAI models.
Redbox is accessible for civil servants to use multiple LLMs in one place. Users don’t need to worry about accounts/payment of licences or concerns about being allowed to use it. i.AI have one large account with the providers rather than an unknown number of smaller accounts, across departments.
The evaluation team are currently running a Test and Learn programme of work to evaluate Redbox and CoPilot products together.
Tier 2 - Decision making Process
3.1 - Process integration
There is no decision making process integration through Redbox. The user is responsible for the prompts or questions they ask of Redbox and how they use that information is down to the user. It is advised that the information provided by Redbox is validated and checked by the user before using it, particularly if making any legal documentation or decision making.
This leaves total autonomy to the user to create their own data set and to evaluate the information provided.
3.2 - Provided information
Once a user submits information to the Large Language model via the Redbox users interface the model then provides a response to the user in text format via the browser. This output is designed to address the user’s question or task by providing information, answering questions, or performing a specific function while maintaining a natural language flow. The response aims to be accurate, concise, and relevant to the user’s needs. The tool is unable to generate a provide answers in any other format e.g. documents, images or video.
3.3 - Frequency and scale of usage
As of February 2025 Redbox is used by 2,000 Civil Servants across the Cabinet Office, No10 and DSIT. The number of users grows by approximately 150 per week
3.4 - Human decisions and review
According to Redbox’s terms and conditions, users should not upload personally identifiable information. Each response includes a footer reminding users that LLMs may make mistakes and they should always verify for errors.
It is the user’s responsibility to determine whether the response adequately answers their question.
Users can ask additional clarification questions to refine the chatbot’s answers until they obtain the desired information.
If the tool does not retrieve the required information, users can initiate a new search by phrasing their question differently.
3.5 - Required training
There is a training programme currently being developed for users of Redbox. This will include videos and in person/virtual sessions delivered by a Cabinet Office Training team. There is an FAQ page that is available on the footer and a user community Google space for initial pilot users to share tips and tricks.
3.6 - Appeals and review
There is currently a rating system in the form of five stars, ‘1 star’ rates the response ‘not helpful’ and ‘5 star’ rating is measured as ‘very helpful’. This is provided to users to complete after the LLM generates each response. This information is then reviewed by the development each month and the responses scrutinised for their quality.
Tier 2 - Tool Specification
4.1.1 - System architecture
The Redbox architecture is securely deployed within an Amazon Web Services (AWS) environment. The system is designed to handle queries based on the information provided by the users through the LLMs.
Generative AI Integration: The retrieved documents are processed using the chosen LLM. This then summarises the retrieved information and generates a response, which is then presented to the user.
Session Management: The system maintains session history using AWS DynamoDB, allowing it to track and recall previous interactions, providing a more coherent and context-aware experience for users.
Security and Deployment: All components are securely managed within AWS, leveraging services such as AWS Lambda and DynamoDB with environment configurations handled through AWS environment variables and IAM roles for secure access.
Current Status and Future Plans: Currently, the system is in Beta and the code is available as opensource in GitHub. The Redbox repository can be found here : https://github.com/i-dot-ai/redbox
This architecture ensures that the Redbox can efficiently handle complex queries while maintaining a secure and scalable environment.
4.1.2 - Phase
Beta/Pilot
4.1.3 - Maintenance
The tool is being actively developed. We typically release a new version to production more than once a week. Users can report incidents either via a google chat group or via emailing the team. A plan for ongoing service management is being discussed and planned while we scale.
4.1.4 - Models
Multiple General Purpose Large Language Models: GPT, Claude, Gemini
Tier 2 - Model Specification
4.2.1 - Model name
Multiple general purpose Large Language Models
4.2.2 - Model version
GPT-4o, GPT-4o-mini, Claude-3 Sonnet, Claude-3 Haiku, Gemini
4.2.3 - Model task
The models are designed to be safe, accurate, and perform well in reasoning, coding, and other tasks. The models are Large Language Models that can receive process and provide responses in text-based semantic and easy-to-understand language. The models are generative pre-trained transformers. They have been pre-trained to predict the next word in large amounts of text.
4.2.4 - Model input
Users can input into the LLM prompt interface their questions, copy and pasted content or documents. All data is converted to plain text. Hyperlinks are treated as text and not followed. Images are ignored, but all text will be extracted if possible.
4.2.5 - Model output
The model responds to questions to documents submitted to it with text output responses.
4.2.6 - Model architecture
We intentionally provide the thinnest possible wrapper around the underlying LLMs. Experience has shown us that detailed prompts may work well for some use cases but they result in too many edge cases which lead to confusing responses for the user.
Generative pre-trained transformers https://www.anthropic.com/news/claude-3-5-sonnet
https://www.anthropic.com/claude/haiku
https://platform.openai.com/docs/models#gpt-4o
https://platform.openai.com/docs/models#gpt-4o-mini
4.2.7 - Model performance
The purpose of this tool is to provide access to models, not to evaluate, improve or rank them. Different models will suit different users and their different needs which are always changing.
The models chosen for Redbox are widely available from trusted suppliers such as Google, Anthropic and OpenAI.
Redbox passed testing by the i.AI team.
4.2.8 - Datasets
User provided datasets are numerous and varied. In the month of Feb 2025, we processed 7,000 documents. We have seen grant evaluations, invoices, zendesk extracts, and policies from across the Civil Service.
4.2.9 - Dataset purposes
Operational data
Tier 2 - Data Specification
4.3.1 - Source data name
Redbox User input data Redbox purposefully allows users to create their own dataset for Redbox to use in its responses.
4.3.2 - Data modality
Text
4.3.3 - Data description
This is dependent on what the user inputs into Redbox. Note that as all models are text based Redbox will only consider the textual content of any files that the user uploads.
4.3.4 - Data quantities
GPT 40 and 40 Mini: 128k tokens Claude 3: 200k tokens Gemini 2: 1m tokens
A token is roughly 3/4 of an English word.
4.3.5 - Sensitive attributes
As the dataset is created by each user it is not known. Information up to and including OFFSEN and no personal data.
4.3.6 - Data completeness and representativeness
The user takes responsibility for their own data.
4.3.7 - Source data URL
The information or data can be from information submitted by users. Users can submit up to OFFICIAL SENSITIVE security classification information.
4.3.8 - Data collection
As the dataset is created by each user it may have been repurposed from its original source so that it is easier for the LLM to create an accurate or helpful response.
4.3.9 - Data cleaning
As the dataset is created by each user it is not known. The data fed into Redbox is converted to text. As such, there may be data loss i.e: pictures or formatting such as headings, tables or footnotes
4.3.10 - Data sharing agreements
N/A
4.3.11 - Data access and storage
Data entered into Redbox is stored in AWS/S3 and AWS/RDS/Postgres hosted in the UK for 30 days. This data is not accessible, even to administrators
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessment
Redbox has had penetration testing undertaken to test for vulnerabilities. The tool was evaluated by the i.AI Technical Design Authority and DSIT Architectural Governance Board. Both the Data Protection Impact Assessment and Privacy Policy have been signed off by the Department for Science, Innovation and Technologies Data Protection Officer.
5.2 - Risks and mitigations
There is a full risk register in place for Redbox. A couple are listed here:
Risk: Some users are exclusively using Redbox chat functionality and neglecting the ‘Documents’ features. There is a risk that users expect the chat to perform in the same way as asking a search engine. This is not the case and could result in less reliable answers than an internet search would provide. Additionally, using chat functionality for this type of query uses considerable amounts of energy which is less sustainable.
Mitigation: Adding this information into FAQs, ensuring this is included in training and investigating a sustainability programme of work for Redbox
Risk: Users do not have any or insufficient training before accessing the tool and therefore do things that are risky, incorrect or could have a detrimental effect to the department and its reputation
Mitigation: A training programme is being planned alongside a communication plan with departments.