DSIT: Ask Ops Chatbot
Ask Ops Chatbot is a prototype tool that uses a Large Language Model to provide users with accurate and speedy answers 24/7 to questions about internal HR guidance.
Tier 1 Information
Name
Ask Ops Chatbot
Description
This is a Q&A chatbot prototype tool that uses a Large Language Model (LLM) from Anthropic (Claude 3.5 Sonnet) to provide users with accurate and speedy answers, 24/7, to questions about internal HR guidance.
Website URL
N/A
Contact email
analysis@digital.cabinet-office.gov.uk
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
Central Digital and Data Office (CDDO)
1.2 - Team
Functional Data & Analytics
1.3 - Senior responsible owner
Head of Functional Data & Analytics
1.4 - External supplier involvement
No
Tier 2 - Description and Rationale
2.1 - Detailed description
As the tool is currently under development, we are testing the performance of its responses via API calls. Once in production, the goal is that the user will input a question into a user-friendly interface and/or directly into Google Chat and receive a text-based response from the LLM containing summarised quotes from one or more documents, along with a reference to the document each quote was obtained from.
At this stage, we’re using a simple setup where questions are sent through an API to trigger the chatbot’s responses. This allows us to test and refine the chatbot’s performance. As we move forward, we plan to focus on creating a more user-friendly interface for easier interaction.
Staff ask questions related to HR policy in plain English, and the chatbot uses an intelligent agent (IA) to process these queries. The IA searches through all Cabinet Office and CDDO documentation stored in a specialised vector database, employing a retriever (from LangChain) to find and return the most relevant documents for the query. The IA can also use a maths tool to handle any calculations needed during the interaction, and a further tool to capture response ratings from the user and send these to a DynamoDB database alongside the chat history. A large language model (LLM) then summarises the retrieved information into a user-friendly format, displayed back to the user as bullet points when multiple sources are referenced. Beneath this summary, the chatbot lists the sources from which the information was retrieved. Users can continue refining their questions, or ask new ones through the chatbot interface, to get more specific answers.
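The query flow described above can be sketched in miniature as follows. The retriever here is a crude keyword-overlap stand-in for the real vector-similarity search, and all function and document names are hypothetical rather than taken from the production code.

```python
# Illustrative sketch of the retrieve-then-summarise flow. Keyword overlap
# stands in for the production vector-similarity search.

def retrieve(query: str, corpus: list[dict]) -> list[dict]:
    """Return documents sharing at least two words with the query, best first."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d["text"].lower().split())), d) for d in corpus]
    return [d for score, d in sorted(scored, key=lambda s: -s[0]) if score >= 2]

def answer(query: str, corpus: list[dict]) -> str:
    """Retrieve relevant documents, then present them with source references."""
    docs = retrieve(query, corpus)
    if not docs:
        return "No relevant documents were found for this query."
    bullets = "\n".join(f"- {d['text']}" for d in docs)
    sources = "\n".join(f"- {d['source']}" for d in docs)
    return f"{bullets}\nSources:\n{sources}"

corpus = [
    {"text": "Annual leave entitlement is 25 days plus bank holidays.",
     "source": "CO Intranet / Leave Policy"},
    {"text": "Rail tickets must be booked through the travel portal.",
     "source": "CDDO Intranet / Travel Guidance"},
]
print(answer("What is the annual leave entitlement?", corpus))
```

The production system composes the same steps (retrieval, summarisation, source listing) with a LangChain agent and an OpenSearch vector store rather than these in-memory stand-ins.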
2.2 - Scope
The scope of the chatbot includes search and retrieval across the Cabinet Office and CDDO intranets only, which consist of webpages, Word documents, PowerPoint presentations and PDFs. The chatbot also has the ability to answer maths-related questions (e.g. annual leave entitlement, travel expense calculations). Lastly, the chatbot can store user ratings in a DynamoDB database, which will help developers make improvements over the long term.
Questions that users would generally be expected to ask the chatbot are pre-existing guidance questions such as: ‘What is the leave entitlement?’, ‘What is your shared parental leave policy?’, ‘What are the actions I should take when working from abroad?’, ‘What is the food expenses entitlement when travelling for work?’, ‘How do I book train tickets?’, etc.
The chatbot has the capability to examine all documents and extract information from images and tables to assist with responses to the users, but this option has not been enabled as this increases the response time to requests and increases search costs.
The chatbot has been upgraded with an add-on Python Interpreter Tool. This is a maths tool that provides users with answers to complex questions involving calculations, such as “I have been working at the Cabinet Office for 7 years with a 6-month career break; what is my leave entitlement?” This tool was added because LLMs have weaker mathematical reasoning.
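As an illustration of the kind of calculation the Python Interpreter Tool performs, the sketch below computes a leave entitlement under assumed rules. The entitlement figures (25 days, rising to 30 after 5 years' effective service) are illustrative assumptions, not the actual Cabinet Office policy.

```python
# Hypothetical calculation of the kind the Python Interpreter Tool performs.
# The entitlement rules below are illustrative, not the actual policy.

def leave_entitlement(years_service: float, career_break_months: float = 0) -> int:
    """Annual leave days, net of any career break (assumed rules)."""
    effective_years = years_service - career_break_months / 12
    return 30 if effective_years >= 5 else 25

# The worked example from the question above: 7 years' service, 6-month break.
print(leave_entitlement(7, 6))  # 6.5 effective years
```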
If a user tries to search for information not related to internal guidance or policy, such as “What were the football scores at the weekend?”, the tool responds that this is not a valid request.
2.3 - Benefit
The chatbot provides responses quickly, searching across over 1,000 files in seconds, which would take a human days. It offers an improved user experience by saving the user from searching and reviewing multiple documents to find their answer. Aside from speed and scale, other benefits include accuracy, as the chatbot can recall relevant information without missing content. The current operations team believe that 90% of the questions frequently asked of their human-based team could be answered by the chatbot.
2.4 - Previous process
Staff members previously had to search for answers across multiple policies and guidance documents on the CDDO and Cabinet Office intranets. If a user could not find the answer there, they could ask staff members in a group chat, but this relied on the staff who manage the group chat knowing the answer themselves, and on the question being asked during work hours, for a reply to be received promptly.
2.5 - Alternatives considered
N/A - No further relevant tools available to undertake this task sufficiently.
Tier 2 - Decision making Process
3.1 - Process integration
The tool provides the user with the information it believes they have requested. It is down to the user to decide whether the information provided has answered their question satisfactorily. The user is expected to review the answer by opening the guidance it was referenced from, to confirm the information is what they were requesting. The tool is designed to act as a quote/document retrieval service, and it is down to the user to ensure the retrieved information is what they wanted. The user can ask follow-up questions to narrow down the chatbot's answer, or undertake a new search using more refined language.
3.2 - Provided information
The tool provides the user with a text-based response containing information quoted directly from the guidance and documents indexed in its vector database. The user also receives a list of references showing where each part of the response was obtained from. It does not provide answers in any other format, e.g. images. The tool may in future provide clickable hyperlinks, but currently provides the location in text format only.
3.3 - Frequency and scale of usage
265 staff members will have access to this chatbot, with sporadic but daily use expected.
3.4 - Human decisions and review
The chatbot has been designed and built to be as neutral and unbiased as possible so that it does not provide an opinion and only retrieves information from policies as an answer.
To ensure this tool is as intuitive and helpful as possible, it has been built as a retrieval service that uses plain-English inputs and outputs. This is enabled by a similarity search, which finds information likely to be useful to the user even if their question is unclear through poor grammar or spelling.
It will be down to the end user to determine if the response satisfactorily provides the information they were requesting.
The user can ask further clarifying questions to refine the answers the chatbot provides, until they get the answer they were looking for.
The user can undertake a new search by asking their question in a different way, if the tool does not retrieve the information they require.
The user can also review the documents the answers have been pulled from: the chatbot response provides a direct reference to where the information was obtained, should the user wish to review the specific document for clarification. Once the user has sought clarification by looking through the document, they can decide whether they are content with the answer or continue to ask clarifying or new questions of the chatbot.
3.5 - Required training
Developers: The tool was designed by an experienced team of Senior Data Scientists within the Central Digital and Data Office. To be able to build and maintain a GenAI tool developers would need skills such as: a strong foundation in machine learning, specifically in NLP and LLMs, to understand model integration and prompt engineering. Proficiency in cloud services, particularly AWS, is crucial, including expertise in AWS Lambda for serverless functions, Amazon S3 for data storage, DynamoDB for NoSQL database management, API Gateway for creating and managing APIs, Cognito for authentication, and IAM for secure access management. Experience with AWS CDK is essential for defining infrastructure as code. Robust software engineering skills are necessary, including backend development in languages like Python, API development, version control with Git, testing, debugging, and familiarity with DevOps practices and CI/CD pipelines.
Users: User guidance will be available once live, explaining to the user what the tool does and does not do. The guide will highlight potential risks of using a chatbot, such as hallucinations and not providing answers correctly. Due to the experimental nature of the technology, the training will remind staff to review their answers for accuracy, such as ensuring it has calculated their leave entitlement correctly.
3.6 - Appeals and review
Potential once live: there will be a ‘thumbs up’ and ‘thumbs down’ rating mechanism that users can use to show satisfaction with the tool's response. This data will feed a ‘current user happiness’ metric in a dashboard, to be implemented when the tool goes live. A further potential capability once live is an urgent-queries form for staff to complete if they need to raise urgent issues with the tool, which will be emailed to the development team on completion. Users can currently leave feedback, which is saved in a database that developers can review.
Tier 2 - Tool Specification
4.1.1 - System architecture
The AskOps Chatbot leverages a Retrieval-Augmented Generation (RAG) architecture, securely deployed within an Amazon Web Services (AWS) environment. The system is designed to handle user queries related to HR policy by searching through a curated set of Cabinet Office and CDDO documents.
Key components of the architecture include:
Data Embedding and Storage: The documents are chunked, and their embeddings are generated using Amazon Titan Text Embeddings v2. These embeddings, which are numerical representations of the text, are stored in an OpenSearch vector database, allowing for efficient retrieval based on semantic similarity.
Intelligent Agent (IA): The core of the system is an agent that orchestrates the retrieval and response generation process. Upon receiving a query, the IA uses a retriever to search the vector database for the most relevant documents. It also incorporates a math tool for handling any necessary calculations, ensuring comprehensive and accurate responses.
Generative AI Integration: The retrieved documents are processed using Anthropic’s Claude, a state-of-the-art LLM integrated via AWS Bedrock. Claude summarises the retrieved information and generates a user-friendly response, which is then presented to the user.
Session Management: The system maintains session history using AWS DynamoDB, allowing it to track and recall previous interactions, providing a more coherent and context-aware experience for users.
Security and Deployment: All components are securely managed within AWS, leveraging services such as AWS Lambda, DynamoDB, and OpenSearch, with environment configurations handled through AWS environment variables and IAM roles for secure access.
Current Status and Future Plans: Currently, the system is in the prototype stage, without a user interface. The code is housed in an AWS CodeCommit repository, with plans to migrate to GitHub in the future.
This architecture ensures that the AskOps Chatbot can efficiently handle complex queries by combining powerful retrieval and generation capabilities, all while maintaining a secure and scalable environment.
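The data embedding and storage step can be sketched as follows. The chunk size, overlap, and helper names are assumptions; the Bedrock model ID is the published one for Amazon Titan Text Embeddings v2.

```python
import json

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows before embedding.
    The window and overlap sizes are illustrative choices."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk_text: str) -> list[float]:
    """Embed one chunk with Amazon Titan Text Embeddings v2 via Bedrock
    (requires AWS credentials; the resulting vector goes into OpenSearch)."""
    import boto3  # imported here so the chunking logic alone needs no AWS access
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": chunk_text}),
    )
    return json.loads(resp["body"].read())["embedding"]
```

At query time, the same embedding model is applied to the user's question so that OpenSearch can rank stored chunks by vector similarity.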
4.1.2 - Phase
Pre-deployment
4.1.3 - Maintenance
Ask Ops Chatbot is still in its pilot phase, but will undergo periodic technical review once operational. This will enable new guidance and policies to be included and retired documents removed. During use, its users are able to provide feedback, which will drive future updates.
4.1.4 - Models
Anthropic Claude 3.5 Sonnet (Large Language Model)
Amazon Titan Text Embeddings v2 (Embedding Model)
Tier 2 - Model Specification
4.2.1 - Model name
Anthropic Claude Sonnet
4.2.2 - Model version
3.5
4.2.3 - Model task
Claude 3.5 Sonnet is an AI assistant model from Anthropic that is designed to be safe and accurate, and to perform well in reasoning, coding, and other tasks. It is a Large Language Model (LLM) that can receive, process, and respond to text in clear, easy-to-understand language. Claude models are generative pre-trained transformers: they have been pre-trained to predict the next word in large amounts of text.
4.2.4 - Model input
Text-based HR questions in English
4.2.5 - Model output
Summary of relevant guidance and documentation available in its vector database, inclusive of quotes from pre-existing HR guidance and policies, with references to document locations where the quotes have been retrieved from.
4.2.6 - Model architecture
Generative pre-trained transformers https://www.anthropic.com/news/claude-3-5-sonnet
4.2.7 - Model performance
The foundational Anthropic models in AWS cannot be fine-tuned; they are ready-to-use, out-of-the-box models. The way to enhance these models is to provide them with context (e.g. our CO intranet data), so that when the response to the user is formulated, the model retrieves the data from the vector database and, alongside its innate language abilities, can formulate an appropriate response.
To improve performance and ensure the responses are as high quality and relevant as possible, we have implemented best practice in prompt engineering (these are instructions given to the model):
- giving the model a role (e.g. that of an operational specialist)
- clear requirements on the format of the response
- instructions on how to handle cases where CDDO policy supersedes the CO intranet, conflicting guidance, or documentation from different years
- instructing the LLM to return that no relevant documents were found if the answer cannot be retrieved from the vector database.
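An illustrative version of such a prompt, passed to Claude 3.5 Sonnet through Bedrock with the low temperature described in section 5.2. The wording and invocation shown are a sketch, not the production prompt or code.

```python
import json

# Illustrative system prompt following the practices above; not the production wording.
SYSTEM_PROMPT = (
    "You are an operational specialist answering questions about Cabinet Office "
    "and CDDO HR guidance.\n"
    "- Answer only from the passages provided, quoting them where possible.\n"
    "- Use bullet points when citing more than one source, and list each source "
    "beneath the answer.\n"
    "- Where CDDO guidance supersedes the CO intranet, or documents from "
    "different years conflict, prefer the CDDO and most recent guidance.\n"
    "- If the passages do not contain the answer, reply that no relevant "
    "documents were found."
)

def ask_claude(question: str, passages: str) -> str:
    """Invoke Claude 3.5 Sonnet via Bedrock (requires AWS credentials)."""
    import boto3  # imported here so the prompt itself can be inspected offline
    client = boto3.client("bedrock-runtime")
    resp = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "temperature": 0.1,  # low creativity, per section 5.2
            "system": SYSTEM_PROMPT,
            "messages": [{"role": "user",
                          "content": f"{passages}\n\nQuestion: {question}"}],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```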
4.2.8 - Datasets
All Cabinet Office & CDDO intranet web pages and attachments (1,101 files in total) are indexed so the model can provide accurate answers; they are used for retrieval rather than training. Claude's base model was separately created by Anthropic through large amounts of training data, fine-tuning and configuration.
4.2.9 - Dataset purposes
The chatbot requires all the information from the intranet, including webpages and files such as PDFs, PPTs and DOCX documents. Once this information has been indexed, the model can provide the relevant, CO- and CDDO-specific answers that users are seeking.
Tier 2 - Data Specification
4.3.1 - Source data name
Cabinet Office Intranet Files (Webpages, Documents, PowerPoint Presentations and PDFs)
CDDO Intranet Files (Webpages, Documents, PowerPoint Presentations and PDFs)
4.3.2 - Data modality
Text
4.3.3 - Data description
Webpage name, webpage content, policy file name, policy title, policy text content
4.3.4 - Data quantities
All Cabinet Office & CDDO intranet web pages and attachments (1,101 files in total)
4.3.5 - Sensitive attributes
Guard rails will be added to the LLM to limit its responses to questions about the HR guidance only. This includes excluding any personal data: the user will not be able to use the tool to search for who wrote or approved guidance, even if this is included in a policy document. The guard rails will also prevent the LLM from responding to general queries outside the HR guidance.
4.3.6 - Data completeness and representativeness
We have scraped all of the intranet webpages and associated attachments which means the LLM has access to all of the relevant documents.
4.3.7 - Source data URL
N/A - Data is on internal hosted intranet sites and not publicly available.
4.3.8 - Data collection
The data was collected for the same purpose the original intranet upload served: to provide users with a way to find HR policy information.
4.3.9 - Data cleaning
N/A
4.3.10 - Data sharing agreements
N/A - the data is not shared outside its original organisation.
4.3.11 - Data access and storage
The data science team in CDDO has access to this data. The data is stored in an S3 bucket in a restricted AWS account. Only a few specific CDDO members have access to the main AWS account, which requires multi-factor authentication.
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessment
As this is an internal tool that does not process any personally identifiable information, an assessment such as a Data Protection Impact Assessment (DPIA) is not required.
5.2 - Risks and mitigations
The main risks are:
- hallucinations
- data becoming out of date
- inaccurate outputs
How we are mitigating these risks:
- To mitigate against the model hallucinating, we are giving it access to our complete record of intranet data, so it is less likely to make things up. For maths-related queries (e.g. how many days of leave do I have if I have worked in CDDO for 3 years?) we have built a maths tool, which again makes it less likely for the model to produce inaccurate answers. Furthermore, we are using prompt engineering to instruct the model on how to respond appropriately to a query, and to return “No answer found” if it cannot find the answer in the vector database. The “temperature” setting on the LLM is also set to 0.1 to minimise the “creativity” and the scope of words the LLM has access to in its responses.
- We currently have access to all of the data on the intranet; however, new guidance might be published in the future. If the chatbot goes live, we will ensure that our scraping and updating of the vector database is undertaken on a consistent basis, so that the model is updated in an automated manner.
- It is possible for the LLM to return the wrong response to the user. To mitigate this, we have added references to the response returned so that the user can double-check that the information from the LLM is relevant and correct. We are also planning to create user documentation sharing tips on how to use the chatbot.