DBT: Business Intelligence (BI) Subject Tagger
Suggests subject tags for Business Intelligence (BI), which is information that UK-based and international companies share with the department.
Tier 1 Information
1 - Name
Business intelligence (BI) subject tagger
2 - Description
The tool suggests subject tags for BI, which is information that UK-based and international companies share with the department. The tags are drawn from a pre-defined list, e.g. ‘Regulation’ or ‘Supply Chains and Raw Materials’. The advantage of the tool is that it gives a quick overview of the subjects businesses are raising with DBT. Colleagues subsequently read each piece of feedback in full.
3 - Website URL
N/A
4 - Contact email
ai.governance@businessandtrade.gov.uk
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
Department for Business and Trade
1.2 - Team
Digital, Data and Technology: Data Science Team
1.3 - Senior responsible owner
Chief Data Officer, Department for Business and Trade
1.4 - External supplier involvement
No
1.4.1 - External supplier
N/A
1.4.2 - Companies House Number
N/A
1.4.3 - External supplier role
N/A
1.4.4 - Procurement procedure type
N/A
1.4.5 - Data access terms
N/A
Tier 2 - Description and Rationale
2.1 - Detailed description
The underlying methodology is supervised machine learning. The approach is an ensemble of binary classification models, one for each of the subject tags. If a particular tag is (or is not) deemed to be applicable to a particular piece of BI, then the relevant model will apply a label of 1 (or 0). In the documentation that follows, we may refer to ‘the model’ for ease of explanation. In reality, we mean this ensemble of models.
The current list of potential subject tags is:
- Exports And Imports
- EU Trade
- Investment
- Regulation
- Trade Barriers
- Opportunities
- Supply Chains And Raw Materials
- EU Exit
- Tax
- Labour Market And Skills
- Free Trade Agreements
- Movement Of Staff Or Immigration
The output of the tool is the set of BI instances, each with zero, one or more of the above subject tags applied.
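As an illustration, a minimal sketch of how the ensemble could produce that output, assuming one trained binary classifier per tag; the function and variable names, and the 0.5 decision threshold, are illustrative rather than the production code:

```python
# Illustrative sketch only: one trained binary classifier per tag, each
# returning a probability that the tag applies to a piece of BI.
TAGS = [
    "Exports And Imports", "EU Trade", "Investment", "Regulation",
    "Trade Barriers", "Opportunities", "Supply Chains And Raw Materials",
    "EU Exit", "Tax", "Labour Market And Skills", "Free Trade Agreements",
    "Movement Of Staff Or Immigration",
]

def tag_bi(encoded_text, models, threshold=0.5):
    """Return the subject tags (possibly none) applicable to one BI instance.

    `models` is assumed to map each tag name to its trained binary model;
    `encoded_text` is the pre-processed text, shaped as a single-item batch.
    The 0.5 threshold is an assumption, not a documented setting.
    """
    return [tag for tag in TAGS
            if models[tag].predict(encoded_text)[0][0] >= threshold]
```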
2.2 - Scope
The tool is used to suggest subject tags for BI. It is applied to BI entered into the department’s customer relationship management (CRM) software following colleagues’ correspondence or meetings with business contacts. Because of the extensive nature of the department’s network, this BI can extend to many pages of new information each day. The tool gives a way to quickly ascertain the different subjects being raised before the BI can be read in full. Each piece of BI is read in full by a colleague before being used onwards (e.g. in departmental briefings).
2.3 - Benefit
The tool allows the Department to get a near-immediate overview of the issues being raised by businesses. The tags applied by the tool are available the day after the BI is entered into the CRM software and can be viewed by anyone in the department.
2.4 - Previous process
Previously, colleagues had to read every piece of BI in full to get an idea of the subjects being raised. This could take hours, depending on how much BI was entered. The tool now provides that overview near-immediately.
2.5 - Alternatives considered
A rules-based approach was considered - for instance searching the text for certain words (e.g. VAT) and then assigning the appropriate subject tag (e.g. ‘Tax’). However, the diversity of language used by businesses who raise issues with DBT meant that a rules-based algorithm would struggle to account for all these different ways of discussing essentially the same subject.
A transformer-based “topic modelling” approach was investigated in depth. Because this is an “unsupervised” method, in which similar pieces of feedback are simply grouped together, there were two main issues: 1) the subjects/topics are not drawn from a pre-defined list, so it is difficult to track the frequency of mentions of a subject over time; 2) there is no automatic way to assess the performance of the model, in terms of how ‘accurate’ the assignment of the subject tags is.
There was an attempt to tackle issue #2 by manually measuring the performance for a small sample of feedback. The resulting estimate of recall was around 50% which was not deemed high enough.
Issue #1 was deemed insurmountable by the main users of the model, since they fundamentally want a pre-defined list of subject tags, primarily so that they can track the frequency with which subjects are mentioned over time.
Tier 2 - Decision making Process
3.1 - Process integration
The tool has a very low impact on decision-making at DBT. It is used as an initial triaging tool to quickly determine the subject of feedback from businesses. It can help colleagues identify if there is a ‘hot topic’ emerging that needs particular, rapid investigation by the department. All feedback is subsequently read through thoroughly, before being edited and collated (‘by hand’) and sent to relevant colleagues. The tool is therefore not used directly for making decisions.
3.2 - Provided information
The tool adds a column to a table of all BI entered to date into our Customer Relationship Management platform. That column contains the subject tags suggested by the tool as applicable to each piece of feedback. Users can view that output on our department-wide data platform, either in tabular format or in a dashboard of summary statistics, such as the proportion of all BI received in the past month that related to, for example, supply chain issues.
3.3 - Frequency and scale of usage
The tool is used on a daily basis by colleagues to understand the subjects that businesses are raising with the department. The BI is automatically labelled by the tool overnight, rather than being run manually, so the outputs are available to anyone in the department the following day. Some colleagues from other Government departments may also be granted controlled access to the outputs, if there is value for the public in those colleagues viewing the outputs. The public do not interact directly with the tool.
3.4 - Human decisions and review
Colleagues review all the tags applied by the tool and correct them as needed, to reflect how they would themselves have tagged the BI. This corrected data is used in future retraining of the underlying model. The tool outputs are not used directly in decision-making. Rather, they help to inform the collation of the BI by subject, and quickly flag whether any issues could be ‘hot topics’ that many businesses want to tell Government about.
3.5 - Required training
- Colleagues who provide the training data for the model must be briefed on how to determine which tags apply to each piece of historic BI. It is important for model performance that this tagging is consistent.
- Colleagues who use the outputs of the tool on a daily basis (i.e. the new BI, with subject tags applied) don’t need any training to access those outputs; they are in a simple tabular dataset available on our data platform. They do need to have an understanding of the accuracy of the tagging so that they can consider this when choosing how to use the tagged BI.
- The data science specialists who maintain the model must be shown how the training data is collated, how the model is retrained, and how to check whether the retrained model is of adequate quality. If they are developing the model further, they must have a good grounding in the underlying techniques (convolutional neural networks) and in any other techniques they choose to try, as applicable.
3.6 - Appeals and review
N/A - the tool is used to triage information. It is not used to make decisions that have a direct impact on members of the public.
Tier 2 - Tool Specification
4.1.1 - System architecture
Every night, the most recently trained model is applied to the latest BI received up to the close of the previous day, to generate the subject tags for the BI.
Both the model and the BI data are held in DBT’s private cloud storage area. The model is uploaded there manually after each retraining. The BI data is automatically stored following input into DBT’s Customer Relationship Management (CRM) system.
The scheduling of the tagging (i.e. obtaining ‘predictions’ from the model) is managed on DBT’s instance of the Apache Airflow platform.
A dataset of the tags is then automatically sent to our data platform. Users can then view the original BI, and the corresponding tags. Because the tagging is done overnight, the tags are available to users across DBT by the start of the working day after the BI was received by the CRM system.
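As an illustration of the overnight scheduling, a minimal Airflow DAG sketch follows; the DAG id, schedule and task body are assumptions for this sketch, not DBT’s actual pipeline code:

```python
# Illustrative Airflow DAG for the nightly tagging run; names and settings
# here are assumptions, not DBT's pipeline code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def tag_latest_bi():
    # 1. Load the most recently trained model from private cloud storage.
    # 2. Fetch BI entered into the CRM up to the close of the previous day.
    # 3. Generate subject tags and publish them to the data platform.
    ...

with DAG(
    dag_id="bi_subject_tagger",     # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",              # one overnight run per day
    catchup=False,
):
    PythonOperator(task_id="tag_latest_bi", python_callable=tag_latest_bi)
```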
4.1.2 - Phase
Production
4.1.3 - Maintenance
There is currently no planned development for the tool.
The model is scheduled to be retrained approximately every 2 months. This has been estimated as an appropriate balance between the resource needed to do the retraining and the timescale on which the nature of the subjects covered in the BI substantially changes.
It is a recognised limitation of the tool that it cannot apply a specific subject tag to ‘new and emerging’ subjects.
4.1.4 - Models
The underlying model that generates the subject tags is a deep learning (convolutional) neural network model implemented in Python using the Keras library.
Tier 2 - Model Specification
4.2.1 - Model name
Business intelligence (BI) subject tagger
4.2.2 - Model version
20240531
4.2.3 - Model task
It is a supervised (classification) machine learning model which labels BI text with multiple tags to represent the subjects covered in the text.
This helps users to quickly group the BI by subject, and to understand how the frequency of mentions of each subject is changing in time.
4.2.4 - Model input
For training the model, the input is a sample of BI (free text) and the corresponding subject labels applied by domain experts. When the model generates subject tags on new BI each night, it is passed only the BI (free text).
4.2.5 - Model output
An array of the subject tags relevant to each piece of BI.
4.2.6 - Model architecture
The underlying model that generates the subject tags is a deep learning (convolutional) neural network model implemented in Python using the Keras library.
As explained in the Description section, the model is actually an ensemble of binary classification models, one for each subject tag. Specifically, each is a Keras Sequential model, which is a stack of layers of neurons. General documentation can be found at: https://keras.io/api/models/sequential/
The layers used in our model are:
- An embedding layer, as described at https://keras.io/2.15/api/layers/core_layers/embedding/. input_dim (vocabulary size) is set to 10000 and output_dim (embedding dimension) to 100.
- A 1D convolutional layer, as described at https://keras.io/api/layers/convolution_layers/convolution1d/. Parameter values: filters=128; kernel_size=5; activation="relu" (rectified linear unit activation function); all other values left as default.
- A pooling layer for downsampling, as described at https://keras.io/2.16/api/layers/pooling_layers/global_max_pooling1d/. Default parameter values are used.
- A densely-connected neural network (‘dense’) layer, as described at https://keras.io/api/layers/core_layers/dense/. Parameter values: units=8; activation="relu"; all other values left as default.
- A dropout layer to reduce the chance of over-fitting of the model, as described at https://keras.io/api/layers/regularization_layers/dropout/. rate is set to 0.2, with all other values left as default.
- A second dense layer with units=1 and activation="sigmoid".
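Taken together, these descriptions suggest the following minimal reconstruction of one binary classifier in the ensemble. This is a sketch only: the compile settings (optimiser and loss) are assumptions, as the record does not state them.

```python
# Minimal reconstruction of one binary classifier from the layer
# descriptions above; optimiser and loss are assumptions.
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout
from keras.metrics import Precision, Recall

model = Sequential([
    Embedding(input_dim=10000, output_dim=100),          # 10,000-word vocabulary
    Conv1D(filters=128, kernel_size=5, activation="relu"),
    GlobalMaxPooling1D(),                                # downsampling
    Dense(units=8, activation="relu"),
    Dropout(rate=0.2),                                   # reduces over-fitting
    Dense(units=1, activation="sigmoid"),                # probability the tag applies
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",                # standard for binary classification
              metrics=[Recall(), Precision()])
```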
4.2.7 - Model performance
As explained above ‘the model’ is a collection of binary classification models.
The main metric of interest to the users is recall, as opposed to precision: users would rather the models apply some tags incorrectly but miss none, than apply no incorrect tags but fail to apply applicable tags in some cases. The pre-agreed lower threshold for recall is 0.60.
The precision (as assessed when the models were trained) of the models currently used for tagging ranges from 0.34 for the ‘EU Exit’ tag to 0.84 for the ‘Opportunities’ tag. The recall ranges from 0.60 for ‘EU Exit’ to 0.92 for ‘Tax’.
Therefore all models meet the threshold for recall. Although there is no agreed lower threshold for precision, two models (those tagging for ‘EU Exit’ and ‘Supply Chains and Raw Materials’) have precision lower than 0.50. This issue has been clearly explained to users (i.e. ‘of all instances given a particular tag, around half of those tags will be wrong’) so that they use these tags with appropriate caution.
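As a sketch, a retrained tag model could be checked against the pre-agreed recall threshold as follows, using scikit-learn; model, x_test and y_test are placeholder names, not the production code:

```python
# Illustrative evaluation of one tag model against the recall threshold.
from sklearn.metrics import precision_score, recall_score

RECALL_THRESHOLD = 0.60

# Convert predicted probabilities into 0/1 tag decisions (0.5 cut-off assumed).
y_pred = (model.predict(x_test).ravel() >= 0.5).astype(int)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)

print(f"recall={recall:.2f}, precision={precision:.2f}")
if recall < RECALL_THRESHOLD:
    print("Model does not meet the recall threshold; do not deploy.")
```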
4.2.8 - Datasets
The “Data Hub Interactions” dataset is the only one used to develop the model. This contains the BI data.
4.2.9 - Dataset purposes
The Interactions dataset is used for model training, validation and testing. See the Data Specification section for full details. All interactions are used that have been collected up to the date of when the training is done.
Tier 2 - Data Specification
4.3.1 - Source data name
“Data Hub Interactions”
4.3.2 - Data modality
Text
4.3.3 - Data description
Contains details of interactions that departmental colleagues have with UK and overseas businesses, in the course of supporting those businesses to grow / invest in the UK. These interactions may have been face-to-face, via email or on the phone. The details are input into our CRM platform called ‘Data Hub’. The data is only available to colleagues in the department, and to some in other government departments where there is value to the public in them having access. You can read more about the interactions and Data Hub here: https://digitaltrade.blog.gov.uk/2024/07/30/surpassing-4-million-interactions-how-we-made-collecting-data-a-doddle/
4.3.4 - Data quantities
The training data contains all Interactions containing BI recorded since early 2019, now amounting to over 20,000 separate interactions. It has been supplemented with the subject tags applied manually by domain experts.
90% of the entries in this dataset are used to train the model, with the remaining 10% used to test the performance of the trained model. A validation set is then created from 10% of the training data. That is, the original dataset is split in the ratio 81:9:10 training:validation:testing.
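One way to produce that 81:9:10 split, sketched with scikit-learn; texts and labels are placeholder names for the labelled interactions:

```python
# Illustrative two-stage split giving 81% train, 9% validation, 10% test.
from sklearn.model_selection import train_test_split

# Hold out 10% of the full dataset for testing.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.10)

# Take 10% of the remaining 90% as a validation set (9% of the original).
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.10)
```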
4.3.5 - Sensitive attributes
A substantial number of the interactions contain names, job titles and contact details of DBT employees or contacts at the external businesses. There are no known instances of sensitive personal data nor information on protected characteristics, beyond what a reader may infer from the personal data fields listed above.
4.3.6 - Data completeness and representativeness
All colleagues are regularly reminded of the importance of adding new interactions to the Interactions dataset. There are some instances where this does not happen, although the degree of under-reporting has never been estimated. It is also not known whether there is any pattern to this under-reporting (for example, whether interactions for businesses in one region of the UK are logged less often than those in another).
4.3.7 - Source data URL
The Interactions dataset is deemed to be commercially sensitive and is therefore not publicly accessible. You can read some background information here: https://digitaltrade.blog.gov.uk/2024/07/30/surpassing-4-million-interactions-how-we-made-collecting-data-a-doddle/
4.3.8 - Data collection
The BI is manually input by colleagues across DBT into our CRM system following an ‘interaction’ (meeting, call etc.) with a business. The BI can cover issues mentioned by the business, for example their view on government announcements, or current risks and opportunities they are experiencing.
The application of the subject tags to the BI does not contravene the original purposes for which the data was collected, which include supporting UK businesses by understanding the environment they are operating in and the opportunities and barriers they are facing.
4.3.9 - Data cleaning
Much of the data collection is essentially manual transcription of oral interactions, so colleagues may apply their own ‘filtering’ when choosing what BI is relevant for input into the CRM. For BI gathered via email, colleagues may copy and paste the actual email or, again, apply their own summarisation. The extent of this manual filtering is not known, but colleagues are expected to use their judgement to ensure all useful information is recorded succinctly.
Before training a new version of the model, automated pre-processing is applied to the training data: 1) remove instances of BI with fewer than 25 characters, an estimated cut-off below which BI is deemed uninformative; 2) drop the less frequently mentioned words in the BI (the 10,000 most frequent words are retained); 3) remove punctuation and lower-case all text; 4) retain only the first 500 words of each BI instance. The last step keeps the model training time to roughly a couple of hours, but it means that around 3% of BI instances are truncated because their original text is longer than 500 words.
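A sketch of those four pre-processing steps, assuming TensorFlow’s bundled Keras text utilities are used (the record names Keras but not the exact pre-processing code); raw_texts is a placeholder for the training BI:

```python
# Illustrative pre-processing pipeline matching steps 1)-4) above.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MIN_CHARS, VOCAB_SIZE, MAX_WORDS = 25, 10000, 500

# 1) Remove BI instances with fewer than 25 characters.
texts = [t for t in raw_texts if len(t) >= MIN_CHARS]

# 2) and 3) Keep only the 10,000 most frequent words; lower-case the text
# (the Tokenizer's default filters also strip punctuation).
tokenizer = Tokenizer(num_words=VOCAB_SIZE, lower=True)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

# 4) Retain only the first 500 words of each instance, padding shorter ones.
padded = pad_sequences(sequences, maxlen=MAX_WORDS, truncating="post")
```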
4.3.10 - Data sharing agreements
The Interactions data is only accessible to DBT staff and partners for the purposes of providing and monitoring support given to businesses.
4.3.11 - Data access and storage
The BI is stored on DBT’s in-house data platform, Data Workspace in a dataset called “Data Hub interactions”. For more information about Data Workspace see: https://dataingovernment.blog.gov.uk/2023/01/11/dits-data-workspace-all-our-data-in-one-place.
Any colleague in DBT who can access the platform can access the BI / interactions dataset by default.
The BI/interactions are deleted 10 years after the date on which the interaction took place.
The BI/interactions dataset contains personal data related to DBT staff and business contacts. This data most often includes names, contact details, role titles. No special category personal data is contained. A data protection impact assessment has been conducted for the storage of this data on Data Workspace.
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessment
A data protection impact assessment for the modelling has not been carried out to date since the work is not considered to result in a high risk to individuals. Whilst the BI may contain personal data, such as the names of business leaders attending a meeting with the department, the algorithm does not involve the evaluation of characteristics of those individuals. Nor are the algorithmic outputs used to make decisions about any individuals.
An equality impact assessment has not been carried out to date. The algorithm is not used to make decisions about individuals and there is not thought to be a high risk of direct discrimination.
Data processing for this tool is within the scope of the DBT privacy policies published at https://www.gov.uk/government/organisations/department-for-business-and-trade/about/personal-information-charter
5.2 - Risks and mitigations
The risks covered below are a selection of those recommended for consideration by The Alan Turing Institute in a report prepared for DBT in 2023 and freely available online: https://www.turing.ac.uk/news/publications/process-based-governance-action
- Risk that the software used to build the model is no longer supported: low. The algorithm has been written in a very widely used programming language (Python) with well-supported open-source packages.
- Risk that the infrastructure on which the algorithm runs is no longer supported: low. The training of, and obtaining of predictions from, the underlying model is performed on infrastructure supported at an organisation-wide level.
- Risk of a lack of experts who understand how the algorithm works: medium. At the moment, one employee has in-depth knowledge of the algorithm and underlying model, including how to retrain the model at regular intervals. There is good internal documentation covering how the model is trained, how predictions are obtained, and the underlying methodology. However, there should be at least one other person with a detailed enough understanding to retrain the model regularly.
- Risk of ‘model drift’, i.e. the accuracy of the current tags worsens compared to the accuracy estimated during model training: medium. Because the model requires a training set, which takes human resource to construct, retraining is done on a timescale of months. Subjects that emerge after training will therefore not receive an appropriate label, and if the way a subject is discussed in the BI changes, the accuracy of the tagging may also decrease.
- Risk of a lack of oversight of model outputs: low. There is always a ‘human in the loop’, in that domain experts check all of the tool outputs (the applied tags) before further use.
- Risk that the underlying model is not understood by users: medium. The model is a deep learning model and therefore inherently more challenging to understand than, for instance, a logistic regression model. However, this has to be balanced against the superior performance of a CNN compared to a logistic regression. Furthermore, we currently choose not to use a generative text model, which would further compound the issues around explainability. To mitigate the risk, the data science team provide a plain-English explanation (to the extent one can be given) to all new users, both in person and in written documentation.