Previous Data Science Accelerator projects
Updated 13 June 2024
To help you decide if your proposed project is a good fit for the Data Science Accelerator programme, we’ve listed a selection of previous projects.
If you have any questions about the suitability of your project for the programme, email data-science-accelerator@digital.cabinet-office.gov.uk.
Mapping vessel satellite and landings data
Department for Environment, Food and Rural Affairs
The problem it was trying to solve
The Marine Management Organisation wanted to make better use of its data to understand which areas could be designated as marine protected areas.
Project objectives
To simplify the old system by cleaning, merging and analysing marine datasets.
Outcome of the project
An online tool was created that maps landing data by user-specified inputs. It allows users to query and visualise data. It has also reduced the number of requests to the Marine Management Organisation.
Data science methods used
The applicant:
- translated the original process from Microsoft Access into R (a data science programming language)
- created an interactive tool for users
- deployed the tool on a server using Amazon Web Services
Identifying ‘low value’ user tickets
Government Digital Service
The problem it was trying to solve
The GOV.UK Response Team wanted a way to categorise support requests.
Project objectives
Identify ‘low value’ tickets from the support requests.
Outcome of the project
The project produced a working model that used text classification to identify ‘low value’ tickets, with a recommendation of how it could be used in production.
The model visualised patterns within the topics of the user tickets. This enabled better identification of user requests.
Since completing the project, the participant has used the model’s code and the related techniques in their role.
Data science methods used
The applicant used:
- Python (a programming language) and its additional packages including:
  - Pandas
  - JupyterLab
  - Sklearn
  - pyLDAvis
  - spaCy
- Removal of PII
- Fast string matching techniques
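As an illustration of the PII-removal step listed above, here is a minimal sketch in Python using only the standard library. The two patterns and the `redact_pii` helper are assumptions made for illustration. The source does not say how PII was removed from the tickets, and real redaction would need much broader rules (names, addresses, reference numbers).

```python
import re

# Illustrative patterns only (assumptions, not the project's actual rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# UK-style phone numbers: "0" or "+44", then 9 or 10 digits with optional spaces.
PHONE_RE = re.compile(r"(?<!\d)(?:\+44\s?|0)(?:\s?\d){9,10}(?!\d)")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    # Made-up ticket text, not real user data.
    ticket = "Please call me on 07700 900123 or email jane.doe@example.com."
    print(redact_pii(ticket))
```

Redacting before any modelling means the classifier never sees the personal details, which is one plausible reason for doing this step first.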
Classifying businesses using text descriptions
HM Revenue & Customs
The problem it was trying to solve
HM Revenue & Customs uses trade labels to identify and group certain business types. However, these trade labels are often missing or unreliable.
Project objectives
Improve the quality of trade label data and any work that relies on it.
Outcome of the project
A classifier was developed that classified 85% of traders correctly across the largest trade classes.
HM Revenue & Customs showed an interest in turning this into a production-standard tool.
Data science methods used
The applicant:
- split each trade description into words
- created a set of numbers with a count for each word
- used gradient-boosted decision trees to generate the trade class labels
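The first two steps above (splitting each description into words, then producing a count for each word) are often called a bag-of-words representation. The sketch below shows that idea using only the Python standard library. The function names and the example descriptions are hypothetical, and the source does not say which tools the applicant used. Count vectors like these are the kind of input a gradient-boosted classifier would then be trained on; that step is not shown here.

```python
import re
from collections import Counter


def tokenise(description: str) -> list[str]:
    """Lower-case a description and split it into alphabetic words."""
    return re.findall(r"[a-z]+", description.lower())


def build_vocabulary(descriptions: list[str]) -> list[str]:
    """Return the sorted list of distinct words seen across all descriptions."""
    return sorted({word for d in descriptions for word in tokenise(d)})


def count_vector(description: str, vocabulary: list[str]) -> list[int]:
    """Return one count per vocabulary word for a single description."""
    counts = Counter(tokenise(description))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    # Made-up trade descriptions for illustration only.
    descriptions = ["Car repair and car servicing", "Hair and beauty salon"]
    vocabulary = build_vocabulary(descriptions)
    print(vocabulary)
    for description in descriptions:
        print(count_vector(description, vocabulary))
```

Every description becomes a vector of the same length, so descriptions of different lengths can be compared and passed to a model in a uniform way.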
Mapping marine habitat sensitivities
Joint Nature Conservation Committee
The problem it was trying to solve
The effects of human activity or natural events on marine habitats have been inconsistently mapped.
Project objectives
Provide consistency in understanding the pressure that human activities can cause on the marine environment.
Outcome of the project
The participant developed a tool that uses spatial mapping to help inform marine managers about particularly vulnerable areas.
The Joint Nature Conservation Committee is developing this tool further to provide conservation advice for marine protected areas.
Data science methods used
The applicant used:
- R (a data science programming language), UK marine habitat maps and marine habitat sensitivity data to develop an automated method that aligns available sensitivity information to existing maps
- Shiny (a package in R) to display outputs
Predicting the presence of smoke alarms in domestic properties
Leicestershire Fire and Rescue Service
The problem it was trying to solve
How to prioritise home safety checks.
Project objectives
To understand what factors have the greatest influence in predicting ownership of smoke alarms. It also aimed to create a model which could be used to prioritise where fire-safety checks should be carried out.
Outcome of the project
The plan is to use some of these techniques to build a model which can be used to target risk reduction activity and monitor change.
Data science methods used
The applicant used:
- MySQL (an open-source relational database management system)
- R (a data science programming language)
- RStudio (a free and open-source integrated development environment for R)
- Dplyr (a package in R that transforms and summarises tabular data with rows and columns)
- Caret (a package in R used for predictive modelling and supervised learning)
- Random forest prediction model (a classification method)
- Geographic information system (a system designed to capture, store, manipulate, analyse, manage and present spatial or geographic data)
Assessing geospatial risk
Norfolk County Council
The problem it was trying to solve
Identifying where to allocate council services for greatest value.
Project objectives
Provide a way of using data of place-based services (such as libraries and children’s centres) proportionate to the population of Norfolk to inform decision making.
Outcome of the project
An interactive mapping tool was developed that explores the effect of service redesign proposals on the resident population. This has resulted in challenges to assumptions about what services can be delivered within current budgets.
Data science methods used
The applicant used:
- Leaflet and R Shiny
- a weighted indicator methodology to define population needs across a geographical area and travel time analysis
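A weighted indicator methodology generally combines several measures of need into a single score for each area. The sketch below shows one common form: rescale each indicator to the range 0 to 1 across areas, then take a weighted sum. The indicator names, weights and area names are all hypothetical, and the source does not describe which indicators or weights the applicant used. The original work was done in R; Python is used here only for illustration.

```python
def min_max_normalise(values: dict[str, float]) -> dict[str, float]:
    """Rescale values to the range 0-1; if all values are equal, return 0 for each."""
    lowest, highest = min(values.values()), max(values.values())
    if highest == lowest:
        return {area: 0.0 for area in values}
    return {area: (v - lowest) / (highest - lowest) for area, v in values.items()}


def weighted_need_scores(
    indicators: dict[str, dict[str, float]], weights: dict[str, float]
) -> dict[str, float]:
    """Combine normalised indicators into one weighted score per area.

    indicators maps indicator name -> {area name: raw value}.
    weights maps indicator name -> weight (assumed here to sum to 1).
    """
    scores: dict[str, float] = {}
    for name, by_area in indicators.items():
        for area, value in min_max_normalise(by_area).items():
            scores[area] = scores.get(area, 0.0) + weights[name] * value
    return scores


if __name__ == "__main__":
    # Hypothetical indicators and weights, not Norfolk data.
    indicators = {
        "children_under_5": {"Area A": 100, "Area B": 300, "Area C": 200},
        "travel_time_minutes": {"Area A": 10, "Area B": 30, "Area C": 50},
    }
    weights = {"children_under_5": 0.75, "travel_time_minutes": 0.25}
    for area, score in sorted(weighted_need_scores(indicators, weights).items()):
        print(area, round(score, 3))
```

Normalising first stops an indicator with large raw values (such as population counts) from dominating one with small values (such as travel time in minutes).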
Using computer vision to identify vehicles in a CCTV feed
Transport for London, 2017
The problem it was trying to solve
Improve traffic management and road safety.
Project objectives
Provide real-time information for the ‘on-street’ situation across the road network.
Outcome of the project
A proof of concept was created to track and count vehicles as they moved through a CCTV video feed.
The initial design used blob detection and vector tracking and had an estimated accuracy of 85% in certain conditions.
Some preliminary work was done to investigate Haar cascades (a machine learning technique) to see whether this could alleviate some of the detection issues and improve overall accuracy.
The knowledge gained from the project has been used to better understand and define future requirements for computer vision technologies.
Work is being done to develop the Haar cascades as well as methods to detect and classify vehicles.
Data science methods used
The applicant created the proof of concept using Python, OpenCV (a computer vision library in Python) and the TfL JamCam API.
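The track-and-count idea can be sketched without a video feed. Assume each frame has already been reduced to a list of blob centroids (in practice these might come from OpenCV). Each centroid is linked to the nearest centroid in the previous frame, and a vehicle is counted when its track crosses a virtual counting line. The nearest-neighbour matching, the counting line and the `max_distance` parameter are all assumptions for illustration. The source says only that the design used blob detection and vector tracking.

```python
import math


def count_line_crossings(frames, line_y, max_distance=50.0):
    """Count tracks that move from above line_y to on or below it.

    frames: list of frames, each a list of (x, y) blob centroids.
    Each centroid is greedily linked to the nearest unclaimed centroid in the
    previous frame if it lies within max_distance; otherwise it is treated
    as a new object and cannot be counted in this frame.
    """
    crossings = 0
    previous = []
    for current in frames:
        unmatched = list(previous)
        for point in current:
            if not unmatched:
                continue
            nearest = min(unmatched, key=lambda p: math.dist(p, point))
            if math.dist(nearest, point) > max_distance:
                continue
            unmatched.remove(nearest)
            # Image y grows downwards, so "crossing" means y passes line_y.
            if nearest[1] < line_y <= point[1]:
                crossings += 1
        previous = current
    return crossings


if __name__ == "__main__":
    # Made-up centroids for two vehicles moving down the image.
    frames = [
        [(10, 80), (200, 40)],
        [(12, 95), (205, 60)],
        [(15, 110), (210, 80)],
        [(215, 105)],
    ]
    print(count_line_crossings(frames, line_y=100))
```

Greedy nearest-neighbour matching is simple but can mislabel vehicles that pass close to each other, which is one reason an approach like this would only be accurate in certain conditions.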
Automation of object detection from satellite imagery
UK Hydrographic Office, 2016
The problem it was trying to solve
Reduce the risk of collisions with offshore infrastructure such as oil rigs and wind turbines.
Project objectives
Improve knowledge of offshore infrastructure worldwide by automating the data capture process.
Outcome of the project
This work has been turned into a system that creates a geo-referenced dataset of labelled objects.
Data science methods used
The applicant:
- used synthetic aperture radar imagery to scan the earth’s surface
- used open source image processing libraries in Python (a data science programming language) to process the radar images
- created a system that uses a blob detection algorithm to detect objects visible in the ocean on the satellite imagery
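One simple form of blob detection is to threshold the image and group adjacent bright pixels into connected components. The sketch below does this on a tiny made-up grid using only the Python standard library. It is an assumption about the general technique, not the applicant's system. The source does not say which blob detection algorithm or libraries were used, and real radar imagery would need pre-processing first.

```python
from collections import deque


def detect_blobs(image, threshold):
    """Find groups of touching pixels brighter than threshold.

    image: list of rows, each a list of numeric pixel values.
    Pixels are grouped using 4-connectivity (up, down, left, right).
    Returns a list of dicts with the blob's pixel count and (row, col) centroid.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first search collects every pixel in this blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            centroid = (
                sum(p[0] for p in pixels) / len(pixels),
                sum(p[1] for p in pixels) / len(pixels),
            )
            blobs.append({"size": len(pixels), "centroid": centroid})
    return blobs


if __name__ == "__main__":
    # Made-up intensities: dark "sea" with two bright objects.
    image = [
        [0, 0, 0, 0, 0],
        [0, 9, 9, 0, 0],
        [0, 9, 9, 0, 0],
        [0, 0, 0, 0, 8],
    ]
    for blob in detect_blobs(image, threshold=5):
        print(blob)
```

If the image is geo-referenced, each blob centroid can be converted from pixel coordinates to a position, which is how detections could feed a geo-referenced dataset of objects.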