Ealing Council: Adult Social Care Annual Financial Expenditure Forecast
A machine learning model to forecast annual adult social care expenditure by cost centre.
Tier 1 Information
###Name
Adult Social Care Annual Financial Expenditure Forecast
Description
To provide an accurate annual prediction from April each year of the next 12 months’ expenditure. This includes a service predicted starter/leaver model. The tool creates a forecast by high level cost centre and is then aggregated to an overall service expenditure forecast. External information not available through social care case management cost recording (e.g. external grants) are included in the tool calculations. The objective is to better allocate council funds in the budget round.
Website URL
N/A
Contact email
greenhamp@ealing.gov.uk (HoS)
squiresm@ealing.gov.uk (Applications Manager)
Tier 2 - Owner and Responsibility
###1.1 - Organisation or department
Ealing Council Finance Department, Adults Social Care Accountancy
1.2 - Team
Accountancy - Shared Resources Ealing Council
1.3 - Senior responsible owner
Strategic Director (Resources)
1.4 - External supplier involvement
Yes
1.4.1 - External supplier
Info Insight Ltd
1.4.2 - Companies House Number
10857040
1.4.3 - External supplier role
Provision of data science development expertise
1.4.4 - Procurement procedure type
Restricted
1.4.5 - Data access terms
Anonymised data provided to supplier for model development. Secure DMZ environment provided for model build with no access to export tools outside of London Borough of Ealing.
Tier 2 - Description and Rationale
###2.1 - Detailed description
Base data is drawn from the standard B13 commissioning report in Mosaic Social Care Case Management system, using SQL connectors. Data is then exported back out of the DMZ to the source server and inserted into an SQL table. It is then made available both native to Mosaic reporting and is also connectable via PowerBI.
Data is cleaned using SSIS/Visual Studio in a standard pipeline internally. A target file is prepared, which is then loaded to a modelling/processing server in the DMZ. The data science server then pushes out another cleaning/preparation pipeline based in R/Python.
The tool has three main components: time-series cost models, an accurate starter/leaver model, and system-wide expenditures.
-
Cost models: four identical models are provided, one per key headline cost centre. These are aggregated into an overall cost to service. The reason for this is for finance colleagues to track individual budgets separately but provide overall consolidated spend profiles to the council. Predictions about the future cost of long term care are calculated using the Long Term Social Care Risk Predictions machine learning model.
-
Starter/leaver model: the starter/leaver model is consolidated across all leaving reasons (mortality, geographic, choice, financial etc) and does not discern specific “why’s”. It is anonymised at that level and cannot make individual predictions.
-
System-wide expenditures: data external to the case management system is applied at the end of the modelling pipeline in the form of raw 1-off expenditures and provisions such as grants.
2.2 - Scope
Currently as a 1-off annual budget forecast. The first iteration was actually provided to include data from Q1-3 and its Q4 out-turn was accurate to £47K in £112M. The full annual forecast for 24/25 is currently being refreshed. Note this is a 1-off forecast and is not acting as a monthly or quarterly monitor.
2.3 - Benefit
Better allocation of budget across the year. Intelligence on if additional provision or reduction might be expected compared to the previous year’s, plus growth/reduction advice.
Councils have an obligation to manage public money to produce good value. As part of that, forecasting is a standard finance task to support the services in the management of their budget. The more accurate a forecast can be, then the easier it is for services to allocate budgets and recognise when additional measures to stay inside budgets are required.
This data science approach is essentially an automation of an existing process (same base data) and a method of increasing the accuracy of annual forecasting. The target is for actual vs predicted to be brought down to within 1% of each other if possible.
Once the finance service has confidence in model accuracy and repeatability, we can take the decision to deprecate existing processes and replace with this machine learning automation
2.4 - Previous process
Finance had their own forecast based on experience of previous years and with professional input on VAT, cost rises etc. However, whilst heavily reviewed and allowing for any committed data, this has often been out by amounts for the order of tens of millions by the close of year.
2.5 - Alternatives considered
N/A
Tier 2 - Decision making Process
###3.1 - Process integration
Part of annual budget setting and advice to Senior Leadership Team at start of year.
3.2 - Provided information
Excel reports broken by cost centre. PowerBI visualisation of the same. The leadership and finance team are still in the process of deciding exactly how to use and integrate the reporting suites into service design.
3.3 - Frequency and scale of usage
The tool can used annually to predict the next 12 months spend prediction. However, we are strongly suggesting the utility of expanding the tool to provide at least quarterly refresh. This would allow London Borough of Ealing to monitor changes in forecast resulting from responses to the budget situation.
3.4 - Human decisions and review
The model is checked at refresh time to ensure the predicted results match intuitive expectations for the next year, and may then be remodelled to include additional data relevant to those expectations. This could include the following local knowledge: large scale care package re-buys at new rates, further demographic modelling, expected grants etc.
3.5 - Required training
None. Delivery is in simple Excel and SQL formats.
3.6 - Appeals and review
N/A
Tier 2 - Tool Specification
###4.1.1 - System architecture
Due to Ealing Council’s Cybersecurity policies, we are unable to share information about system architecture.
4.1.2 - Phase
Production
4.1.3 - Maintenance
Annually (see Human Decisions and Review)
4.1.4 - Models
Machine learning classification model trained using gradient boosted trees algorithm, lightGBM. The model predicts the spend profile across the relevant budget areas over the April to March financial year.
Tier 2 - Model Specification
###4.2.1 - Model name
Adult Social Care Annual Financial Expenditure Forecast
4.2.2 - Model version
v2
4.2.3 - Model task
The model predicts the expected spend profile for 4 adults key service areas over the next 12 months from April each year.
4.2.4 - Model input
Structured social care activity and cost data from Mosaic
4.2.5 - Model output
Percentage risk spend profile over 12 month period, accounting for expected numbers of starter/leavers per package type over that period.
4.2.6 - Model architecture
Gradient boosted trees model using lightGBM (GBDT) implementation.
All key Gradient Boost Decision Tree model parameters are optimised using randomised parameter grid search.
4.2.7 - Model performance
Test data performance: AUC: 0.91 Accuracy: 83% Balanced Accuracy: 54%
4.2.8 - Datasets
All data comes from Ealing ASC Mosaic server - client demographic information collected by ASC front-door (or already known to the Council): age, sex, latitude, longitude of home address (or postcode), IMD deprivation scores.
- Consistent client ID numbers that enable tracking any Council-related expenditure and service provision that was made available to the client in the last five years (more years if available).
- History of ASC or other council services accessed by the client (with duration, including start-end dates) and related costs (if available).
- Client care group or any other categorisations of the client’s needs that were made by the front-door service.
- Service start/end dates.
- Clear labelling of which clients ended up with requiring long-term and consistent care packages.
4.2.9 - Dataset purposes
All data was extracted using a single SQL query and structured into a single flat data table that was loaded onto a Python environment. After running data transformation in Python, the single data table was split into train/test/validation based. The split used Client IDs that were randomly allocated to train/test/split.
Tier 2 - Data Specification
###4.3.1 - Source data name
Mosaic Case Management Care Package detail Mosaic Budget Reference Data B13 Commitments dataset
4.3.2 - Data modality
Tabular
4.3.3 - Data description
The data fields required are at client level. Specific datums requested are: - Client demographic information collected by ASC front-door (or already known to the Council): age, sex, latitude, longitude of home address (or postcode), IMD deprivation scores. - Consistent client ID numbers that enable tracking any Council-related expenditure and service provision that was made available to the client in the last five years (more years if available). - History of ASC or other council services accessed by the client (with duration, including start-end dates) and related costs (if available). - Client care group or any other categorisations of the client’s needs that were made by the front-door service. - Service start/end dates. - Clear labelling of which clients ended up with requiring long-term and consistent care packages. - Optional: any other relevant information, e.g. health conditions, frailty index, client debt.
4.3.4 - Data quantities
Variable but < 1Gb of data Train: 80% Test: 20% Validation: 10% of Train dataset
4.3.5 - Sensitive attributes
Data is anonymised, but includes age/Date Of Birth (DOB), service user group, ethnicity, gender, care package value.
4.3.6 - Data completeness and representativeness
By point of providing a care package in the process for the client, the key attributes are fully known.
4.3.7 - Source data URL
N/A
4.3.8 - Data collection
Standard social care case management. Commissioning of proposed care packages recorded and arrangement of start/end dates. Care can be provided as residential or non-residential.
4.3.9 - Data cleaning
Data for required time period extracted from Social Care DataWarehouse, MOSAIC and GIS by SSIS then anonymised before being passed to ASC server and imported into SQL database.
Some missing demographic info is interpolated where possible (e.g. age recorded as median age etc.)
4.3.10 - Data sharing agreements
No additional Data Sharing Agreements (DSAs) required. Finance have access to this data in the current model. Objective is to refine and correct the model with ML.
4.3.11 - Data access and storage
ICT Social Care support - access to care package and commitments data on on-going basis via case management BI team - provide a social care data warehouse and have on-going access for monitoring purposes. External data scientist has access to data for duration of the modelling engagement. Internal teams have access to case management data on a supplier hosted site. Overall access is by secure VPN to hosted app. Internally data warehouse is inside standard Ealing security in a data centre. Data science server is in a DMZ in that same data centre. That server has no outside access other than a single secure SMB share where data is exchanged with data warehouse. This includes removal of any unnecessary MS-Office tools and in particular email/ftp etc.
Tier 2 - Risks, Mitigations and Impact Assessments
###5.1 - Impact assessment
Internal DPIA completed: ASC_Predictives_LTCAndForeCasting.docx 22/12/2023 Councils have an obligation to manage public money to produce good value. As part of that, forecasting is a standard finance task to support the services in the management of their budget. The more accurate a forecast can be, then the easier it is for services to allocate budgets and recognise when additional measures to stay inside budgets are required. This data science approach is essentially an automation of an existing process (same base data) and a method of increasing the accuracy of annual forecasting. The target is for actual vs predicted to be brought down to within 1% of each other if possible. Data quality is an existing function of Adults Records Management. Adults Commissioning and Social Care support team and will not need to be adjusted in this work. The data science resource (processor) is only supplied anonymised data
Client level data is not a consumable out-turn from this model although the DS model analyses at client level before aggregating expenditure at service (cost-centre) level.
5.2 - Risks and mitigations
Skill resource - availability: available from contract market or M/Soft gold partners. Infrastructure loss - Server is part of data centre VM infrastructure - spin up new server or failover. Retention of cohort data - Build oldest annual data layer replacement of no more than 5 years Or Full refresh of 5 years data each year. Authorised access to data - Manage via starter/leaver process and Active Directory monitoring. Material change in model may be required later to account for exceptional events influencing costs and starter/leaver model such as COVID.