West Midlands Police: exploratory analysis of sexual convictions
This algorithmic tool helps to ascertain the potential effects of a number of factors (e.g. the number of officers allocated to an investigation) on the success or failure of RASSO (rape and serious sexual offences) investigations.
Tier 1 – Overview
Name
Exploratory Analysis of Sexual Convictions
Description
We undertook statistical modelling to isolate the effects of a large number of potential factors (in statistical modelling these are called features or variables; see the Data section (4.1)) on RASSO (rape and serious sexual offences) investigations. The aim was to estimate the effects of some of these factors with a view to informing resource allocation to RASSO investigations. (Note that ‘factors’ here means ‘items’ and is not meant in the statistical sense of the term.)
This is an explanatory analysis. An explanatory analysis aims to see how much the different features contribute to (in this case) success or failure. It is not built to make predictions.
As this is an explanatory analysis, the outputs were used to highlight some new questions that can inform potential decisions around which additional variables we should collect data on (e.g. why victims withdraw support for an investigation). It also helped inform how cases can be allocated to investigators by way of examining the effects of the different features (e.g. the number of officers allocated to an investigation) on the likelihood of an investigation resulting in a charge. This contributes to a static type of decision making, meaning that it informs a decision at one point in time (e.g. which additional variables to collect data on), but it does not inform decision making on a continual basis.
The number of RASSO investigations coming into WMP’s Public Protection Unit (PPU, the department that undertakes RASSO investigations) has increased substantially over the last 5 years and at the same time the successful conclusion rate for such investigations has dropped considerably.
The project therefore aims to identify potential avenues to enable a more effective use of resources and thus contribute to a higher successful conclusion rate.
There are many variables to account for in RASSO investigations, so statistical analyses were used to estimate how much each variable contributes to the success or failure of an investigation.
URL of the website
The results of the analysis have been published on the West Midlands Police and Crime Commissioner’s website.
The report is available within the January 2020 meeting’s files.
Contact email
N/A
Tier 2 – Owner and Responsibility
1.1 Organisation/ department
West Midlands Police
1.2 Team
Data Analytics Lab (for building)
1.3 Senior responsible owner
N/A
1.4 Supplier or developer of the algorithmic tool
N/A - the tool was developed internally
1.5 External supplier identifier
N/A
1.6 External supplier role
N/A
1.7 Terms of access to data for external supplier
N/A
Tier 2 – Description
2.1 Scope
The analyses were designed as one-off explanatory analyses to identify means by which successful outcomes in RASSO investigations could be improved via the use of investigatory resources.
They were not designed as a triage tool or as a means of predicting the outcome of RASSO investigations.
2.2 Benefit
The project highlighted different rates of success in cases involving victims with different characteristics (predominantly age). These differences were not previously known to PPU, which has since investigated them further.
The project identified that there was the potential to improve successful outcomes for RASSO investigations by circa 30% without increasing the number of officers.
2.3 Alternatives considered
Due to the nature of the task, other approaches, such as qualitative analysis, would not have been able to tease apart the contribution of the different applicable variables and so would have been unable to identify potential improvements in the way investigatory resources can be used to increase the success of the investigations.
2.4 Type of model
To ensure a robust assessment of the findings, four primary methods were used in the analysis. Using different analytical approaches allowed us to triangulate the findings: the results from the different methods were qualitatively the same. The following methods were used in parallel to validate the findings:
- Relaxed LASSO (logistic regression)
- Bayesian regression with regularising priors (logistic regression)
- Directed Acyclic Graph
- Ensemble method (gradient boosting machine) – this method was used only to find the importance ranking for variables as a check on the relative size of coefficients arising from the previous methods. It was not used for prediction.
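As an illustration of the first method in the list, the following is a minimal sketch of a relaxed LASSO for logistic regression, using only the Python standard library. It is not the code used in the project. It assumes the simplest ("fully relaxed") variant: an L1-penalised fit selects variables, and the selected variables are then refitted without a penalty so that their coefficients are no longer shrunk. Other variants blend the penalised and unpenalised fits. The data, the penalty value and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, l1=0.0, lr=1.0, iters=400):
    """Proximal gradient descent for logistic regression.

    Returns (intercept, coefficients). The L1 penalty `l1` applies to the
    coefficients only; l1=0 gives an ordinary unpenalised fit.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        for j in range(p):
            step = b[j] - lr * g[j] / n
            # Soft-thresholding: shrinks towards zero, small values become 0.
            b[j] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return b0, b


def relaxed_lasso(X, y, l1):
    """Stage 1: LASSO selects variables. Stage 2: refit them unpenalised."""
    _, lasso_coefs = fit_logistic(X, y, l1=l1)
    selected = [j for j, c in enumerate(lasso_coefs) if c != 0.0]
    X_sel = [[xi[j] for j in selected] for xi in X]
    _, refit = fit_logistic(X_sel, y, l1=0.0)
    return selected, lasso_coefs, dict(zip(selected, refit))


def make_synthetic(n=400, seed=0):
    # Hypothetical data: column 0 drives the outcome, columns 1-2 are noise.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * xi[0]) else 0 for xi in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    selected, lasso_coefs, refit = relaxed_lasso(X, y, l1=0.1)
    print("selected columns:", selected)
    print("LASSO coefficients:", [round(c, 3) for c in lasso_coefs])
    print("refitted coefficients:", {j: round(c, 3) for j, c in refit.items()})
```

The refit step matters for an explanatory analysis: the penalised coefficients are deliberately biased towards zero, so the refitted values give a less shrunken estimate of each retained variable's effect.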
It should be noted that some of the variables contained in the models were transformed using splines in light of non-linearities.
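To illustrate what a spline transformation does, the sketch below expands a single numeric variable into a linear spline basis, so that a regression model can fit a different slope on either side of each knot. The record does not state which spline family, knots or variables were used, so the linear basis, the knot positions and the coefficients here are assumptions chosen only for illustration.

```python
def linear_spline_basis(x, knots):
    """Return [x, max(0, x - k1), max(0, x - k2), ...] for the given knots."""
    return [x] + [max(0.0, x - k) for k in knots]


def spline_effect(x, knots, coefs):
    """Contribution of x to the linear predictor: dot(basis, coefs)."""
    return sum(c * b for c, b in zip(coefs, linear_spline_basis(x, knots)))


# Hypothetical knots and coefficients, not taken from the analysis.
KNOTS = [13.0, 17.0, 20.0]
# First value is the initial slope; each later value is the change in slope
# that starts at the corresponding knot.
COEFS = [0.10, -0.15, 0.02, 0.03]

if __name__ == "__main__":
    for value in (10, 15, 25):
        basis = linear_spline_basis(value, KNOTS)
        print(value, basis, round(spline_effect(value, KNOTS, COEFS), 3))
```

Each basis column enters the regression as an ordinary variable, so the non-linear relationship is captured while the model itself remains a (logistic) regression.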
2.5 Frequency of usage
N/A – this was a one-off project.
2.6 Phase
This project is now ‘retired’ in that the findings were provided to the PPU department in February 2020 and informed decision making in a one-off instance.
2.7 Maintenance
N/A – this was a one-off project.
2.8 System architecture
N/A – this was a one-off project.
Tier 2 – Oversight
3.1 Process integration
The analyses fed into strategic decision making by way of providing insights as to how the investigation allocation process and victim engagement processes may be changed in order to improve successful outcome rates.
Because of this aim, the analyses are not integrated into any processes thereafter.
3.2 Provided information
A report detailing the findings was provided to the department (PPU).
3.3 Human decisions
Numerous discussions were held with subject matter experts at various levels (Sergeant through to Chief Superintendent) to make sure that situations, processes, etc. were fully understood and to sense-check findings as the project progressed.
The analysis led to strategic options being considered for the allocation of cases to investigators and the enhancement of data collection processes. Any final decisions based on the report were taken by humans and were made within the PPU department.
3.4 Required training
N/A – this was a one-off project.
3.5 Appeals and review
N/A - no predictions or decisions about individual investigations arise from this project.
Tier 2 – Information on data
4.1 Source data name
Crimes data, the Command and Control system and Scenes of Crime datasets were used.
4.2 Source data
The list of variables used in the analysis (see Annex) was derived from the datasets detailed above.
4.3 Source data URL
N/A
4.4 Data collection
The data was collected for normal policing purposes, such as during investigations.
4.5 Data sharing agreements
N/A
4.6 Data access and storage
This data is used by WMP in normal day-to-day investigatory processes. The data is stored according to the Management of Police Information (MoPI) guidance.
Tier 2 – Risk mitigation and impact assessment
5.1 Impact assessment name
- Data Protection Impact Assessment (DPIA)
- Algo-care framework
- Ethical assessment
5.2 Impact assessment description
A DPIA was completed by WMP’s Information Management department (DPIA ‘Analysis of RASSO Investigations’).
The algo-care framework was used during the ‘in-principle’ stage of assessment by the Ethics Committee. This is applied at the beginning of a project and taken to the Ethics Committee for any ‘in-principle’ concerns to be highlighted.
An ethical assessment was completed via WM PCC’s Ethics Committee.
5.3 Impact assessment date
DPIA - 03/06/2019
5.4 Impact assessment link
N/A
5.5 Risk name
- Data quality
- Analysis highlighting spurious relationships between features
5.6 Risk description
- Data quality – sometimes there can be data quality issues due to the nature of how data is inputted during investigations.
- The analysis could highlight spurious relationships between features. Spurious correlation is a potential risk inherent to any statistical modelling.
As this was an explanatory model and not a predictive model, people’s characteristics were used as controls rather than parts of ‘patterns’ identified by the model(s) and as such the issue of potential bias is not applicable in this instance.
5.7 Risk mitigation
- Data quality – the project included an extensive exploratory data analysis phase which included an assessment and identification of any data quality issues and the ways in which any such issues could be mitigated for the purposes of the project.
- Analysis highlighting spurious relationships between features – various methods were used to check for general agreement in the findings which helps to mitigate this possibility.
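One simple way to quantify "general agreement" between two methods is to compare how they rank the variables. The sketch below computes a Spearman rank correlation between two sets of scores, for example absolute regression coefficients and importance scores from a second model. This is an assumed illustration of the idea, not the check used in the project, and the numbers are made up.

```python
def ranks(values):
    """Rank values from 1 (largest) downwards; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


if __name__ == "__main__":
    # Hypothetical scores for four unnamed variables from two methods.
    abs_coefficients = [0.9, 0.5, 0.1, 0.3]
    importances = [0.35, 0.40, 0.05, 0.20]
    print(round(spearman(abs_coefficients, importances), 3))
```

A value close to 1 means the two methods order the variables in much the same way; a low or negative value would flag findings that depend on the choice of method. The function is undefined when either input is constant, since the ranks then have zero variance.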
Annex - List of variables used in analysis
Variable | Type | Description | Values / notes |
---|---|---|---|
cuc category grouping | factor | Final Clearup Category of the Incident | The final outcome of investigations. |
npu | factor | Neighbourhood Policing Unit. | Areas that WMP is split into. |
vsr flag | factor | Victim Support Requested | NA; N; Y |
dv risk | factor | Risk of Domestic Violence. High, Medium, Standard | NA; H; M; S |
report method desc | factor | | FRONT OFFICE; HELP DESK/CONTACT CENTRE; PATROL; PPU; OTHER |
offence type desc | factor | | Other; Child Abuse; Domestic Abuse |
victim sex | factor | | FEMALE; MALE |
has witness | logical | | Y; N |
offender known | factor | | Undetermined; Known; Stranger |
reported | factor | Same day, week, month, historic | Within 1 Day; 1 Week; 1 Month; 1 Year; 5 Years; Historic (> 5 years) |
ip age years | numeric | IP Age at the time of the offence | Mean 22.4, SD: 12.6, Median: 19.2 |
suspect age years | numeric | Suspect Age at the time of the offence | Mean 29.1, SD: 12.9, Median: 26.7 |
days b4 reporting | numeric | Days before crime was reported | Mean 1727, SD: 3914, Median: 10.1 |
days b4 soco | numeric | Days before Scene of Crime data was collected | Mean 39.3, SD: 101.3, Median: 4.8 |
days b4 finished | numeric | Days an Incident is Open | |
days b4 finished censored | numeric | Days an Incident is Open (+ Crimes that are still open) | |
hours b4 first investigation | numeric | Hours between reporting and first investigation note | |
suspect ethnic appearance | factor | | WHITE; ASIAN; BLACK; NOT KNOWN; OTHER |
ip (victim) ethnic appearance | factor | | WHITE; ASIAN; BLACK; NOT KNOWN; OTHER |
ip age group | factor | Grouping of the ip age in years | IP Age: 0 - 12: 2396, 13 - 16: 2721, 17, 18, 19: 1717, 20s: 3080, 30s: 1651, 40+: 1816 |
has soco | logical | Is there scene of crime data associated with this incident? | Y; N |
soco dna match | logical | Is there a DNA match to a suspect? | Y; N |
soco swab | logical | Were swabs taken? | Y; N |
soco phone | logical | Is the phone of the IP or Suspect available? | Y; N |
soco cctv | logical | Is CCTV available? | Y; N |