Ministry of Justice: Data First
Data First is a pioneering data-linking, research and academic engagement programme led by the Ministry of Justice and funded by ADR UK.
Data First unlocks the potential of the wealth of data created by the Ministry of Justice (MOJ) by making linked administrative datasets from across the justice system available for research. The programme is led by MOJ and funded by Administrative Data Research UK (ADR UK), an investment by the Economic and Social Research Council (ESRC).
Data from the courts, prison and probation services in England and Wales have been linked to enable new and innovative analysis of user journeys, interactions, and outcomes across the justice system. The programme is also enhancing the linking of justice data with other government departments, including education data from the Department for Education’s (DfE) National Pupil Database (NPD).
Data First enables researchers across government and academia to access these datasets in an ethical and responsible way via secure platforms in the ONS Secure Research Service and SAIL Databank.
By working in partnership with academic experts to facilitate and promote research in line with the evidence priorities set out in the MOJ Areas of Research Interest (ARI), Data First is generating new insights to inform the development of government policy and drive real progress in improving justice outcomes.
General programme information
The Data First user guide provides further information about the programme, including the processes for accessing the data for research. The privacy and data protection statement provides information about how we use and share data.
Datasets
Data catalogues are available for all Data First datasets, providing information on the variables contained within each. These data catalogues are currently draft versions that provide basic details of each dataset and will be updated soon with final versions.
Data First has shared six datasets from administrative sources across the courts, prison and probation services in England and Wales: magistrates’ courts, the Crown Court, prisoner custodial journeys, probation services, and the family and civil courts.
The cross-justice system linking dataset can be used to join these six different datasets at a person level. This linking dataset also contains a table which can be used to join magistrates’ courts and Crown Court data at a case level.
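A person-level join through a linking table can be pictured as follows. This is a minimal sketch under stated assumptions: the table layout, column names (`source`, `record_id`, `person_id`) and records are hypothetical toy values, not the actual Data First schema. The real variable names are given in the data catalogues.

```python
from collections import defaultdict

# Hypothetical linking table: each row maps one record in a source dataset
# to a shared person-level identifier. Column names are illustrative only.
linking_table = [
    {"source": "magistrates", "record_id": "M1", "person_id": "P1"},
    {"source": "crown_court", "record_id": "C7", "person_id": "P1"},
    {"source": "probation", "record_id": "R3", "person_id": "P1"},
    {"source": "magistrates", "record_id": "M2", "person_id": "P2"},
]

# Hypothetical source records, keyed by (source, record_id). Toy values.
records = {
    ("magistrates", "M1"): {"outcome": "sent to Crown Court"},
    ("crown_court", "C7"): {"outcome": "custodial sentence"},
    ("probation", "R3"): {"outcome": "licence supervision"},
    ("magistrates", "M2"): {"outcome": "fine"},
}


def person_journeys(linking_table, records):
    """Group source records by person_id so one person can be followed across datasets."""
    journeys = defaultdict(list)
    for row in linking_table:
        key = (row["source"], row["record_id"])
        if key in records:  # skip linking rows whose source record is absent
            journeys[row["person_id"]].append((row["source"], records[key]["outcome"]))
    return dict(journeys)


journeys = person_journeys(linking_table, records)
for person, events in journeys.items():
    print(person, events)
```

Keeping the identifiers in a separate linking table means each source dataset can be shared unchanged, and only researchers who need cross-system journeys have to use the join.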
Separately, data on criminal histories from the Police National Computer (PNC) have been linked to education and social care data in England from the DfE NPD as part of the MOJ-DfE data share. Please contact DataLinkingTeam@justice.gov.uk or data.sharing@education.gov.uk for the latest available metadata for the MOJ-DfE data share.
MOJ cross-justice system datasets
- Data First civil court data catalogue (coming soon)
- Data First offender assessment data catalogue (coming soon)
Limitations of data linking
Users should note that the accuracy of data linking depends on the availability of personal identifying information in the source data. All available identifiers have been used during the matching process, but the availability of demographic information varies by dataset. The Data First criminal justice datasets (magistrates’ courts, Crown Court, prisoner custodial journey and probation) contain numerous, well-populated identifiers, whereas the identifiers in the family and civil court datasets are less well-populated. This affects how records are linked and will need to be considered when designing research projects. Further details can be found in the user guide and the relevant data catalogues, and researchers are welcome to contact us to discuss project ideas as early as possible so we can advise on viability.
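The effect of sparse identifiers can be illustrated with the Fellegi-Sunter model of probabilistic linkage, the model family that Splink implements. This is a simplified sketch, not the matching configuration used for Data First: the identifier names, the `m`/`u` probabilities and the prior are all made-up assumptions. An identifier that agrees adds evidence of a match, one that disagrees subtracts evidence, and one that is missing adds nothing.

```python
import math

# Made-up m and u probabilities per identifier (not Data First estimates):
#   m = P(identifier agrees | records belong to the same person)
#   u = P(identifier agrees | records belong to different people)
PARAMS = {
    "first_name": (0.95, 0.01),
    "surname": (0.95, 0.005),
    "date_of_birth": (0.97, 0.001),
    "postcode": (0.80, 0.0005),
}


def match_probability(comparison, prior=1e-4):
    """Fellegi-Sunter style posterior probability that a record pair is a match.

    comparison maps identifier -> True (agrees), False (disagrees) or
    None (missing in at least one record, so it contributes no evidence).
    """
    weight = math.log2(prior / (1 - prior))  # prior match weight (log2 odds)
    for field, agrees in comparison.items():
        m, u = PARAMS[field]
        if agrees is None:
            continue
        weight += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    odds = 2 ** weight
    return odds / (1 + odds)


# Same agreeing names, but the second pair lacks date of birth and postcode.
well_populated = {"first_name": True, "surname": True, "date_of_birth": True, "postcode": True}
sparse = {"first_name": True, "surname": True, "date_of_birth": None, "postcode": None}

print(f"well-populated: {match_probability(well_populated):.4f}")
print(f"sparse:         {match_probability(sparse):.4f}")
```

With these illustrative values the well-populated pair scores close to 1, while the sparse pair reaches only about 0.64 despite agreeing on both names. This is why linkage involving the less well-populated family and civil court data needs extra care at the design stage.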
Applying for data access
Data First datasets can be accessed through the ONS Secure Research Service (SRS) or SAIL Databank (except for the MOJ-DfE data share, which is only available through the ONS SRS).
Requests to access data through the ONS SRS require completion of the Secure Access to Data Form, which can be accessed here along with additional supporting documents.
- For access to Data First datasets (covering the courts, prisoner custodial journey and probation data), the application form should be submitted to datafirst@justice.gov.uk.
- Applications for the MOJ-DfE dataset should be directed to DataLinkingTeam@justice.gov.uk and data.sharing@education.gov.uk.
- For all MOJ justice datasets (except the MOJ-DfE dataset), applicants will also need to complete the Research Project Application form, which will be assessed by the UK Statistics Authority Research Accreditation Panel (RAP).
Guidance for completing the application form can be found in the Data Sharing Guidance, and the list of datasets and access routes can be found here. Further information on the process overall is included within the Data First user guide above.
To access data within the SAIL Databank, please apply through SAIL.
A register of external research projects which have been approved to use MOJ data is available to view here.
Analytical outputs
Statistical and social research publications using Data First data have been delivered by MOJ analysts or in collaboration with other government departments. Outputs have also been produced by ADR UK-funded Research Fellows. These publications can be found below:
- Data First: Criminal Courts Linked Data
- Education, children’s social care and offending
- Criminal courts research fellows
- Family court – Cafcass research fellows
- Probation and criminal justice system research fellows
- MOJ-DfE research fellows
Splink: Data linkage at scale
Through Data First, MOJ has developed a free and open-source software library to enable data linkage at scale. This software has been used to link some of the largest datasets held by MOJ as part of Data First.
Splink is a freely available, open-source Python package that is:
- faster and more accurate than other free tools
- able to link large datasets of tens of millions of records or more
- developed with advice from academic experts in data linkage
- able to produce a wide range of interactive data visualisations that help to build effective models, explain linkage predictions, diagnose problems and quality assure models
- compatible with multiple databases and big data processing engines, meaning it can run on a wider range of computer systems
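Linking tens of millions of records is only practical if the software avoids comparing every possible pair. Probabilistic linkage tools, Splink among them, typically do this with "blocking": only records that share some key are compared. The plain-Python sketch below illustrates the general idea. It is not Splink's API, and the records and the choice of date of birth as the blocking key are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, block_key):
    """Yield pairs of record ids that share the same blocking key value."""
    blocks = defaultdict(list)
    for rec in records:
        key = block_key(rec)
        if key is not None:  # records missing the key cannot be blocked on it
            blocks[key].append(rec["id"])
    for ids in blocks.values():
        yield from combinations(ids, 2)  # compare only within each block


# Hypothetical toy records.
records = [
    {"id": 1, "surname": "smith", "dob": "1990-01-01"},
    {"id": 2, "surname": "smyth", "dob": "1990-01-01"},
    {"id": 3, "surname": "jones", "dob": "1985-06-30"},
    {"id": 4, "surname": "jones", "dob": "1985-06-30"},
    {"id": 5, "surname": "brown", "dob": "1972-11-12"},
]

pairs = list(candidate_pairs(records, lambda r: r["dob"]))
n = len(records)
print(f"all pairs: {n * (n - 1) // 2}, after blocking: {len(pairs)}")
print(pairs)
```

Here blocking cuts 10 possible comparisons to 2. At the scale of millions of records, the saving is the difference between a feasible job and an impossible one. The trade-off is that true matches which disagree on the blocking key are never compared, which is why more than one blocking rule is commonly used.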
You can find out more on the Splink website, where you can download and start using Splink. You can also ask a question or raise an issue on the public GitHub repository. The Splink team is happy to hear from researchers interested in using the software for their work.
Awards and recognition
- Innovation in Methods, Analysis in Government Awards 2020, Splink: Probabilistic Data Linkage at Scale
- Innovative Methods, Analysis in Government Awards 2022, the Data Linking team (runner-up)
- Linked Administrative Data Award, ONS Research Excellence Awards 2022, ‘Data First: Criminal Courts Linked Data’
- Collaboration Award, Analysis in Government Awards 2024, Data First Team
Contact
Contact the Data First team at datafirst@justice.gov.uk if you would like further information or have any queries.
Updates to this page
Published 30 June 2020. Last updated 17 September 2024.
- An explanatory note has been added to two variables in the prisoner custodial journey data catalogue. An additional minor change has been made to correct a typing error identified in the User Guide.
- Added 'Data First research bulletin July 2024' to analytical outputs section.
- Updated Family Court and Cross-Justice Linking data catalogues have been added, along with an updated User Guide.
- An additional section, 'Limitations of data linking', has been added to the main text.
- Updated magistrates' courts data catalogue added. Outdated criminal courts and prisons linking data catalogue removed.
- Updated data catalogues for magistrates' courts, Crown Court, prisoner custodial journey and probation datasets.
- General user information has been updated to reflect new datasets and linkages. Updates to the User Guide and data catalogues will follow. The order of sections of the document has changed. New contact information has been added.
- Splink information added.
- Data First Family Court data catalogue updated.
- Data First prisoner custodial journey data catalogue updated.
- Analytical outputs section added.
- User guide updated and Data First probation data catalogue, Data First criminal courts, prisons and probation linking data catalogue published.
- User guide updated and Data First Family Court data catalogue published.
- User guide, privacy statement, Data First magistrates' court defendant data catalogue, Data First Crown Court defendant data catalogue and Data First criminal courts and prisons linking data catalogue updated.
- User guide updated and Data First prisoner custodial journey data catalogue published.
- User guide updated and Data First linked magistrates’ and Crown Court data catalogue published.
- Documents updated and Data First Crown Court defendant data catalogue published.
- First published.