Analysis Function RAP Strategy 2023 Implementation Plan at DfT
Updated 23 January 2024
Introduction
The Analysis Function Reproducible Analytical Pipelines Strategy was published in 2022. This strategy sets out aims and actions for government analysts, covering 2022 to 2026, to create new products and redevelop existing ones in line with the principles of Reproducible Analytical Pipelines (RAP).
Reproducible Analytical Pipelines are automated analytical processes. They incorporate elements of software engineering best practice to ensure that the pipelines are reproducible, auditable, efficient, and high quality. When used appropriately within statistical and analytical processes, RAP delivers improved value, efficiency and quality.
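To illustrate what "automated, reproducible and auditable" can mean in practice, the following is a minimal Python sketch of a very small pipeline. It is not taken from any DfT process: the data, column names and function names are hypothetical. Every step (ingest, validate, aggregate) is written in code rather than done by hand, so rerunning it on the same input gives the same output. The SHA-256 fingerprint of the input lets a reviewer confirm which data a published figure was produced from.

```python
import csv
import hashlib
import io
from collections import defaultdict

# Hypothetical raw input. A real pipeline would read this from a versioned
# source, such as a file in shared storage or a database table.
RAW_CSV = """region,year,trips
North,2022,120
South,2022,95
North,2022,30
"""


def ingest(text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows: list[dict]) -> list[dict]:
    """Fail loudly on bad data instead of correcting it silently by hand."""
    for i, row in enumerate(rows, start=1):
        if not row["trips"].isdigit():
            raise ValueError(f"row {i}: trips must be a non-negative integer, got {row['trips']!r}")
    return rows


def aggregate(rows: list[dict]) -> dict[tuple[str, str], int]:
    """Total trips by (region, year). Sorting the keys keeps output order stable."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for row in rows:
        totals[(row["region"], row["year"])] += int(row["trips"])
    return dict(sorted(totals.items()))


def run_pipeline(text: str) -> tuple[dict, str]:
    """Run every step in code and return the result plus an input fingerprint."""
    checksum = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return aggregate(validate(ingest(text))), checksum


if __name__ == "__main__":
    result, checksum = run_pipeline(RAW_CSV)
    print(result)  # {('North', '2022'): 150, ('South', '2022'): 95}
    print("input sha256:", checksum[:12])
```

In a real RAP, this code would be kept under version control and reviewed by peers. Those practices are what the tools and capability actions in this plan aim to support.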
The three key goals of the strategy to make this possible are:
- tools – ensure that analysts have the tools they need to implement RAP principles
- capability – give analysts the guidance, support and learning to be confident implementing RAP
- culture – create a culture of robust analysis where the RAP principles are the default for analysis, leaders engage with managing analysis as software, and users of analysis understand why this is important
This document sets out how the Department for Transport (DfT) will work towards implementation of the Analysis Function Reproducible Analytical Pipelines Strategy during the 2023 calendar year. We set out how DfT will respond to the requirements of the strategy to deliver the right tools, the right capability and the right culture and explain how we will measure progress towards delivery.
Current position in DfT
Throughout 2022, DfT continued to invest in reproducible analysis and coding approaches across its analytical community. RAP is recognised as an essential part of our analytical approach, improving the flexibility, timeliness and quality of analysis. Our achievements in 2022 focused on making it easier and more effective for analysts within DfT to apply RAP to their work, covering areas such as:
- making cloud-based coding tools and version control software available to all analysts, replacing legacy systems that were difficult to use or outdated
- offering a wide range of coding learning and development opportunities so that analysts have the skills they need to understand and apply RAP
- helping analytical leaders understand the coding tools and approaches available, so they can make effective decisions about how to apply these most efficiently to team workplans
- providing analysts with tools to make RAP approaches efficient, high quality and well documented, including custom coding packages, workshops and written best practice guidance
Progress in this area is rarely static, and many of our plans for 2023 continue to build on our previous achievements and successes to embed RAP approaches more thoroughly in our analytical work.
What does success look like for DfT in 2023?
- DfT will continue to embed sustainable RAP capability in analytical teams who are producing or supporting statistical outputs.
- New approaches towards RAP will be piloted within the statistics teams, ensuring that these meet the needs of those producing statistical outputs in the first instance.
- Analytical leaders within statistics will promote a “RAP-first approach” and encourage analytical teams to further embed principles of reproducible analysis into all analytical processes.
- Managers within statistics will encourage their teams to continue to develop appropriate skills for RAP, and work with RAP champions to build upon existing capability and transform manual workflows.
- We will continue to reduce our use of manual processes, legacy systems and tools.
Summary at the start of 2023
The right tools
DfT has recently developed a cloud analysis platform which provides R, an integrated development environment (IDE) and Git version control integration. The platform provides the latest stable versions of R and RStudio, CRAN mirroring through a package manager, and easy access to these tools for any analyst who requests it.
Some legacy versions of R are also available for installation on desktop and cloud, but their use is no longer promoted or encouraged, and these will be gradually phased out.
Python is currently used through a number of different environments on the desktop and in the cloud, with varying IDE availability and Git integration.
Version control is supported both internally and externally through GitHub, using an enterprise setup. This means that all repositories are open to everyone across the organisation by default. Support is provided for analysts wishing to open source their code externally, although only a limited number of repositories are currently published in this way.
Continuous integration is available through GitHub Actions, but it is currently used in very few projects.
The right capability
DfT is working to improve coding capability and has significantly increased its offering of learning and development opportunities across platforms including Git/GitHub and R. It has also increased its offering on platform-agnostic coding best practice around reproducibility, ease of reading and clarity.
Peer coding support opportunities are made available through our Coffee and Coding (C&C) network to allow analysts to request mentoring, peer review or paired coding support for projects.
Expert coding support is also provided by the Statistics Automation, Innovation and Dissemination (StatsAID) team, who offer resource and build capability in teams wanting to undertake RAP projects.
The right culture
DfT has a senior sponsor (Head of Profession for Statistics) for RAP Strategy implementation and a RAP Champions network to promote building capability, developing and maintaining RAPs across the analyst community.
Teams in DfT continually improve their statistics and data, in line with expectations of the Code of Practice for Statistics. As part of this, teams are encouraged to make time for projects relating to RAP, where possible.
Support for reproducible analysis, automation and coding capability is available from several sources. Our C&C network provides a series of learning and development courses in coding topics. The StatsAID team builds capability and provides resource in projects which use coding to improve analytical quality, speed and reproducibility. Our RAP champions network shares best practice in reproducible analysis and showcases ongoing projects in this area.
Appendix A – Detailed implementation plan
Tools
Analyst leaders will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Work with security and IT teams to give analysts access to the right tools | Ensure that the RAP MVP is supported on DfT computers and platforms | In progress | The tools required for RAP Minimum Viable Product (Appendix B) are available on recommended analytical platforms by the end of 2023 |
| | Ensure that all analysts have access to appropriate R and Python platforms | In progress | Clear pathways are available for analysts to obtain access to appropriate R and Python platforms which meet the RAP MVP standards |
| | Streamline analyst access to version control software such as GitHub | In progress | Analysts can request GitHub Enterprise licences directly through their IT Focal Points for rapid access |
| | Write DfT-wide coding guidance on which analytical tools to use and when | Not started | Central coding guidance is available to outline in plain language what coding tools are available and their appropriate usage |
| | Ensure coding platforms continue to meet analytical user needs | In progress | A feedback process is documented for analysts to report issues and request new features. An appropriate process is in place for reporting these to Digital and ensuring work is completed in a timely manner |
Work with security, IT, and data teams to make sure that the data analysts need are available in the right place and are easy to access | Engagement between analysts and DDaT colleagues as part of Transport Data plans | In progress | Data and analysis teams continue to feed into beta testing phase of Transport Data planning |
Analysts will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Use open-source tools where appropriate | Develop coding guidance which prioritises open source tools | Not started | Central coding guidance is available to outline in plain language the advantages of open source coding tools, their appropriate use, and availability within DfT |
| | Transform 5 analysis workflows to RAP workflows using open source tools | In progress | At least 5 existing analysis workflows are converted to use open source tools |
| | Ensure existing guidance emphasises the importance of version control tools such as Git/GitHub | Not started | Where existing RAP guidance presents version control as a nice-to-have, or suggests version control methodologies other than Git/GitHub, update it to reflect a ‘Git as Default’ stance |
Open source their code | Develop coding guidance on open sourcing code | Not started | Central coding guidance is available to explain the advantages and risks of open sourcing code, and how to open source code safely and effectively |
| | Offer GitHub Technical Lead training to build the capability to open source code responsibly | In progress | GitHub Technical Lead training is run at least once in 2023, and ongoing community support is available, to ensure the capability to open source code responsibly |
Work with data engineers and architects to make sure that source data are versioned and stored so that analysis can be reproduced | Engagement between analysts and DDaT colleagues as part of Transport Data plans | In progress | Data and analysis teams continue to feed into beta testing phase of Transport Data planning |
Capability
Analyst leaders will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Ensure their analysts build RAP learning and development time into work plans | StatsAID team to develop tools and/or guidance to record efficiency and quality improvement as an outcome of RAP products. | Not started | Tools and guidance in place to help analyst leaders record metrics around RAP efficiency and quality, and use these figures with confidence in work planning |
| | Encourage analysts to devote learning and development time to developing essential RAP and/or coding skills | Not started | Guidance on minimum essential RAP and coding skills is developed for analysts |
Help their teams to work with DDaT professionals to share knowledge. | DDaT and statistician/data analyst forums to continue | In progress | Monthly analyst/digital forums continue to be held in 2023 to ensure collaboration between divisions |
| | C&C activities are open to both DDaT colleagues and analysts | Complete | C&C distribution lists include Digital colleagues, both for sharing invitations and for calls for presentations |
Analyst managers will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Build extra time into projects to adopt new skills and practices where appropriate | Will engage with supporting teams where appropriate to ensure capability building in new coding/RAP projects | In progress | Statistical team managers have a good understanding of the resources available from the StatsAID team to facilitate capability building in RAP projects |
Learn the skills they need to manage software | C&C will develop training for analyst managers to ensure they have an understanding of key analytical tools | Not started | Training for analyst managers is developed. This training is run at least once in 2023 to give an understanding of key analytical tools (R, GitHub). At least 75% of statistics team analyst managers report having attended the session, and feel more confident in their ability to manage software |
The DfT RAP community will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Deliver mentoring and peer review schemes in their organisation and share good practice across government | Implement a code reviewing network across DfT | Not started | A code reviewing network has been established and analysts are aware of its existence and function |
| | Offer code review training opportunities | In progress | C&C have made code review workshops and/or other training available to analysts in 2023 |
Analysts will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Learn the skills they need to implement RAP principles | C&C continue to offer a range of learning opportunities, including formats that allow analysts to access training on demand | In progress | C&C will run learning opportunities on an at least monthly basis throughout 2023. At least 75% of these learning opportunities will later be available on demand through sharing resources and recordings of sessions. Attendance at these sessions will be monitored to ensure they meet community needs |
| | Develop a suggested training programme for analysts to complete before starting RAP projects | Not started | C&C will record a list of existing training resources applicable to new RAP projects, at both beginner and refresher levels |
Culture
DfT will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Choose leaders responsible for promoting RAP and monitoring progress towards this strategy within organisations | Our senior sponsor for delivering Reproducible Analytical Pipelines is Gemma Brand, Head of Profession for Statistics | Completed | Senior leader responsible for RAP chosen |
Form multidisciplinary teams that have the skills to make great analytical products, with some members specialised in developing analysis as software | No activities planned for 2023 | Not started | No activities planned for 2023 |
DfT Analyst leaders will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Promote a ‘RAP by default’ approach for all appropriate analysis | DfT senior leaders understand the importance of ‘RAP by default’ for their teams | In progress | Conversations held with DfT analytical senior leaders to gauge their existing knowledge of RAP and its use and benefits. Support and guidance is developed to address any common misunderstandings |
| | Develop guidance on using RAP for ad hoc analysis | Not started | Central coding guidance is available to outline the usefulness of RAP approaches in ad hoc analysis, with clear explanations of which aspects of RAP are appropriate for differing types of analysis |
Write and implement strategic plans to develop new analyses with RAP principles, and to redevelop existing products with RAP principles | No activities planned for 2023. | Not started | No activities planned for 2023 |
Lead their RAP champions to advise analysis teams on how to implement RAP | Ensure that all DfT divisions have a nominated local RAP champion | In progress | This approach will be piloted in statistics, with every division having at least one local RAP champion whom members of that division are aware of. Other professions will be encouraged to nominate a local RAP champion too |
Help teams to incorporate RAP development into workplans | The StatsAID team will provide central mentoring support and guidance for teams wanting to incorporate RAP into workplans | In progress | A central mentoring, support and guidance offer is in place and teams are aware of this offer |
| | Support teams to make use of GitHub features such as labels, project boards and teams to monitor, explore and promote ongoing and completed RAP projects across the department | Not started | The RAP champions will develop and promote guidelines and processes for using GitHub as a monitoring, showcasing and prioritisation tool for RAP projects |
Identify the most valuable projects by looking at how much capability the team already has and how risky and time-consuming the existing process is | Develop prioritising RAP guidelines for analytical leaders | Not started | Central coding guidance is available to outline key considerations when prioritising RAP projects and analytical leaders are aware of these |
DfT RAP champions will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Support leaders in their organisation in delivering this strategy by acting as mentors, advocates and reviewers | No activities planned for 2023 | Not started | No activities planned for 2023. |
Manage peer review schemes in their organisation to facilitate mutual learning and quality assurance | Implement a code reviewing network across DfT | Not started | A code reviewing network has been established and analysts are aware of its existence and function |
DfT Analyst managers will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Evaluate RAP projects within organisations to understand and demonstrate the benefits of RAP | Develop prioritising RAP guidelines for analytical leaders | Not started | Central coding guidance is available to outline key considerations when prioritising RAP projects and analytical leaders are aware of these |
Mandate their teams use RAP principles whenever possible | Analytical managers will ensure that they are aware of best practice resources available within DfT and will promote them to their teams | Not started | Analytical managers and analysts within statistics are aware of the location and contents of all best practice resources |
DfT Analysts will:
Action | 2023 activities | Status | Success criteria / metrics |
---|---|---|---|
Engage with users of their analysis to demonstrate the value of RAP principles and build motivation for development | Determine most appropriate way to engage with users about RAP | Not started | Statistical Dissemination team to decide on and publicise most appropriate way to engage with users on this (for example, the Transport Statistics Users Group (TSUG), Twitter) |
Deliver their analysis using RAP | Will contribute to at least 5 RAP projects in 2023 | In progress | 5 RAP projects successfully completed in 2023 |
Appendix B – Assessment of tools at DfT, December 2022
For Reproducible Analytical Pipelines that meet the minimum criteria:
Tools | Comment | Status |
---|---|---|
Version control software, that is, Git | Git is available for both cloud and desktop instances of coding platforms for all analysts, and is integrated with GitHub | Met |
Open-source programming languages and flexibility to add more (Python, R, Julia, JavaScript, C++, Java/Scala and so on) | R is available for analysts to use via a cloud analysis platform which provides an integrated development environment and Git version control integration. Python is available through a number of different environments on the desktop and through cloud, with IDE availability and Git integration varying. For other languages, there is the potential to add support, but this would be dependent on engagement with Digital colleagues and establishing an analytical need for this | Partial |
Package and environment managers for each of the available languages | Python and R both have toolchains for managing environments and packages (for example, pip and renv) | Met |
Packages and libraries for open-source programming languages, either through direct access to well-known libraries, for example, npm, PyPI, CRAN, or through a proxy repository system, for example, Artifactory | Download of packages for R is enabled through a mirror repository of CRAN through package manager. Library download for Python is currently challenging on some installations | Partial |
Individual storage, for example, home directory | Available on local desktop and in home directory of cloud coding platform | Met |
Shared storage, for example, cloud file and/or object storage, with fine-grained access control, accessible programmatically | Shared drives are available on local computers and cloud coding platform. These have secure access control | Met |
Integrated development environments suitable for the available languages; RStudio for R, Visual Studio Code for Python and so on | RStudio, Visual Studio Code and Jupyter notebooks are available in DfT | Met |
For further development of Reproducible Analytical Pipelines:
Tools | Comment | Status |
---|---|---|
Source control platforms, for example, GitHub, GitLab or BitBucket | Internal and external GitHub repository integration is available on DfT desktop and cloud coding platforms | Met |
Continuous integration tools, for example, GitHub Actions, GitLab CI, Travis CI, Jenkins, Concourse | GitHub Actions are enabled on all GitHub repositories for CI | Met |
Make-like tools for reproducible workflows, for example, make | Not currently available, and there are no plans to make these available as they are not currently required | Unmet |
Relational database management software, for example, PostgreSQL, that is available to users | GCP-based tools can be used to produce and manage relational databases. However, these will normally be used by centralised data engineering teams on behalf of data and analysis teams, where data storage is required for transactional rather than analytical purposes | Met |
Orchestration systems for pipelines and workflows, for example, airflow, NiFi | GCP-based tools (Airflow, Cloud Data Fusion) can be used to pipeline data. However, these will normally be used by centralised data engineering teams on behalf of data and analysis teams | Met |
Internal-facing servers to host html-rendered documentation | HTML-rendered documents can be hosted on a number of platforms including rsconnect, GitHub Pages and Google Cloud, depending on the use case | Met |
External-facing servers with authentication to host end-products such as web applications or APIs | Dependent on the type of web application and underlying data; AppEngine and Cloud Run are available as GCP components. Any plans in this space would be developed in collaboration with DIS colleagues to ensure appropriate hosting, infrastructure configuration and Identity and Access Management (authentication), as well as appropriate API management using Apigee and/or API Gateway in GCP | Met |
Big data tool, for example, Presto or Athena, Spark, dask and so on, or access to large memory capability | BigQuery is the primary big data tool available. It is accessible to all analysts via GCP and can be used over data held in BigQuery tables or against files held in cloud storage. It also offers significant SQL-based ML functionality (BigQuery ML) and integration with other tools, for example Vertex AI and various Google AI APIs | Met |
Reproducible infrastructure and containers, for example, docker | Terraform (infrastructure as code) is already used by Cloud Engineering colleagues to manage digital infrastructure in a reproducible way. Additionally, Cloud Run and Google Kubernetes Engine (GKE) are, in certain cases, well-used components for containerised application deployment. Any plans in this space would be made in collaboration with DDaT and DIS colleagues to ensure the design is fit for analytical purpose | Met |
Appendix C – List of RAP projects taking place at DfT during 2023
Project | Description |
---|---|
Road traffic ATC data ingest | Automated process to collect and transform near real-time ATC data from contractors' FTP server to BigQuery, using Cloud Run and associated services |
National Travel Survey | Automate creation of accessible publication tables |
Congestion and Road Safety statistics | Transfer of data ingest, analysis and some quality checks to BigQuery as part of GCP transfer beta project |
Taxi and Light Rail statistics | End-to-end automation of data ingest, validation and analysis process in R, and implementation of version and quality control of code in GitHub |
National Highways and Transport (NHT) survey | Automation of data ingest and analysis |
Active Travel statistics | A programme of coding improvements, including production of accessible tables, HTML bulletin content, and analysis of NTS data in R, pipelining of data in GCP, and version and quality control of code in GitHub |
Rail Statistics | Completing a large RAP project in R to automate data preparation, quality assurance, and visualisations of data used in annual Rail Passenger Numbers publication |
Aviation statistics | Modernisation of existing coded processes, including migration and refactoring of R code, implementation of version and quality control of code in GitHub, and improvements to data storage and processing using GCP and R |
Road Traffic statistics | Merging existing daily and quarterly processes into a single coded process for cleaning and aggregating data. This will include development of SQL and R/R Shiny code to replace all existing processes, and will make use of GitHub version and quality control |
National Highways real-time data project | Building on previous success of data processing in BigQuery, further development work will refactor code to improve efficiency, data cleaning and coverage |
Port Freight Statistics | Automate data visualisations, release commentary and quality assurance of tables in the quarterly port freight publication. This will include developing new code in both SQL and R, to improve the timeliness and quality of data checks and release production |
Shipping Fleet Statistics | Automate data visualisations and release commentary in the shipping fleet publication |
People Analytics | Continuing to improve processes across data storage, analysis and publication. This includes moving data storage from legacy Access and Excel-based systems into GCP, further developing code-based solutions for analysis, and publication of data in accessible ODS and HTML formats |