GOV.UK Platform as a Service (PaaS) live assessment
The report from the Government Digital Service's GOV.UK Platform as a Service (PaaS) live assessment on 10 February 2021.
Digital Service Standard assessment report
GOV.UK Platform as a Service (PaaS)
From: Government Digital Service
Assessment date: 10 February 2021
Stage: Live
Result: Met
Service provider: Government Digital Service
Previous assessment reports
- Alpha assessment report: June 2016 - Met
Service description
GOV.UK PaaS is a cloud hosting solution for public sector digital services - a shared platform that service teams can use to quickly and safely host their applications in the cloud.
The PaaS team manages the infrastructure that teams need, so they can concentrate on building high quality services.
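As context for how tenants use the platform: deployments to a Cloud Foundry based platform such as GOV.UK PaaS are typically described by a small manifest kept in the application's repository, then pushed with the `cf` CLI. The sketch below is illustrative only — the application name, buildpack and backing service names are hypothetical, and teams should follow the platform's own documentation:

```yaml
---
applications:
- name: example-service        # hypothetical application name
  memory: 512M                 # memory allocated per instance
  instances: 2                 # run two instances for resilience
  buildpacks:
  - python_buildpack           # standard Cloud Foundry Python buildpack
  services:
  - example-postgres           # hypothetical backing service provisioned on the platform
```

With a manifest like this in place, a team would typically deploy by running `cf push` after logging in to the platform's API endpoint.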
Service users
- GOV.UK PaaS is used by public sector teams: central government and arm's-length bodies, local authorities and emergency services
- PaaS currently has around 100 paid-for tenant accounts and a similar number of free trial accounts
- example accounts and services that are hosted on GOV.UK PaaS include:
- Department for International Trade - great.gov.uk
- Cabinet Office and No 10 - Covid dashboards (restricted access)
- HMRC - Trade Tariff
- MHCLG - Energy Performance Certificates
- GDS - GOV.UK Notify and Digital Marketplace
- CAA - Drone Registration and Education
- Department for Education - Teaching Vacancies
- Coronavirus shielding support
- PaaS has different types of users (or tenants):
- delivery teams in a hurry to launch a new service
- Chief Technical Architects needing to consolidate many Infrastructure-as-a-Service platforms
1. Understand user needs
Decision
The service met point 1 of the Standard.
What the team has done well
The panel was impressed that:
- the team has continued to do research with a wide range of users to identify further needs and pain points in the user journey
- the team has evidenced through research a good understanding of their users - not just those building the service, but also those maintaining and supporting it
- the team has made good efforts to ensure research is inclusive of all users, and iterations are evidenced by user research
What the team needs to explore
Before their next review, the team needs to:
- do ongoing research to get a deep understanding of why new tenants fail to convert from trial to paid arrangements
- continue to understand the needs of different suppliers, why some want to use the service and others don’t
2. Do ongoing user research
Decision
The service met point 2 of the Standard.
What the team has done well
The panel was impressed that:
- the service team has a good plan for ongoing research and for gathering feedback from users to continually improve the service journey
What the team needs to explore
Before their next review, the team needs to:
- test the assisted digital support model with users who have limited access and digital capability
- ensure the end-to-end journey is tested with users with accessibility needs
3. Have a multidisciplinary team
Decision
The service met point 3 of the Standard.
What the team has done well
The panel was impressed that:
- the team is very well set up, with strong skills and expertise and many passionate people, all of whom are permanent civil servants. The same team will be moving to Live, which ensures continuity
- the team includes 10 engineers, all civil servants, demonstrating ability to overcome recruitment challenges in this area
- there is a very strong focus on knowledge sharing across disciplines, for example by having a weekly knowledge sharing ceremony. All team members also showed they have a strong empathy and understanding of users, which is reflected in the way the product has been developed and iterated
- they also demonstrated an emphasis on ensuring team health in a remote-first setting in a variety of ways
What the team needs to explore
Before their next review, the team needs to:
- ensure they have a dedicated designer available to the team on a part-time or full-time basis. Currently, they use a mix of support from the GDS and cross-government design communities, but more formal and dedicated design support will be important to ensure the product continues to meet user needs, usability best practices and accessibility requirements
- fill any critical vacancies that they have identified as soon as possible in Live
- decrease the reliance on a single person for engagement. Most of the engagement with services is done by the technical architect, with some support from the product manager. One of the key areas to address and improve for PaaS as a product is adoption, in particular by smaller organisations and non-central government bodies. To do this effectively and at scale, dedicated engagement support will be necessary
- explore how they strengthen their formal collaboration with other parts of GDS that could support their engagement and growth efforts - for example, Standards and Assurance, and spend control
4. Use agile methods
Decision
The service met point 4 of the Standard.
What the team has done well
The panel was impressed that:
- the team has a really mature approach to agile and uses a range of best practices and techniques to deliver in an iterative, agile way. The team has very strong leadership that helps to coach and provide direction to everyone, but there is also strong emphasis on self-organising in terms of story kick-offs and incident reviews
- there is a really solid approach to how things are prioritised on their roadmap based on user needs, technical requirements and value / effort
- the team makes good effort to visualise that and make prioritisation decisions as a team, and they have a public roadmap on GitHub that users can also feed into
5. Iterate and improve frequently
Decision
The service met point 5 of the Standard.
What the team has done well
The panel was impressed that:
- the team’s example of developing a feature (secrets management) demonstrated a good process that responded to user needs. From identifying the need, the team researched how users currently solved the problem, before testing a range of prototypes and iterating them further before implementation
- the team presented a robust roadmap for Live, which shows their on-going plans to iterate and improve the product based on user research and customer feedback
- there is a big emphasis on tracking and analysis of user support tickets, and operationalising them in the product development
- the team iterates their ways of working to reflect lessons learned. For example, as a result of issues that arose in their retros, they have set up ways to record tech debt formally and hold a monthly forward look session on reliability issues
6. Evaluate tools and systems
Decision
The service met point 6 of the Standard.
What the team has done well
The panel was impressed that:
- the platform offers strong benefits to users hosting typical modern digital services, in areas including security, reliability and cost. The team clearly articulated the choice of continuing to base their platform on Cloud Foundry. Describing GOV.UK PaaS as an “opinionated platform”, they are aware of the trade-offs compared to cloud-managed offerings (for example, AWS EKS) and communicate what sorts of digital services do not fit
- they have demonstrated the ability to deliver changes: over the past year they have added an administrative web interface, added more backing services (including SQS), enabled autoscaling, begun operating across multiple regions and scaled up capacity
- the team’s operation of their service appears to be excellent, with plenty of automation, observability, resilience and good support processes
What the team needs to explore
Before their next review, the team needs to:
- consider including log searching and analytics functionality in the offering. The platform already contains several similar supporting services that help service teams follow best practices in managing their digital service, and a log service may benefit users in a similar way
7. Understand security and privacy issues
Decision
The service met point 7 of the Standard.
What the team has done well
The panel was impressed that:
- the team is focused on delivering security. A key plank of this is the choice of the PaaS model, combined with the team’s use of the Cloud Foundry vulnerability feed, which means that OS-level patches are applied very quickly and efficiently to most hosted services, without service teams needing to be involved
- cybersecurity assurance is provided in combination with up-to-date ITHC and Cabinet Office IA oversight
- the team is familiar with the privacy risks, has appropriate safeguards on tenants’ data, and has covered off the DPIA, privacy notice and GDPR Article 30 records
8. Make all new source code open
Decision
The service met point 8 of the Standard.
What the team has done well
The panel was impressed that:
- all their repositories are open, including code, infrastructure, tooling, team processes and user documentation - only secrets are private. Their tooling has received outside contributions, indicating reuse. In addition, they are an active developer of Cloud Foundry’s open source software, benefiting both the team and the wider community. Key repos:
- https://github.com/alphagov/paas-cf
- https://github.com/alphagov/paas-bootstrap
- https://github.com/alphagov/paas-admin
- https://github.com/alphagov/paas-team-manual
- https://github.com/alphagov/paas-tech-docs
9. Use open standards and common platforms
Decision
The service met point 9 of the Standard.
What the team has done well
The panel was impressed that:
- there is good use of open standards, including the container interface, databases, monitoring components and RESTful platform API
What the team needs to explore
Before their next review, the team needs to:
- move from BOSH to Kubernetes, or justify why not. As the team are aware, Kubernetes has rapidly become the de facto open standard in this space
10. Test the end-to-end service
Decision
The service met point 10 of the Standard.
What the team has done well
The panel was impressed that:
- they have a range of automated testing, which is done in a clone of the production environment. There is also extensive monitoring and alerting in production
- accessibility is well covered, with the team testing the service in the empathy lab, an external audit complete and an accessibility statement in place
11. Make a plan for being offline
Decision
The service met point 11 of the Standard.
What the team has done well
The panel was impressed that:
- the team described some superb practices that form key parts of their incident handling process, with well-thought-out communications, scripted and rehearsed technical actions, and a mature retrospective process. It was impressive to hear that when a service team unfortunately suffered downtime due to a platform incident, the users became more committed to using the platform: not only was the incident quickly resolved, but they appreciated the quality of the root-cause analysis and the subsequent improvements
- the team have a well-developed set of mitigations covering scenarios from component failures, supplier failures and attacks. Real-world disaster recovery was demonstrated and documented, including a full rebuild of the platform
12. Make sure users succeed first time
Decision
The service met point 12 of the Standard.
What the team has done well
The panel was impressed that:
- user research, testing and analytics have been used to make substantial iterations to the service, for example the development of the secrets management feature
- following a post-audit debrief with the DAC, the team are looking to procure an alternative to third-party solutions where accessibility issues have been raised
- a framework is now being created to enable service engineers to understand how best to create accessible interfaces for future iterations, demonstrating the attention this service team is paying to accessibility
What the team needs to explore
Before their next review, the team needs to:
- ensure that all recommendations from the latest DAC report are explored, including the accessibility of the PaaS admin interface, clearly outlining any non-compliances in the service’s accessibility statement, thereby enabling users with access needs to understand how well they will be able to navigate the service pages
- validate how well the service proposition and service name is understood across government, particularly in relation to other services like Crown hosting, exploring in full the ramifications of rebranding the service at this point, and the corresponding risk-benefit ratio
- ensure that the assisted digital support model has been tested to show that most users can complete the end-to-end user journey first time around
13. Make the user experience consistent with GOV.UK
Decision
The service met point 13 of the Standard.
What the team has done well
The panel was impressed that:
- the team acted on feedback from the peer review report to include content design and interaction design expertise in the development of the admin interface
What the team needs to explore
Before their next review, the team needs to:
- have processes in place that ensure they have timely access to content design and interaction design expertise when developing future iterations of technical documentation, command line messages and the rest of the service pages
- ensure the service is not over-reliant on specialist terminology. While the panel appreciates that the terminology used throughout the service is based on the language of the underlying Cloud Foundry technology, content design should be based on the principles of plain English: even when writing for a specialist audience, use plain English wherever jargon can be avoided
- undertake a full content review of the service pages and admin interface pages to ensure that content design and language adheres to the GOV.UK Design System and style guide. For example, on the Cloud service product page, not all headings are written in sentence case, there is an incomplete sentence present, and contradictory information: ‘The team provides 24/7 support for any platform-related issue. If your team is experiencing any issue using the platform we can provide assistance during office hours’
14. Encourage everyone to use the digital service
Decision
The service met point 14 of the Standard.
What the team has done well
The panel was impressed that:
- the team has been very proactive where possible in terms of uptake and product growth. In the last year, they’ve shown stable organic growth in terms of trial and paid accounts
- there are active considerations on how to reduce running costs and optimise fixed costs in a bid to get closer to the recharge target. For example, they are looking at tech improvements and closing trial accounts after a certain period of time
What the team needs to explore
It was mentioned that the team will never be able to fully recoup their costs through a recharge model, so will need to rely on centrally provided funding. This poses an important risk to the longevity of the product, as it is dependent on political buy-in and budget from the Treasury.
To that end, they should:
- consider and ensure that all the services currently using GOV.UK PaaS are aware of this risk and have some form of exit / contingency strategy in place for their cloud hosting services as a form of mitigation
- focus more on identifying and removing blockers to conversion from trial to paid accounts, and also whether there is any way to scale up engagement and the onboarding process to make it more self-service
Before their next review, the team needs to:
- expand their reach across services, in particular for teams outside of central government. There seem to be significant benefits and potential for growth in local authorities and emergency services, which is currently largely untapped. It was encouraging to hear that they collaborate with MHCLG, are active as part of the Local Digital Declaration, and occasionally participate in public events and show and tells. These efforts should be increased further in Live. The team should also consider what they can learn about engagement from, and how they can collaborate with, other GaaP products that have significant uptake in those sectors. It might be necessary to employ dedicated engagement experts who can help scale that work
15. Collect performance data
Decision
The service met point 15 of the Standard.
What the team has done well
The panel was impressed that:
- the team have worked hard to collect extensive performance data
- they are using a range of tools and mechanisms to creatively gather metrics
- they use open source tools such as GitHub and pages on GOV.UK to present data
- they have thought about how data can improve the service
What the team needs to explore
Before their next review, the team needs to:
- consider whether the wide range of tools they use to monitor the system could be consolidated, or whether APIs could be built to enable data to be combined into fewer interfaces
- mitigate tail risks (rare events with a big impact) by regularly validating with users what is being run, how critical it is, predicted increases in usage, and when they might require 24-hour support
- develop a metrics-based approach for deciding when services might need to move off the PaaS, or pose a risk to the PaaS. What will inform this decision?
- periodically assess the frequency that metrics are gathered to ensure they are appropriate for a growing service
16. Identify performance indicators
Decision
The service met point 16 of the Standard.
What the team has done well
The panel was impressed that:
- the team worked hard to identify performance indicators
- the team employed a range of skills and individuals to help generate indicators
What the team needs to explore
Before their next review, the team needs to:
- make clearer the model for how they project and quantify benefits
- ensure they always have appropriate individuals in the team to develop and manage performance metrics
- consider whether their high-level managerial metrics are comprehensive enough, particularly if something goes wrong. What do the services they host do, and what is the impact if those services are not available?
- given the excellent work they have done in creating an important service, the GDS management team could become users of their data; gathering their input when generating KPIs could be beneficial
17. Report performance data on the Performance Platform
Decision
The service met point 17 of the Standard.
What the team has done well
The panel was impressed that:
- they have made information publicly available
What the team needs to explore
Before their next review, the team needs to:
- periodically assess what other information could be made publicly available
18. Test with the minister
Decision
The service met point 18 of the Standard.
What the team has done well
The panel was impressed that:
- there seemed to be good buy-in from senior stakeholders in GDS (some of whom were even present for part of the assessment), and a clear understanding from the senior leadership team that this product is a key priority to be supported in the future
- the team also indicated that the Infrastructure and Projects Authority (IPA) is fully aware and supportive of the project