Publish reference data for use across government
Share your reference data for use in projects and services outside your organisation.
Follow this guidance if you’re a government employee and you need to publish reference data so others can use it across government. It sets out best practice for creating a strategy to manage and support the reference data you publish.
This guidance does not explore systems or infrastructure on which to publish your organisation’s reference data, or tools that can help you.
Define a reference data publishing strategy
Reference data categorises other data within your organisation, helping to give it meaning, make it reliable and make it trustworthy.
Reference data usually consists of codes, descriptions and definitions of data. For example, the ISO-3166 country codes are an internationally recognised set of codes that you can use to refer to countries and their subdivisions.
Reference data is a valuable asset which can inform and help users make decisions. Publishing reference data for others across government is an excellent way to share its advantages while saving time and cost for others.
You will need to prepare your reference data properly before publishing it. Reference data should be published in a way that makes it:
- always available and up to date
- validated and accurate at any point in time, with a history of changes maintained, and expired versions still available to use
- findable and accessible to its users
- supported by an organisational infrastructure that can handle the processes and demands of creating and maintaining the reference data itself
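One way to keep reference data accurate at any point in time is to record a validity period against each entry, so that expired entries stay available. The following is a minimal sketch of that idea. The `CodeVersion` structure, the field names and the sample values are assumptions made for illustration and are not part of this guidance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeVersion:
    code: str
    description: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the entry is still current


# Hypothetical history for a single code; the values are invented.
HISTORY = [
    CodeVersion("A1", "Original description", date(2015, 1, 1), date(2019, 12, 31)),
    CodeVersion("A1", "Revised description", date(2020, 1, 1)),
]


def lookup(history, code, as_at):
    """Return the entry for `code` that was valid on `as_at`, or None."""
    for entry in history:
        if entry.code != code:
            continue
        started = entry.valid_from <= as_at
        not_ended = entry.valid_to is None or as_at <= entry.valid_to
        if started and not_ended:
            return entry
    return None
```

With this approach, a user who needs the value that applied on a past date can call `lookup(HISTORY, "A1", date(2018, 6, 1))` and still get the expired entry.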
Appoint a reference data owner and steward to manage your publishing strategy
You should effectively manage your publishing strategy. A good way to do this is by appointing both a reference data owner and a reference data steward to work together to prepare your reference data properly, and support it and its users after publishing.
The reference data owner ‘owns’ the reference data itself, and their main responsibilities are to:
- define reference data
- make policy and decisions about reference data based on your organisation’s individual responsibility
- decide who can access and change the reference data
The reference data steward carries out the rules set by the reference data owner. The steward is responsible for:
- the quality of the data
- compliance with regulatory requirements
- conformity to your organisation’s data principles and policies
- any practical data-related issues such as incomplete records or queries raised by users
The reference data owner and steward do not have to be individuals. The responsibility is best handled by teams. This can help spread tasks evenly and improve accountability.
You could also create a reference data ‘forum’ of individuals from relevant parts in your organisation to discuss reference data governance more generally, as well as in relation to publishing.
Establish and use a single, trusted source for your reference data
You should create your reference data from a single, trusted data source. If this single source belongs to your organisation, it should be placed in a storage system or database, and be known as your system of record (SOR).
In some situations, you may need to create a reference data set by combining several data sources. In these cases, each published reference data set should have its own SOR, created by combining any existing SORs which contributed to it.
If the single authoritative data source, or sources, do not belong to your organisation, you should work with the owner of the source to make sure your reference data reflects their SOR.
You should mark each record in your reference data set with a unique identifier (UID) to associate it with the same SOR for the life of the data set. This makes it easier for your users to index, search and manage the reference data, as well as track changes between published versions.
A UID distinguishes a record from every other record in a data set. A UID should be made up of letters, numbers or a combination of these. Examples include serial numbers, stock keeping units (SKUs) as found on barcodes on items for sale, or currency codes, as found in international currency conversion services.
You should also make UIDs persistent, which means guaranteeing they are managed and kept unchanged for the life of the reference data, in order to ensure accuracy and consistency for your users.
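The UID rules above can be checked automatically before each release. The following is a minimal sketch, assuming each record is a dictionary with a `uid` field and that the UIDs from the previous published version are available as a set. The function name, the field name and the exact checks are assumptions made for illustration.

```python
import re

# Letters, numbers or a combination of these, as described above.
UID_PATTERN = re.compile(r"[A-Za-z0-9]+")


def find_uid_problems(records, previous_uids=frozenset()):
    """Return a list of human-readable problems found in the records' UIDs."""
    problems = []
    seen = set()
    for record in records:
        uid = record.get("uid", "")
        if not UID_PATTERN.fullmatch(uid):
            problems.append(f"invalid UID syntax: {uid!r}")
        if uid in seen:
            problems.append(f"duplicate UID: {uid!r}")
        seen.add(uid)
    # A UID that was published before but is now missing suggests
    # it was changed or removed, so it is not persistent.
    for uid in sorted(set(previous_uids) - seen):
        problems.append(f"UID missing from new version: {uid!r}")
    return problems
```

An empty list means the records passed all three checks. A check like this only flags likely problems. Deciding how to resolve them remains a task for the reference data steward.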
When using data sources from other organisations, you should follow their rules for using their data. For example, use of Ordnance Survey’s unique property reference number (UPRN) data set needs to follow the Ordnance Survey Open Identifiers policy.
When creating new SORs and reference data sets, you should follow government guidance around reusing data whenever possible. This will help cut down on waste and duplication.
Publish your reference data for usability and security
You should make your published reference data set readable by humans and machines.
The Government Digital Service recommends an API-first approach, publishing reference data in JSON format. You may also want to consider publishing in CSVW (CSV on the Web) format, should users need a CSV file. When publishing in document form, you should use the Open Document Format (ODF) standard.
The most important thing is to make sure the format you choose to publish in is most suitable to your users’ needs.
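As an illustration of serving more than one format from the same source, the sketch below writes one set of records as both JSON and CSV. The sample records, field names and function names are assumptions made for illustration, and the rows are not an authoritative extract of any published data set. A full CSVW publication would also need a separate metadata file describing the CSV, which is not shown here.

```python
import csv
import io
import json

# Illustrative records only.
RECORDS = [
    {"uid": "GB", "name": "United Kingdom"},
    {"uid": "FR", "name": "France"},
]


def to_json(records):
    """Serialise the records as a JSON document for machine use."""
    return json.dumps({"records": records}, indent=2, ensure_ascii=False)


def to_csv(records):
    """Serialise the same records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["uid", "name"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Generating every format from the same records helps keep the published versions consistent with each other.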
When publishing to the web, you should follow GOV.UK guidance around best practice in search engine optimisation (SEO) to make your reference data as findable as possible.
You should include metadata with your reference data set that provides:
- an overview of the reference data set’s contents
- contact details of its steward
- when it was created
- when it was last updated
- a brief description of any new changes in the latest version
You can find out the best ways to create metadata by reading government guidance on metadata you should record to help others.
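The metadata items listed above could be published as a small JSON document alongside the data. The following is a minimal sketch. The field names, the function name and the sample values are assumptions made for illustration, and government guidance on metadata may recommend different or additional fields.

```python
import json
from datetime import date


def build_metadata(overview, steward_contact, created, last_updated, change_summary):
    """Collect the suggested metadata items into a JSON-ready dictionary."""
    return {
        "overview": overview,
        "steward_contact": steward_contact,
        "created": created.isoformat(),
        "last_updated": last_updated.isoformat(),
        "change_summary": change_summary,
    }


# Hypothetical values for illustration.
metadata = build_metadata(
    overview="Codes and descriptions for an example classification",
    steward_contact="steward@example.org",
    created=date(2020, 1, 1),
    last_updated=date(2021, 6, 30),
    change_summary="Added two new codes and corrected one description",
)

print(json.dumps(metadata, indent=2))
```

Writing dates in ISO 8601 form, as `isoformat()` does, keeps them unambiguous for both people and machines.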
Sometimes, you’ll find that updating a reference data set requires publishing it as an entirely new, separate reference data set. Changing reference data that is already being used may cause systems or platforms that are using it to malfunction. For example, the UK Standard Industrial Classification of economic activities must still provide reference data for both 2003 and 2007 to suit different use cases.
When publishing a new version of a reference data set alongside an existing one, you should make sure:
- the new version is published as a new, standalone reference data set, and not as a change to or variant of the existing one
- both the existing and the new version are available to users
- the correlation between the existing and new versions is made clear to users, preferably in an accompanying correlation document
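A correlation document can also be provided in a machine-readable form, so that users can translate codes between versions automatically. The following is a minimal sketch, assuming the correlation is a list of pairs linking a code in the existing version to a code in the new version. All codes and names here are invented and do not come from any real classification.

```python
# Hypothetical correlation table: each row links a code in the existing
# version to a code that replaces it in the new version.
CORRELATION = [
    ("OLD01", "NEW10"),
    ("OLD02", "NEW20"),
    ("OLD02", "NEW21"),  # one existing code split into two new codes
    ("OLD03", "NEW30"),
]


def build_lookup(rows):
    """Group the new codes by the existing code they correspond to."""
    lookup = {}
    for old_code, new_code in rows:
        lookup.setdefault(old_code, []).append(new_code)
    return lookup


def translate(lookup, old_code):
    """Return the new codes for an existing code, or an empty list if none."""
    return lookup.get(old_code, [])
```

Returning a list rather than a single value allows for codes that are split between versions. For example, `translate(build_lookup(CORRELATION), "OLD02")` returns both new codes.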
The UK Standard Industrial Classification of economic activities reference data, published by the Office for National Statistics (ONS), is a good example of versioning.
Provide user support for your reference data
You should provide a simple way for users to give feedback or report errors when using your reference data, such as with an email link or web form.
It’s important that you know who is consuming your reference data. Knowing this allows you to provide better support to users and their community, including during upgrades, maintenance and unexpected downtime. A good way to find out who is using your reference data is to encourage users to subscribe to it, for example by giving users the option to provide an email address when downloading it.
You should follow the Open Data Charter, and never require user registration to use your reference data. Let users decide whether they want to register or not.
Your published reference data set needs to be secure. This means hosting it in a secure environment, managing access to that environment securely and serving the data over HTTPS.
You can learn more about securing your information in the Service Manual.