Approval standards and guidelines: data flow diagram
Updated 2 August 2024
Approval standard: data flow diagram
When must this standard be met
This standard must be met for all applications to access UKHSA data classified as ‘Protected’.
Standard
1. The application must include a data flow diagram to visualise how the data will be processed. It must document:
- all the entities (people, organisations and systems) involved in processing the data, in part or in whole
- the system boundaries (including whether data will be processed strictly within the UK and European Economic Area (EEA) as per Approval standards and guidelines: processing location)
- the data flows between each entity
- the processes the data will go through
- all data stores, such as files, repositories or safe havens that hold information for later use and allow for data to be retrieved
2. The data flow diagram must use standardised notations and include a key. Examples are provided in the accompanying guidelines to this approval standard.
3. The data flow diagram must contain sufficient detail for UKHSA to understand, at a glance:
- the relationships between different entities involved in the proposed data processing
- how the data will be managed throughout the data lifecycle, up to and including its effective destruction (the National Cyber Security Centre has guidelines on secure sanitisation of storage media and information on effective data destruction)
- how key processes, including data linkage or the application of opt-outs (see Approval standards and guidelines: national data opt-out), will be managed, and the data flows necessary for such processes to occur
- whether data is obtained from sources other than UKHSA
- whether there are any transfers of the data, including international transfers
- where the data will be stored at rest, such as use of cloud processing (if public or private cloud processing is to be used, see Approval standards and guidelines: engaging a data processor as well as Approval standards and guidelines: processing location)
4. The data flow diagram must be consistent with all other information submitted in the application and with any information submitted to other approval bodies.
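The elements required by point 1 can also be recorded in a small data structure and checked for gaps before the diagram is drawn. The sketch below is illustrative only: the `Diagram` class, the `check` function and the problem messages are hypothetical names introduced here, and passing these checks does not indicate that an application meets the standard. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field

# Hypothetical element kinds, mirroring the components named in the standard.
KINDS = {"entity", "process", "store"}


@dataclass
class Diagram:
    elements: dict[str, str] = field(default_factory=dict)  # name -> kind
    flows: list[tuple[str, str, str]] = field(default_factory=list)  # (source, target, label)
    has_key: bool = False  # whether the drawing includes a key

    def add(self, name: str, kind: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self.elements[name] = kind

    def flow(self, source: str, target: str, label: str) -> None:
        self.flows.append((source, target, label))


def check(diagram: Diagram) -> list[str]:
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = []
    if not diagram.has_key:
        problems.append("no key")
    # Every kind of element should appear at least once.
    for kind in sorted(KINDS):
        if kind not in diagram.elements.values():
            problems.append(f"no {kind} documented")
    # Every flow must connect declared elements.
    connected = set()
    for source, target, _ in diagram.flows:
        for name in (source, target):
            if name not in diagram.elements:
                problems.append(f"flow references undeclared element: {name}")
            connected.add(name)
    # Every declared element should take part in at least one flow.
    for name in diagram.elements:
        if name not in connected:
            problems.append(f"element has no data flow: {name}")
    return problems
```

A diagram with one entity, one process and one store, joined by two flows and marked as having a key, returns an empty list. Removing the key or leaving an element unconnected adds a corresponding message.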
Guidelines
A data flow diagram is a visualisation tool used to illustrate how information flows through a process or system. It includes data inputs and outputs, data stores, and the various subprocesses the data moves through. The design and complexity of a data flow diagram can vary depending on the process it represents. It can be as simple as an outline of a general system or as detailed as a multi-level procedure.
As part of your application, you are asked to provide a data flow diagram. The level of detail in the diagram must reflect the complexity of the project.
When creating a data flow diagram, it is best to start with higher level (context) diagrams and decompose processes to lower levels of detail as needed. This will assist you in ensuring that it is clear which organisations or systems will be involved, as well as displaying a clear understanding of all steps that will be taken when processing the data.
Example of a data flow diagram
Example 1: Illustrative data flow diagram submitted to UKHSA
The primary applicant in this application wishes to send a survey to people who have been diagnosed with COVID-19 and intends to commission a company to send the survey on their behalf.
The applicant would like UKHSA to identify people that could be invited to participate in the survey. They are interested in sending the survey only to people who have had a positive lateral flow test within a 3-month period in 2021.
Using the Second Generation Surveillance System, UKHSA will identify people to send the survey to and securely send this list to the survey provider. The survey provider will then post the survey to everyone who is eligible to be invited, after checks have been carried out to verify their addresses and to confirm that they are alive.
In this example, the survey provider is a data processor acting under instruction of the primary applicant (the data controller). This means they can only process the data to the exact requirements of the primary applicant. They will be instructed to:
- verify the address details of each person
- verify whether the person is alive to ensure that the surveys are not sent to anyone that has died
- send copies of the survey and other study materials to each potential participant
- consolidate the responses of everyone that takes part
- send frequent reports to the primary applicant about response rates
- provide a final dataset to the primary applicant that includes everyone who has completed the survey
Once the survey has been completed and the survey provider has consolidated the responses, the provider will send the survey responses back to the primary applicant. The primary applicant will store the data in a private cloud service that is commissioned by their organisation and use the data in their analysis for their research.
The data flow diagram in Example 1 uses the Gane and Sarson standard notation to represent the diagram’s 4 main components: external entities, processes, data stores and data flows.
It describes the relationships between the controller and the processor and provides enough information to describe how and by whom the data will be processed. It states that the data stores may contain servers located in third-party countries (countries that do not have data protection standards comparable to those in the UK or EEA).
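The flows described in Example 1 can be written out as data and turned into Graphviz DOT text for drawing. This is a sketch under assumptions: the grouping of nodes into entities, processes and stores is one reading of the prose above, not a transcription of the published diagram; the survey provider's work is shown as three processes; and the `cylinder` shape is only a stand-in for the open-ended rectangle, which this sketch does not attempt to reproduce exactly. The node names, flow labels and the `to_dot` function are hypothetical.

```python
# Node kinds for Example 1 (an interpretation of the prose, not the published diagram).
NODES = {
    "UKHSA": "entity",
    "Potential participants": "entity",
    "Primary applicant": "entity",
    "Verify addresses and vital status": "process",  # survey provider
    "Post survey": "process",                        # survey provider
    "Consolidate responses": "process",              # survey provider
    "Private cloud service": "store",
}

# (source, target, data carried)
FLOWS = [
    ("UKHSA", "Verify addresses and vital status", "list of eligible people"),
    ("Verify addresses and vital status", "Post survey", "verified contact list"),
    ("Post survey", "Potential participants", "survey and study materials"),
    ("Potential participants", "Consolidate responses", "survey responses"),
    ("Consolidate responses", "Primary applicant", "response rate reports and final dataset"),
    ("Primary applicant", "Private cloud service", "final dataset for analysis"),
]

# Approximate Gane and Sarson symbols using Graphviz node attributes.
SHAPES = {
    "entity": "shape=box",
    "process": "shape=box, style=rounded",
    "store": "shape=cylinder",  # stand-in for an open-ended rectangle
}


def to_dot(nodes, flows):
    """Build DOT source text with one node per element and one edge per data flow."""
    lines = ["digraph example_1 {", "  rankdir=LR;"]
    for name, kind in nodes.items():
        lines.append(f'  "{name}" [{SHAPES[kind]}];')
    for source, target, label in flows:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(NODES, FLOWS)
```

The resulting `dot_text` can be saved to a file and rendered with the Graphviz `dot` command. A key explaining the symbols would still need to be added to the drawing.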
Notation
Examples of notation used in data flow diagrams are detailed below and taken from the methods described by:
Yourdon (1989) and DeMarco (1978):
- circles represent processes
- squares or rectangles denote external entities
- horizontal parallel lines symbolise data storage
- arrows indicate data flow
Gane and Sarson (1979):
- rounded rectangles represent processes
- squares or rectangles denote external entities
- open-ended rectangles symbolise data storage
- arrows indicate data flow
It is recommended that one of these standardised approaches is applied in your data flow diagram.
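Because the standard asks for a key, the symbol lists above can be kept as a lookup table and used to produce the text of a key for whichever notation is chosen. This is a minimal sketch: the `NOTATIONS` table and `build_key` function are hypothetical names, and the rounded rectangle and open-ended rectangle symbols are attributed to Gane and Sarson in line with the cited reference.

```python
# Symbols for each component, as listed in this guidance.
NOTATIONS = {
    "Yourdon and DeMarco": {
        "process": "circle",
        "external entity": "square or rectangle",
        "data store": "horizontal parallel lines",
        "data flow": "arrow",
    },
    "Gane and Sarson": {
        "process": "rounded rectangle",
        "external entity": "square or rectangle",
        "data store": "open-ended rectangle",
        "data flow": "arrow",
    },
}


def build_key(notation: str) -> str:
    """Return a plain-text key for the chosen notation."""
    symbols = NOTATIONS[notation]  # raises KeyError for an unlisted notation
    lines = [f"Key ({notation} notation)"]
    lines += [f"- {symbol}: {component}" for component, symbol in symbols.items()]
    return "\n".join(lines)


key_text = build_key("Gane and Sarson")
```

Keeping the symbols in one place makes it harder to mix the two notations within a single diagram.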
References
1. Yourdon E. ‘Modern Structured Analysis’ Yourdon Press computing series 1989
2. DeMarco T. ‘Structured Analysis and System Specification’ Yourdon Press computing series 1978
3. Gane C and Sarson T. ‘Structured Systems Analysis’ Prentice-Hall software series 1979