Pathogen Genomic Data Sharing
Updated 13 February 2025
The UK Health Security Agency (UKHSA) has committed to pathogen genomic data sharing and global strategy in its 5-year Pathogen Genomics Strategy, including objectives to enable rapid public data sharing through development of policies, systems and infrastructure.
UKHSA has now embedded a risk assessment procedure to facilitate release of pathogen genomic data into the public domain and to trusted research environments. This will allow our data to be used to the fullest extent for public health benefit and to support global health surveillance and security.
Risk Assessment
The risk assessment is performed by pathogen subject matter experts using a standardised approach and is ratified by a multidisciplinary group including the privacy team from the UKHSA genomics programme. The assessment takes into account the genetic diversity of the pathogen, potential risks to privacy and biosecurity, and commercial or legal sensitivity.
At present the risk assessment applies to pathogen whole genome (or fragment) sequences. Metagenomic data and raw sequence data are excluded whilst additional work is undertaken on validating processes for human sequence read removal.
The publication of pathogen genomic data generally poses minimal data protection risk, primarily because such data does not contain personal or special category personal data. However, in some cases, residual risk can be associated with the public sharing of this data.
Pathogen genomic data is rated Red, Amber or Green (RAG):
- Red pathogen genomic data is assessed as indicating that an internationally significant event has occurred in the UK and requires a managed release involving appropriate partners with individual assessment and approval by UKHSA executive staff
- Amber pathogen genomic data is assessed as requiring context-specific risk assessment before release - it may be shared with appropriate governance to specific partners or trusted research environments, or metadata may be reduced to reduce risks
- Green pathogen genomic data is assessed as posing negligible risk to privacy or biosecurity - this data should be routinely released to the public domain within a planned time frame
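For teams that want to encode the three categories in a release pipeline, the mapping from rating to release pathway could be sketched as follows. This is an illustrative sketch only: the names `Rag`, `RELEASE_PATHWAY`, `release_pathway` and `can_release_routinely` are hypothetical, the pathway strings paraphrase the categories above, and nothing here describes UKHSA's actual systems.

```python
from enum import Enum


class Rag(Enum):
    """The three risk ratings described above."""
    RED = "red"
    AMBER = "amber"
    GREEN = "green"


# Hypothetical summary strings paraphrasing the three category descriptions.
RELEASE_PATHWAY = {
    Rag.RED: "managed release with appropriate partners; individual executive approval",
    Rag.AMBER: "context-specific risk assessment; restricted sharing or reduced metadata",
    Rag.GREEN: "routine release to the public domain within a planned time frame",
}


def release_pathway(rating: Rag) -> str:
    """Return the release pathway summary for a RAG rating."""
    return RELEASE_PATHWAY[rating]


def can_release_routinely(rating: Rag) -> bool:
    """Only Green data goes to the public domain without further assessment."""
    return rating is Rag.GREEN
```

Keeping the pathway in a single lookup means that a change to how a category is handled needs updating in one place only.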
Pathogen Risk Assessment Form
For identification of the pathogen capable of causing adverse human or animal health effects
Prompt | Entry field (to be completed by assessment team) |
---|---|
Brief summary of the pathogen, clinical and public health context, epidemiology, reason for sequencing, UKHSA use of the sequence data. | [assessor comments] |
Genetic diversity of the pathogen | [assessor comments] |
Confirm that pathogen data being released is from naturally circulating pathogens and these pathogens have not undergone modification in the laboratory | [assessor comments] |
Which RAG category does this pathogen belong to and why? | [assessor comments] |
Risks to privacy of releasing this data into the public domain, including but not limited to where it is combined with other data (for example: patient clinical data, reference data) | Notes: Risks: Mitigations: |
Risk of residual human DNA being deposited in public archives resulting from cross-species contamination in raw read data | Notes: Risks: Mitigations: |
Risk to biosecurity including but not limited to inadvertent disclosure of an event in the UK, such as an outbreak, before it has been recognised or appropriately acted on (including before relevant partners have been made aware) | Notes: Risks: Mitigations: |
Risks around disclosures that could be commercially sensitive or that relate to outbreaks with potential medico-legal implications, which could result in a prosecution | Notes: Risks: Mitigations: |
Any other risks identified | Notes: Risks: Mitigations: |
Overall RAG Rating | Red/Amber/Green |
Assessed by | Name(s): Signature: |
Counter-signed by | Name: Signature: |
Metadata
The RAG rating assumes that a standard metadata set is released, as listed in A to F below. Where a risk assessment identifies that data is Amber because of the metadata included, metadata elements can be removed to reduce the risk, as assessed by the pathogen lead assessor and ratified by the data sharing working group.
A. Release specific ID (an ID specific to the release and not related or linkable to any unique identifier except by UKHSA)
B. Sample date (MM/YY)
C. International Territorial Level (NUTS1 – 9 regions in England)
D. Sample type (for example, from standard set of respiratory, blood, faeces)
E. Sequence method
F. The name of the pathogen
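The metadata set A to F, and the option to remove elements for Amber data, could be represented along the following lines. This is a minimal sketch under stated assumptions: the class `ReleaseMetadata`, the helper names, the random-token ID scheme and the example values are all hypothetical and are not taken from UKHSA's implementation.

```python
import secrets
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseMetadata:
    release_id: str       # A. release-specific ID
    sample_date: str      # B. sample date as MM/YY
    region: str           # C. International Territorial Level region
    sample_type: str      # D. sample type
    sequence_method: str  # E. sequence method
    pathogen: str         # F. name of the pathogen


def new_release_id() -> str:
    """Random ID not derived from any internal identifier (assumed scheme)."""
    return secrets.token_hex(8)


def coarsen_date(sampled: date) -> str:
    """Drop the day so only month and two-digit year are published."""
    return sampled.strftime("%m/%y")


def reduce_metadata(record: ReleaseMetadata, drop: set[str]) -> dict[str, str]:
    """Return the record as a dict without the fields named in `drop`."""
    return {k: v for k, v in asdict(record).items() if k not in drop}


# Illustrative values only.
record = ReleaseMetadata(
    release_id=new_release_id(),
    sample_date=coarsen_date(date(2024, 3, 17)),  # "03/24"
    region="London",
    sample_type="respiratory",
    sequence_method="Illumina",
    pathogen="SARS-CoV-2",
)
reduced = reduce_metadata(record, drop={"region"})
```

In this sketch the link between a release ID and any internal identifier would have to be held in a separate, access-controlled lookup, which is one way to satisfy the requirement that the ID is linkable only by UKHSA.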
Data sharing locations
All pathogen genomes will be shared to a fully open-access location, currently the European Nucleotide Archive (and NCBI GenBank and DDBJ via the International Nucleotide Sequence Database Collaboration). In addition, data may be shared to specialist databases and repositories. In this case, the anonymised identifier will be the same so that data can be linked across repositories.
The source of all data will be acknowledged in published outputs.
Current risk assessments
The data sharing list is under development and will continue to be updated regularly with new assessments.
Red:
All genomes from pathogens classified as high consequence infectious diseases (HCIDs)
Amber:
- M. tuberculosis
- HIV
- hepatitis B virus
- hepatitis C virus
Green:
- seasonal influenzas
- SARS-CoV-2
- Streptococcus pneumoniae
- Staphylococcus aureus
- Group A Streptococcus
- Neisseria meningitidis
- opportunistic healthcare-associated bacterial pathogens including (but not exclusively):
  - Enterobacter hormaechei
  - Klebsiella pneumoniae
  - Enterococcus faecium
- gastrointestinal bacterial pathogens including:
  - Escherichia coli
  - Salmonella
  - Shigella
  - Vibrio
  - Yersinia
  - Campylobacter
  - Listeria