Deliverable 2: principles to support the development and deployment of artificial intelligence or machine learning-enabled medical devices across jurisdictions
Published 30 December 2021
Introduction
As part of the objectives of the G7’s health track artificial intelligence (AI) governance workstream 2021, member states committed to the creation of 2 complementary papers:
- The first paper seeks to define the phases and good practice for clinically evaluating artificial intelligence or machine learning (AI/ML) enabled medical devices.
- The second seeks to define and agree good practice for assessing the suitability of AI/ML-enabled medical devices developed in one G7 country for deployment in another G7 country.
These papers should be read in combination to gain a more complete picture of the G7’s stance on the governance of AI in health.
This paper is the result of a concerted effort by G7 nations to contribute to the harmonisation of principles that support the development and deployment of AI/ML-enabled medical devices in health and care. It builds on existing international work led by the:
- Global Digital Health Partnership (GDHP)
- Institute of Electrical and Electronics Engineers (IEEE)
- International Medical Device Regulators Forum (IMDRF)
- International Organization for Standardization (ISO)
- International Telecommunication Union (ITU)
- World Health Organization (WHO)
- Organisation for Economic Co-operation and Development (OECD)
Current context
AI/ML-enabled medical devices are becoming increasingly prevalent in clinical settings around the world. Accompanying this growth are national and international efforts to increase the scrutiny on the performance and safety of AI/ML-enabled medical devices.
Challenges remain with governing the use of AI/ML-enabled medical devices in healthcare, particularly in assessing and comparing the performance and safety of models. A lack of internationally recognised principles for reporting AI models means that assessing AI/ML-enabled medical devices against a standard of care – or against one another – is currently challenging and imprecise.[footnote 1]
Current efforts to assess the development and deployment of AI/ML-enabled medical devices across jurisdictions tend to only consider model performance. Given the safety concerns with AI, there is a need to explore how we can take into account further considerations, such as model development, robustness and other safety considerations, in supporting the development and deployment of AI/ML-enabled medical devices.
As G7 nations, we have a key role to play in setting principles for what good looks like and supporting the development and deployment of AI/ML-enabled medical devices across jurisdictions. This would enable greater transparency on the quality of AI/ML-enabled medical devices, and promote patient safety and responsible innovation.
We aim to progress our understanding and discussions on the following key questions. How do we, as G7 nations:
- think about AI robustness and approaches to testing AI/ML-enabled medical devices, particularly for models developed in one country and deployed in another?
- compare and evaluate the performance, safety and deployment of AI/ML-enabled medical devices developed by different organisations or deployed in different clinical settings?
Vision for the future
As G7 countries, we are well positioned to lead the conversations on the development and adoption of safe, effective and performant AI/ML-enabled medical devices in health settings around the world.
As countries develop governance processes for AI/ML-enabled medical devices deployed in healthcare, it is an opportune moment for G7 nations to lead the way on this conversation, and encourage a consensus on the principles for supporting the development and deployment of AI/ML-enabled medical devices to promote patient safety and foster innovation.
Internationally recognised principles for the development and deployment of AI/ML-enabled medical devices across jurisdictions could allow for the comparison of AI/ML-enabled medical devices deployed in different countries.
With many AI/ML-enabled medical devices in healthcare trained on data from one country but deployed in another, internationally recognised mechanisms to test the robustness and generalisability of AI/ML-enabled medical devices could support countries as they navigate a growing new marketplace. More information about models, and more standardised forms of reporting on data sets, are necessary for models to safely traverse borders. These mechanisms also have the potential to be used in fast-tracking future regulatory approvals, as appropriate, from country to country.
As G7 nations, we recognise the need to work together to define this set of principles for supporting the development and deployment of AI/ML-enabled medical devices across jurisdictions. This will accelerate international adoption of AI/ML-enabled medical devices in healthcare.
We aim to build a set of values in our countries for supporting the development and deployment of AI/ML-enabled medical devices across jurisdictions that:
- champion transparency – we are committed to championing transparency as a mechanism to reduce asymmetries of information between countries. This will enable G7 nations to obtain more detailed information on the AI/ML-enabled medical devices they might be importing and conversely provide more information about the products they might export
- foster fairness and equity – we are committed to fostering fairness and equity – building on the values we share as G7 nations. We recognise the need to ensure that AI/ML-enabled medical devices used in healthcare settings will not exacerbate health inequalities or any other negative externalities
Supporting the development and deployment of AI/ML-enabled medical devices across jurisdictions
We are committed to an international harmonisation of principles that can be used at different stages of the development cycle of AI/ML-enabled medical devices. The principles laid out below should be taken in conjunction with the discussion outlined in Deliverable 1: principles for the evaluation of AI/ML-enabled medical devices to assure safety, effectiveness and ethicality.
1. Understanding the data
Supporting the development and deployment of AI/ML-enabled medical devices is dependent on good data governance and management practices. It is also contingent on manufacturers understanding and providing information about how data used by AI/ML-enabled medical devices is collected, organised, labelled and processed (even if this is carried out by a contracted third party).
By understanding and improving the quality of the data being fed into these models, we can work towards mitigating bias and discrimination, and fostering fairness and equity.
a. Provide information about data collection
- i. data used in AI/ML-enabled medical devices should be collected legally and observe national and international laws. This could include providing information about data provenance, how data was collected, permissions for use and whether the data contains sensitive or personal information[footnote 2]
- ii. collected data should reflect the data the model will actually ingest in the real world (meaning it might not necessarily be ‘gold standard’)
b. Provide information about data quality
- i. training, validation and testing data sets shall take into account, as necessary, the geographical, behavioural or functional setting (as there are key differences between testing AI in a controlled setting versus its performance in a real-world setting) within which the AI/ML-enabled medical device is intended to be used
- ii. information about sample size should be provided. The sample should be large enough to realistically train a robust model with sufficiently balanced classes. It is also particularly important that the validation set sample size is large enough to test the robustness of the model
- iii. the data should be representative of the populations where models are intended to be deployed (that is, age, clinical status, race, ethnicity, gender and so on, in accordance with data protection regulation) and tailored to the model’s use case when deployed in a clinical setting (a minimal representativeness check is sketched after this list)
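As a purely illustrative sketch, the example below shows one way a developer might report class balance and subgroup representation. It assumes Python with pandas; the column names and reference proportions are hypothetical, not values prescribed by this paper.

```python
import pandas as pd

# Hypothetical training-set extract: the columns 'label' and 'age_band'
# are illustrative only.
df = pd.DataFrame({
    "label": ["disease", "healthy", "healthy", "disease", "healthy", "healthy"],
    "age_band": ["18-40", "41-65", "65+", "41-65", "18-40", "65+"],
})

# Class balance: a heavily skewed label distribution may indicate the
# sample is too small or too unbalanced to train a robust model.
print(df["label"].value_counts(normalize=True))

# Representativeness: compare subgroup proportions in the training data
# against the intended deployment population (reference figures made up).
reference = {"18-40": 0.35, "41-65": 0.40, "65+": 0.25}
observed = df["age_band"].value_counts(normalize=True)
for band, expected in reference.items():
    print(f"{band}: observed={observed.get(band, 0.0):.2f}, expected={expected:.2f}")
```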
c. Provide information about data organisation
We recognise the need to ensure:
- i. training and validation data sets are held separately and carefully, avoiding leakage between the data used to train, validate and test a particular AI/ML-enabled medical device through effects such as common pre-processing, duplicates or implicit temporal dependencies of data (see the splitting sketch after this list)
- ii. data quality assurance, data management, and robust cybersecurity practices ensure data authenticity and integrity
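A minimal sketch of one leakage-avoidance technique follows, assuming Python with scikit-learn and a hypothetical per-patient identifier. Splitting by patient rather than by record is one common way to keep related samples on the same side of a split.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 10 samples, 2 features, with a patient identifier per sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = rng.integers(0, 2, size=10)
patient_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

# Splitting by patient (not by row) keeps all records from one patient on
# the same side of the split, preventing a common form of leakage.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.4, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# Sanity check: no patient appears in both partitions.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```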
d. Provide information about data labelling
Data labels are used to train AI/ML-enabled medical devices by providing a ground truth for the model to learn from. Within healthcare, labels can also be used to compare AI/ML-enabled medical device performance with a reference standard (for example, the performance of clinicians).
Inconsistent or poor-quality labels can lower model performance or introduce biases into AI/ML-enabled medical devices.
Accepted best available methods for developing a reference data set and labelling data ensure clinically relevant and well characterised data is collected, and that the limitations of the reference data are understood.
e. Provide information about data processing
Data processing includes mitigating inconsistencies or limitations in the data (‘data cleaning’) and manipulating the raw data (‘feature engineering’). How data is processed (a minimal example follows this list):
- will impact the performance of AI/ML-enabled medical devices
- has the potential to introduce biases in the model outputs
- affects the viability of models when deployed in different clinical environments
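As a minimal sketch (assuming Python with scikit-learn, and standardisation as a stand-in for any cleaning or feature-engineering step), fitting the processing step on training data only, and reporting it as part of the model, keeps the evaluation honest and makes the processing reproducible in new clinical environments.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for clinical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bundling the processing step with the model means the scaler is fitted
# on training data only; statistics from the test set never influence
# preprocessing, so the held-out evaluation is not contaminated.
model = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```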
2. Understanding the model
AI is a rapidly changing field. Standardising the reporting of how AI/ML-enabled medical devices are developed and the metrics used to calculate model performance can:
- help track developments in the sector
- make supporting the development and deployment of AI/ML-enabled medical devices across jurisdictions more seamless
- promote greater transparency
a. Reporting guidelines
Reporting guidelines could be useful tools to assess whether developers adhered to good scientific principles, and they could be incorporated into processes supporting the development and deployment of AI/ML-enabled medical devices across jurisdictions.
Internationally recognised reporting guidelines are currently being updated to cover AI-driven technologies. Examples include:
- CONSORT-AI
- SPIRIT-AI
- TRIPOD-AI
- STARD-AI
- DECIDE-AI
- ALTAI
- QUADAS-AI
b. Metrics
A range of different metrics (such as specificity and sensitivity) can be used to measure the performance of AI/ML-enabled medical devices. Depending on the model’s use case, different metrics will be applicable.
Transparency and justification about which metrics are used – preferably declared in advance – will help assess whether a metric may be a misleading representation of the model’s performance.
Where feasible, standardised metrics could also allow for more seamless support of the development and deployment of AI/ML-enabled medical devices across jurisdictions.
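To illustrate the two metrics named above, the sketch below (assuming Python with scikit-learn, and toy labels) computes sensitivity and specificity from a confusion matrix. Reporting both matters on imbalanced clinical data, where accuracy alone can be misleading.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels and predictions (1 = disease present).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate: missed cases are costly
specificity = tn / (tn + fp)   # true negative rate: false alarms are costly
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")

# A trivial model that predicts 'healthy' for everyone can score high
# accuracy on imbalanced data while having zero sensitivity.
```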
c. Information to users
Users should be provided with ready access to clear, contextually relevant information that is appropriate for the intended audience (such as clinicians or patients).
3. Robustness
Once deployed in the real world, AI/ML-enabled medical devices use input data, which is often noisy, has defects or changes over time – all of which could lower model performance or introduce biases. Data quality can be especially variable within healthcare due to unique factors within each clinical setting – magnifying any robustness concerns.
Due to the valid – and well documented – safety concerns with deploying AI models in healthcare, robustness should be an explicit consideration of supporting the development and deployment of AI/ML-enabled medical devices across jurisdictions.
To champion transparency and foster fairness and equity, the following areas are a particular priority:
a. Corruption robustness
Noise, defects or changes to training datasets may lower model performance or introduce biases into the modelling.[footnote 3] Healthcare data can be subtly altered by factors specific to individual clinical settings – making this a particular concern for AI/ML-enabled medical devices.
Performance will never be static against all different data inputs – thresholds (informed by standards of care) will be required to determine acceptable ranges of model performance.
b. Testing robustness
Data sets to test robustness could include (a minimal corruption test is sketched after this list):
- artificially corrupting real-world data sets
- developing synthetic data sets with subtle changes to data imperceptible to humans
- acquiring and testing on data sets fully representative of real-world conditions
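A minimal sketch of the first approach follows, assuming Python with scikit-learn. It artificially corrupts a held-out set with increasing noise and flags performance that falls outside an illustrative tolerance; a real threshold would be informed by the applicable standard of care.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy classifier standing in for an AI/ML-enabled medical device.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
clean_acc = model.score(X_test, y_test)

# Artificially corrupt the held-out data with increasing Gaussian noise and
# check performance stays within an acceptable range. The 0.10 tolerance
# is illustrative only.
for sigma in [0.1, 0.5, 1.0]:
    noisy = X_test + rng.normal(scale=sigma, size=X_test.shape)
    acc = model.score(noisy, y_test)
    flag = "OK" if clean_acc - acc <= 0.10 else "BELOW THRESHOLD"
    print(f"noise sigma={sigma}: accuracy={acc:.2f} ({flag})")
```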
c. Post-market surveillance
Changes in deployment environments will create new permutations that may introduce unintended bias or new ways in which data can be corrupted. Monitoring for model degradation and unintended bias, and mitigating data corruption, requires a continuous process for all deployed AI/ML-enabled medical devices. A minimal monitoring sketch follows the list below.
Frequency of post-deployment model testing will depend on, among other factors:
- the risk profile of the AI/ML-enabled medical devices
- availability of testing data
- model maturity (see paragraph 4.1 of Deliverable 1: principles for the evaluation of AI/ML-enabled medical devices to assure safety, effectiveness and ethicality).
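As a purely illustrative sketch (Python with NumPy; the baseline, window size and tolerance are hypothetical values), post-market monitoring can be as simple as comparing rolling accuracy on confirmed cases against a baseline and flagging degradation for investigation.

```python
import numpy as np

# Minimal rolling-window performance monitor: compare recent accuracy on
# labelled post-deployment cases against a baseline and alert on
# degradation. Window size and tolerance are illustrative only.
baseline_accuracy = 0.90
window = 50
tolerance = 0.05

rng = np.random.default_rng(0)
# Simulated stream of per-case outcomes (1 = model output agreed with the
# later-confirmed ground truth), gradually degrading over time.
outcomes = (rng.random(300) < np.linspace(0.92, 0.78, 300)).astype(int)

for end in range(window, len(outcomes) + 1, window):
    recent = outcomes[end - window:end].mean()
    if baseline_accuracy - recent > tolerance:
        print(f"cases {end - window}-{end}: accuracy={recent:.2f} -> investigate")
    else:
        print(f"cases {end - window}-{end}: accuracy={recent:.2f} (within tolerance)")
```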
d. Adversarial robustness
A malicious actor could deliberately design perturbations within a data set or model input to mislead AI/ML-enabled medical devices or change modelling outputs – this would be a particular risk in the field of imaging.
However, it is more likely that adversarial examples arise in health and care due to human error caused by frontline pressures or staff fatigue.
Reliability of deployment infrastructure environments and cyber security protections are key considerations for defending AI/ML-enabled medical devices against malicious activity.
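To make the perturbation risk above concrete, the sketch below (assuming Python with scikit-learn and a toy linear classifier; it is not a statement about any particular device) applies a fast-gradient-sign-style perturbation and shows how a small, targeted input change shifts a model’s output.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy linear classifier standing in for an imaging model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Fast-gradient-sign-style perturbation: for a logistic model the
# cross-entropy gradient with respect to the input is (p - y) * w, so
# stepping along its sign pushes the prediction away from the true label.
w = model.coef_[0]
x = X[0]
p = model.predict_proba(x.reshape(1, -1))[0, 1]
eps = 0.3  # illustrative perturbation budget
x_adv = x + eps * np.sign((p - y[0]) * w)
p_adv = model.predict_proba(x_adv.reshape(1, -1))[0, 1]

print(f"true label: {y[0]}")
print(f"p(positive) before perturbation: {p:.2f}")
print(f"p(positive) after perturbation:  {p_adv:.2f}")
```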
Conclusion
As G7 nations, we are committed to working together – taking into consideration different jurisdictional frameworks and in close co-ordination with other international initiatives – to promote the harmonisation of principles and eventually standards for how we support the development and deployment of AI/ML-enabled medical devices across jurisdictions.
We will:
- drive international alignment
- work to promote patient safety
- support responsible, trustworthy innovation of AI/ML-enabled medical devices for public benefit
Footnotes

1. There is some work underway on the standardisation of model reporting, such as Sendak MP, Gao M, Brajer N and others. ‘Presenting machine learning model information to clinical end users with model facts labels.’ npj Digital Medicine 2020: volume 3, issue 41.

2. If the manufacturer utilises AI regulatory sandboxes, this should be made transparent in order to enable the user to evaluate if data has been processed legally.

3. This will need to be balanced against point 1a.ii.