Resaro’s Performance and Robustness Evaluation: Facial Recognition System on the Edge

Resaro evaluated the performance of third-party facial recognition (FR) systems that run on the edge, in the context of assessing vendors’ claims about performance and robustness.

Background & Description

Resaro evaluated the performance of third-party facial recognition (FR) systems that run on the edge, in the context of assessing vendors’ claims about the performance and robustness of the system in highly dynamic operational conditions.

As part of this, Resaro developed an end-to-end test protocol comprising a standardised testing process to ensure a fair test across vendors, context-specific evaluation datasets that simulate operational scenarios, and an analysis of threat vectors such as presentation attacks. Our evaluation went beyond traditional accuracy metrics to identify the causes of missed detections and false alarms. We also tested the systems’ computational demands, which are a critical constraint on edge devices.
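As an illustration of what testing computational demands can involve, the sketch below times repeated calls to a recognition function and reports median and 95th-percentile latency. It is a minimal sketch and not Resaro’s protocol: `profile_latency`, `fake_recogniser`, and the warm-up count are hypothetical names and values chosen for the example, and a real edge test would also need device-level measurements (memory, power, thermal behaviour) that this code does not capture.

```python
import math
import statistics
import time


def profile_latency(recognise, frames, warmup=2):
    """Time recognise(frame) for each frame and summarise latency in ms.

    `recognise` is a stand-in for a vendor's inference call (hypothetical).
    """
    if not frames:
        raise ValueError("at least one frame is required")
    # Warm-up calls are excluded so one-off start-up costs do not skew results.
    for frame in frames[:warmup]:
        recognise(frame)
    latencies_ms = []
    for frame in frames:
        start = time.perf_counter()
        recognise(frame)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Nearest-rank 95th percentile on the sorted latencies.
    p95_index = math.ceil(0.95 * len(latencies_ms)) - 1
    return {
        "calls": len(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
    }


def fake_recogniser(frame):
    """Placeholder workload standing in for a real FR system."""
    return sum(frame)


if __name__ == "__main__":
    print(profile_latency(fake_recogniser, [[1, 2, 3]] * 20))
```

Reporting a tail percentile alongside the median matters on constrained hardware, where occasional slow frames can affect the operator’s experience more than the typical case.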

The evaluation highlighted several trade-offs a buyer must consider when selecting an AI system, such as detection speed versus accuracy. Performance was also influenced by factors including image quality and the size of the face in the image.
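One way to see how a factor such as face size influences performance is to break the miss rate down by size bucket. The sketch below assumes each trial is an image known to contain an enrolled face, recorded as a pair of face height in pixels and whether the system detected it. The function names and the bucket edges (40, 80, 160 pixels) are illustrative assumptions, not values from the evaluation.

```python
import bisect
from collections import Counter


def bucket_labels(edges):
    """Build readable labels for the size buckets defined by `edges`."""
    labels = [f"<{edges[0]}px"]
    labels += [f"{lo}-{hi - 1}px" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">={edges[-1]}px")
    return labels


def miss_rate_by_face_size(trials, edges=(40, 80, 160)):
    """Return the fraction of missed detections per face-size bucket.

    `trials` is an iterable of (face_height_px, detected) pairs.
    Bucket edges are hypothetical and should reflect the use case.
    """
    labels = bucket_labels(edges)
    totals, misses = Counter(), Counter()
    for face_px, detected in trials:
        # bisect_right places a face exactly on an edge in the upper bucket.
        label = labels[bisect.bisect_right(edges, face_px)]
        totals[label] += 1
        if not detected:
            misses[label] += 1
    # Buckets with no trials are omitted rather than reported as zero.
    return {lab: misses[lab] / totals[lab] for lab in labels if totals[lab]}


if __name__ == "__main__":
    sample = [(30, False), (35, True), (50, True), (100, True), (200, False)]
    print(miss_rate_by_face_size(sample))
```

The same breakdown could be applied to other conditions, such as an image-quality score, by swapping the value that is bucketed.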

How this technique applies to the AI White Paper Regulatory Principles

Safety, Security & Robustness

Applications of AI should function in a secure, safe, and robust manner where risks are carefully managed. FR systems must consistently perform well in highly dynamic environments and resist various adversarial attacks. A lack of robustness can lead to poor performance under operational fluctuations, which, in turn, can cause operational friction, such as false alarm fatigue for the human operator. Resaro’s tailored test protocol includes plausible scenarios designed from the perspective of the end-user’s interaction with the AI system and threat vectors. The systems are tested against an evaluation test set that reflects these scenarios.

Appropriate Transparency & Explainability

Transparency and explainability seek to provide clarity on a system’s decision-making process and how it is used. In this use case, we pursued explainability by helping the decision-maker understand the trade-offs vendors made on factors such as speed, accuracy, and minimum face size. Our evaluation report sets out the strengths and weaknesses of each FR system, giving non-technical decision-makers a transparent basis for choosing the system best suited to their use case and operational needs.

Fairness

FR systems are expected to operate without unintended bias in identifying individuals, regardless of skin colour, gender, facial features, or other characteristics. Our evaluation assessed the AI systems across diverse sub-populations within the dataset, categorised by sensitive attributes such as gender and skin colour. We monitored variations in performance between these sub-populations to pinpoint and address inherent biases within the systems, ensuring alignment with the principles of fairness and unbiased operational deployment.
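A per-group comparison of this kind can be sketched as follows. The example computes a false non-match rate (FNMR) and false match rate (FMR) for each sub-population at a fixed decision threshold, then reports the largest gap between groups. The choice of FNMR and FMR, the record layout, the group labels, and the threshold are all assumptions made for illustration. The source does not state which metrics or thresholds were used.

```python
from collections import defaultdict


def rates_by_group(records, threshold):
    """Compute FNMR and FMR per group.

    `records` is an iterable of (group, is_same_person, score) tuples, where
    a comparison is accepted when score >= threshold (hypothetical layout).
    """
    counts = defaultdict(lambda: {"genuine": 0, "fnm": 0, "impostor": 0, "fm": 0})
    for group, is_same_person, score in records:
        c = counts[group]
        accepted = score >= threshold
        if is_same_person:
            c["genuine"] += 1
            if not accepted:
                c["fnm"] += 1  # genuine pair wrongly rejected
        else:
            c["impostor"] += 1
            if accepted:
                c["fm"] += 1  # impostor pair wrongly accepted
    return {
        group: {
            "fnmr": c["fnm"] / c["genuine"] if c["genuine"] else None,
            "fmr": c["fm"] / c["impostor"] if c["impostor"] else None,
        }
        for group, c in counts.items()
    }


def max_gap(rates, key):
    """Largest difference in `key` between any two groups, or None if no data."""
    values = [r[key] for r in rates.values() if r[key] is not None]
    return max(values) - min(values) if values else None


if __name__ == "__main__":
    sample = [("A", True, 0.9), ("A", True, 0.4), ("A", False, 0.2),
              ("B", True, 0.8), ("B", True, 0.7), ("B", False, 0.6)]
    rates = rates_by_group(sample, threshold=0.5)
    print(rates, max_gap(rates, "fnmr"))
```

In practice each group would need enough trials for the rates to be statistically meaningful. With small samples, an apparent gap may reflect noise rather than bias.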

Accountability & Governance

Ensuring accountability and strong governance is essential for the responsible deployment of FR systems. Through our comprehensive and rigorous evaluation process, Resaro enhances the accountability of vendors to their customers by ensuring that these vendors uphold their claims of performance, robustness, and fairness. This approach demands higher accountability from vendors and assures the buyer of the reliability and integrity of the AI system when deployed at scale.

Why we took this approach

Existing benchmarks, such as NIST’s Face Recognition Vendor Test (FRVT), provide a baseline assessment of facial recognition algorithms under specific conditions, which may or may not reflect the real-world scenarios in which the FR systems are used. For example, standard benchmarks assume high-powered server environments, which differ significantly from the edge processing requirements of this use case. In addition, not all vendors offer their NIST-tested algorithms to customers for commercial use. We filled these gaps by designing a standardised testing protocol, a custom test dataset, and metrics that reflect the business and operational requirements.

Benefits to the organisation using the technique

Our evaluation method rigorously tests the claims made by various FR vendors, providing an objective assessment that goes beyond the vendors’ marketing materials. Our testing approach uncovers the trade-offs and limitations inherent in different FR technologies. Additionally, our comprehensive evaluation of resource usage helps buyers choose the system that best matches their operational demands and constraints. This approach empowers buyers of AI systems to be well-informed about the suitability of the available systems for their use cases and risk appetites, which ultimately minimises security risks and potential reputational damage to the buyer.

Limitations of the approach

While our evaluation protocol thoroughly assesses the performance and robustness of FR systems over a wide variety of operational conditions, it may not encompass all potential real-world scenarios, especially as the operational context of use is dynamic. Such scenarios may require the generation of additional, more representative evaluation datasets. Our evaluation protocol fundamentally offers a snapshot of the performance of the current AI system under an agreed set of operational conditions. Hence, continuous or periodic re-evaluation might be necessary when the core assumptions about the vendors’ systems no longer hold (e.g., after a major system upgrade).

Further AI Assurance Information

Updates to this page

Published 26 September 2024