Aival Evaluate, Aival Analysis Lab: Validating third-party AI by assessing performance, fairness, robustness and explainability on internal data

Aival Evaluate reports the performance, fairness, robustness and explainability of AI products under consideration on a user’s data, enabling comparison along the same baseline.

Background & Description

Aival Evaluate allows customers to objectively and independently assess commercial AI products before purchase, without needing technical expertise. Customers can compare between products on the market and gain trust that a product will perform effectively and fairly for their needs. Aival Evaluate reports the performance, fairness, robustness and explainability of AI products under consideration on a user’s data, enabling comparison along the same baseline. Aival Evaluate is a module of the Aival Analysis Lab, with an initial focus on imaging applications such as Radiology AI.

How this technique applies to the AI White Paper Regulatory Principles

Safety, Security & Robustness

Aival Evaluate provides independent analysis of how an AI product works on a customer’s data. We can automatically augment a given evaluation dataset to cover wider-ranging scenarios, for example by generating varying levels of noise, contrast and image artefacts. Our software outputs a standardised performance report for each product being assessed, enabling objective comparison.
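The kind of augmentation sweep described above can be sketched in a few lines. This is an illustrative sketch, not Aival's actual implementation: the function names, corruption levels and the noise/contrast models are all assumptions made for the example.

```python
import numpy as np

def add_noise(img: np.ndarray, sigma: float) -> np.ndarray:
    """Add zero-mean Gaussian noise of a given strength, keeping pixels in [0, 1]."""
    noisy = img + np.random.default_rng(0).normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def adjust_contrast(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities about the image mean to raise or lower contrast."""
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)

def augment_sweep(img: np.ndarray):
    """Yield (label, variant) pairs covering a range of corruption levels."""
    for sigma in (0.02, 0.05, 0.1):
        yield f"noise_{sigma}", add_noise(img, sigma)
    for factor in (0.5, 0.8, 1.2):
        yield f"contrast_{factor}", adjust_contrast(img, factor)
```

Each AI product under evaluation would then be scored on every variant, so that a standardised report can show how performance degrades as image quality does.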

Appropriate Transparency & Explainability

Aival Evaluate can identify which features were used in an AI classification prediction. It does this without access to the developer’s model architecture or training data. For image applications, this will be displayed as a heatmap highlighting the areas of the image which were needed for the classification result. This helps a user to understand if an AI product is making the right decisions for the right reasons, building trust in black-box AI outputs.
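Aival does not disclose its exact explainability method, but occlusion sensitivity is one standard way to build such a heatmap without access to the model's architecture or training data: mask each region of the image in turn and record how much the classification score drops. A minimal sketch, where `predict` stands in for any hypothetical black-box scoring function:

```python
import numpy as np

def occlusion_heatmap(img: np.ndarray, predict, patch: int = 8,
                      baseline: float = 0.0) -> np.ndarray:
    """Black-box saliency: the score drop when each patch is masked out.

    High values mark regions the model relied on for its prediction.
    """
    base_score = predict(img)
    h, w = img.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            masked = img.copy()
            masked[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = base_score - predict(masked)
    return heat
```

Overlaying the resulting grid on the original image gives the heatmap a user can inspect to judge whether the product attended to clinically relevant regions.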

Fairness

We break down our statistical analysis into subgroups to assess whether the AI product works well and fairly for different groups. For medical images, demographic and scanner information is extracted from imaging metadata and stratified statistics are reported. We also allow the user to upload their own subgroup metadata to display richer analysis on the groups relevant to their use case.

Accountability & Governance

By using our software, customers can show that they have followed an objective and robust procurement process. With independent results on their own data, informed procurement decisions and the value of a product can be substantiated and recorded.

Why we took this approach

Independent testing of AI is important because AI products do not necessarily generalise to new situations; this is particularly crucial for high-stakes applications. AI developers often do not make their training data or model architecture available, so testing must be done without this access. We wanted to give buyers of AI products control over adoption by providing the tools to run this analysis easily and independently. Customers with knowledge of the application domain often lack specialist AI training, so Aival Evaluate is designed for ease of use without the need for technical expertise.

Benefits to the organisation using the technique

Aival Evaluate supports organisations in making an informed and transparent decision when choosing between AI products on the market.

Our analysis reports allow the user to understand how the products will work in wide-ranging scenarios which are representative of their own data and use cases. This can facilitate procurement and faster adoption of the AI that best suits the needs of the adopting organisation. It is particularly useful for organisations that have the domain expertise to interpret results but lack the technical resources to run a thorough analysis on their own.

Limitations of the approach

Analysis is most useful if performed on data that is representative of the adopting organisation. Aival can help with data curation and with expanding the customer dataset to broaden the variation tested, as well as providing sample datasets of our own; however, it is preferable that the customer provides their own evaluation data. Similarly, AI vendors should be willing to provide results on these datasets in advance of purchase. We have found that vendors are amenable when this request comes from the customer.

Updates to this page

Published 9 April 2024