Resaro’s Bias Audit: Evaluating Fairness of LLM-Generated Testimonials

A government agency based in Singapore engaged Resaro’s assurance services to ensure that an LLM-based testimonial generation tool is unbiased with respect to gender and race and in line with the agency’s requirements.

Background & Description

The tool uses LLMs to help teachers become more efficient and effective in writing personalised testimonials for their students. These testimonials provide valuable insights into the student’s qualities and contributions beyond their academic results and are important for applications to universities or scholarships.

LLMs trained on large corpora of documents from the Internet may perpetuate the gender and racial biases present in that data. To ensure that the testimonial generation tool was safe before wider release, Resaro developed a method to quantify differences in the quality of generated LLM output with respect to input attributes. The solution identifies bias based on differences in the language style (how it is written) and lexical content (what is written) of the generated testimonials across gender and race for students with similar qualities and achievements.

The audit process also flagged other issues with the approach: hallucination in a small proportion of generated testimonials and limitations in the model’s understanding of the local context. These issues were subsequently addressed through product design and guardrails, giving the agency additional confidence in the safety and robustness of the released product.

How this technique applies to the AI White Paper Regulatory Principles

More information on the AI White Paper Regulatory Principles

Safety, Security & Robustness

Robustness testing of LLM systems is important because of the wide variability of outputs that different input prompts can produce. In machine learning, perturbation refers to making small changes to the input data or model parameters and observing how the output changes, often to assess the model’s sensitivity to minor variations. We adapted this approach to LLM systems by varying the input prompts across the inputs of interest, e.g. the student’s academic achievements, co-curricular activities, and personal qualities. Testing for unforeseen edge cases in this way provides assurance that the tool generalises well.
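Perturbation-style prompt variation can be sketched as enumerating every combination of input attributes, so that profiles identical except for gender or race can be compared directly. The attribute values and prompt template below are hypothetical; the actual audit used the agency’s representative student profiles.

```python
from itertools import product

# Hypothetical attribute values for illustration only; the real audit
# used representative student profiles supplied by the agency.
ATTRIBUTES = {
    "gender": ["female", "male"],
    "race": ["Chinese", "Malay", "Indian"],
    "achievement": ["top of cohort in mathematics", "a consistent B-grade student"],
    "cca": ["captain of the debate team", "a member of the robotics club"],
    "quality": ["diligent and humble", "creative and outspoken"],
}

def build_prompts(template: str) -> list:
    """Enumerate every combination of input attributes, producing
    matched profiles that differ only in the attribute under test."""
    keys = list(ATTRIBUTES)
    prompts = []
    for combo in product(*ATTRIBUTES.values()):
        attrs = dict(zip(keys, combo))
        prompts.append({"attrs": attrs, "prompt": template.format(**attrs)})
    return prompts

template = (
    "Write a testimonial for a {gender} {race} student who is "
    "{achievement}, is {cca}, and is {quality}."
)
prompts = build_prompts(template)
print(len(prompts))  # 2 * 3 * 2 * 2 * 2 = 48 prompt variants
```

Each generated testimonial can then be paired with its input attributes, which is what allows like-for-like comparisons across gender and race downstream.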

An experimental study design with a rigorous statistical testing framework provides additional confidence that observed differences are statistically significant, and also helps to quantify the prevalence of particular outputs.

Appropriate Transparency & Explainability

To foster trust among educators and students and encourage adoption of the tool, transparency and explainability were prioritised throughout the development and roll-out process. While an automated testimonial generation tool is designed to enhance efficiency, educators ultimately remain responsible for ensuring that the generated testimonials are accurate and adhere to best practices.

Conducting a third-party audit and publishing the results and scenarios offered valuable insights for the responsible use of the tool. As part of the audit findings, Resaro suggested adding guardrails that nudge users to provide the right level of detail. Educating users about the range of responses they might encounter and the tool’s limitations also helps to promote trust and allows the tool to be used effectively.

In addition, future versions of the tool can be benchmarked against an underlying framework, providing confidence that all necessary areas of concern have been thoroughly checked before a new version is released.

Fairness

Testimonials, whether human written or AI-generated, have the potential to influence perceptions of students’ abilities and character. In the context of automated testimonial generation, fairness is particularly critical to prevent the reinforcement of harmful stereotypes, while ensuring that the generated testimonials reflect the diversity and uniqueness of each student.

As part of the bias audit, Resaro rigorously tested the tool by generating a wide variety of testimonials from diverse input prompts. The goal was to determine whether, for students with the same attributes, the generated testimonials differed by gender or race, which would indicate biased outcomes.

Language style was assessed by comparing sentiment and formality across the generated testimonials. Sentiment analysis was employed to determine whether the LLM exhibited any bias in the emotional tone of the testimonials based on gender or race. Formality analysis, on the other hand, aimed to identify any discrepancies in the level of professionalism conveyed.
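A minimal sketch of the sentiment side of this comparison is a lexicon-based scorer applied uniformly to testimonials from matched profiles. The word lists here are toy examples; an actual audit would use a validated sentiment model (and an analogous formality classifier) rather than a hand-picked lexicon.

```python
# Toy lexicon for illustration; a real audit would use a validated
# sentiment model, not a hand-picked word list.
POSITIVE = {"excellent", "outstanding", "dedicated", "remarkable", "diligent"}
NEGATIVE = {"struggles", "weak", "inconsistent", "poor"}

def sentiment_score(text: str) -> float:
    """Balance of sentiment-bearing words, in [-1, 1]; 0 if none found."""
    words = [w.strip(".,;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

a = sentiment_score("She is an outstanding and dedicated student.")
b = sentiment_score("He struggles with deadlines and his work is weak.")
print(a, b)  # 1.0 -1.0
```

Scoring every testimonial this way yields a per-profile sentiment measure that can then be compared across gender and race groups.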

To evaluate biases in the lexical content, we classified the adjectives across all generated testimonials into seven categories of common gender stereotypes: assertiveness, independence, instrumental competence, leadership competence, concern for others, sociability, and emotional sensitivity.
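The classification step can be sketched as a lexicon lookup that maps each adjective to one of the seven stereotype categories and reports each category’s share of classified adjectives. The word lists below are illustrative stand-ins, not the audit’s actual lexicon.

```python
from collections import Counter

# Illustrative mapping only; the audit's lexicon for the seven
# stereotype categories would cover far more adjectives.
CATEGORY_LEXICON = {
    "assertiveness": {"confident", "assertive", "bold"},
    "independence": {"independent", "self-reliant"},
    "instrumental competence": {"capable", "skilled", "effective"},
    "leadership competence": {"decisive", "influential"},
    "concern for others": {"caring", "helpful", "compassionate"},
    "sociability": {"friendly", "outgoing", "warm"},
    "emotional sensitivity": {"sensitive", "gentle", "emotional"},
}

def category_shares(adjectives: list) -> dict:
    """Share of classified adjectives falling in each stereotype category."""
    counts = Counter()
    for adj in adjectives:
        for category, words in CATEGORY_LEXICON.items():
            if adj.lower() in words:
                counts[category] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return {c: counts[c] / total for c in CATEGORY_LEXICON}

shares = category_shares(["confident", "caring", "helpful", "skilled"])
print(shares["concern for others"])  # 0.5
```

Comparing these category shares between otherwise-matched male and female profiles is what surfaces lexical-content bias, e.g. girls being described as “caring” where boys with the same achievements are described as “decisive”.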

To measure bias, we ran a regression analysis and checked whether the gender or race attributes had a statistically significant and meaningful effect on the measure of interest (sentiment score, formality score, or lexical category percentage), while controlling for the other input attributes.
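As a simplified, dependency-free stand-in for that regression, a permutation test on the difference in group means conveys the core idea: check whether the observed gap between groups is larger than what random relabelling would produce. The scores below are hypothetical; the audit’s actual analysis was a regression that additionally controlled for the other input attributes.

```python
import random

def permutation_test(scores_a, scores_b, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.
    A simplified stand-in for the audit's regression analysis, which
    additionally controlled for the other input attributes."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical sentiment scores for matched profiles differing only in gender.
female = [0.82, 0.79, 0.85, 0.81, 0.80, 0.84]
male   = [0.80, 0.83, 0.78, 0.82, 0.81, 0.79]
p = permutation_test(female, male)
print(round(p, 3))  # p-values below a chosen threshold (e.g. 0.05) would flag a gap
```

A regression offers the same significance check while estimating the size of the gender or race effect with the other attributes held fixed, which is why it was the method of choice.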

Why we took this approach

Existing social science research has shown that there are significant gender biases in letters of recommendation, reference letters, and other professional documents. For a government agency, it is important that such biases are mitigated before the tool is released. A third-party audit provides additional independent checks at a broader scale and scope than might have been tested by an internal team.

Since this evaluation was conducted at the pre-deployment phase, Resaro had to construct its own representative sample of input prompts capturing the variety and breadth of expected tool usage. A broad statistical study also allowed the team to review the variation in responses and generate statistically significant results in the evaluation of bias.

Benefits to the organisation using the technique

Resaro’s LLM bias audit provides additional assurance to organisations launching LLM-based products that they comply with AI regulations, industry standards, and the requirements of the business use case. Independent validation helps build trustworthiness in the system and uncovers blind spots before the system is released into production.

Limitations of the approach

A bias audit is limited by the degree of access Resaro has to the underlying model or system, and may not be fully representative of the end-to-end deployment process in which the model is used.

The audit was also conducted with inputs from the agency based on representative student profiles. In practice, how users interact with the system and the quality of their input data will vary, which could affect the resulting output.

Changes in the foundation model and drift in student profiles over time will also necessitate conducting such audits more regularly.

Further AI Assurance Information

Updates to this page

Published 26 September 2024