A significant problem in international development is an inequality in influence between different actors in the ecosystem.

Major donor agencies and implementers often play a significant role in shaping the priorities and practice of development, while in-country NGOs, grassroots organisations, and the people that programmes are designed to benefit have less power to shape the sector. 

As we’ve explored throughout the module, one of the major challenges with AI is its potential to perpetuate and lock in these inequalities. In many cases, there is a disparity between data-rich and data-poor organisations which mirrors the inequalities of influence in the sector at large. Developing an LLM-based tool requires a lot of data to build a working use case, which means we run the risk of only creating tools which are available to, and based on the perspective of, big international development actors.

Bias

One of the ways the challenge of bias manifested itself in the tool DevelopMetrics built for ITR was that it was trained exclusively on USAID data, and consequently presented only USAID perspectives. A suggestion that a particular intervention was successful or unsuccessful didn’t necessarily indicate that the intervention was or wasn’t worth implementing. Rather, it indicated that a consensus had been reached by a group of professionals with a similar background, working in a similar context, with similar professional experiences.

DevelopMetrics’ approach was a combination of mitigation, education and recognition:

  • Mitigating by identifying where the model could be trained differently to prevent certain perspectives from being sidelined. 

  • Educating users to understand the biases of the tool so that they took them into careful consideration when using the tool to make decisions.

  • Recognising where the data simply wasn’t available to raise the profile of specific perspectives in the tool. 

Some of the approaches to mitigating bias that we’ve explored throughout the module, such as using augmented or synthetic data, weren’t appropriate in this context. A synthesised dataset, which aimed to replicate the perspectives of different actors using the data they did have available, would have created a misinformation challenge and potentially glossed over the challenge of bias rather than addressing it. The misinformation risk arises because they needed a direct connection between the data and the results of users’ searches. With synthetic data, it can become unclear whether the material being drawn on comes from a genuine source or from the synthesised dataset. Moreover, a synthetic dataset does not necessarily contain data which genuinely represents the views of the perspective you’re trying to emphasise, but rather an approximation of that perspective based on the data you already have. Ultimately, it's important to recognise biases and their influence.
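
To make this concrete, here is a minimal, illustrative sketch of what a “direct connection between the data and the results of users’ searches” can look like in practice. It is not DevelopMetrics’ implementation: the document structure, field names, and the simple keyword-overlap ranking are assumptions made purely for illustration. The point is that every result keeps its provenance, and anything synthetic is labelled rather than blended in invisibly.

```python
# Illustrative sketch only (not DevelopMetrics' actual system): search results
# always carry provenance, so a user can trace a claim back to a real document.

from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str         # e.g. an evaluation report identifier (hypothetical)
    text: str
    source: str         # originating organisation
    is_synthetic: bool  # True if the record was generated rather than collected


def search(documents: list[Document], query: str, top_k: int = 3) -> list[dict]:
    """Rank documents by naive keyword overlap and return results with provenance."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_terms & set(doc.text.lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    results = []
    for score, doc in scored[:top_k]:
        results.append({
            "excerpt": doc.text[:120],
            "score": score,
            "source": doc.source,           # provenance stays attached to the answer
            "doc_id": doc.doc_id,
            "synthetic": doc.is_synthetic,  # labelled explicitly, never mixed in silently
        })
    return results


if __name__ == "__main__":
    corpus = [
        Document("eval-001", "Cash transfer intervention judged successful in final evaluation", "USAID", False),
        Document("synth-001", "Community-led sanitation programme perspective (generated)", "synthetic", True),
    ]
    for result in search(corpus, "was the cash transfer intervention successful"):
        print(result)
```

Because the source identifier travels with every excerpt, a user or reviewer can always see whose perspective a claim represents; with unlabelled synthetic records, that link is exactly what gets lost.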

Creating a tool responsibly involves being honest about those biases, and ensuring people who use the tool recognise where that bias lies.  

DevelopMetrics suggested that, ultimately, the best solution to these challenges is to support primary data collection to address our current data gaps. By working with organisations and the people whose perspectives are underrepresented to create data which reflects their priorities and experience, we'll be able to build tools which are more useful, less biased, and fairer.  

Strengthening user experience  

Another related challenge is ensuring that users can recognise and address these biases in how they make use of LLMs. Dell’Acqua et al. raise the challenge that the development of LLM capability has been rapid but uneven; performance on certain tasks has progressed much faster than on other, adjacent tasks, creating what they call the “jagged technological frontier” (Dell'Acqua et al., 2023).

Users need to be able to understand this frontier; otherwise, we run the risk of people “falling asleep at the wheel” and assuming that an LLM will perform a task well because it performed well on a similar task (Dell’Acqua, 2022). This challenge is exacerbated with a fine-tuned model, given that the data it is trained on, and the way it’s been fine-tuned, will create a unique frontier.  

In the context of the case study, DevelopMetrics focused on educating users to understand the technical limitations of the tool, encouraging them to be vigilant in how they used it and to engage their own experience in assessing the quality of responses. They focused on helping users spot potential biases in the assessments of the success of an intervention. By training people to recognise that the tool’s assessment of the success of an intervention didn’t necessarily reflect its potential impact, they encouraged users to take a critical approach to using the tool, drawing on their own expertise to analyse responses. If you combine this training with effective feedback loops, such that users can report bias, then you create a more transparent system which makes the “jagged frontier” visible and easier to navigate.
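
As an illustration of the kind of feedback loop described above, the sketch below shows one way user reports of bias could be captured alongside the query and the sources an answer drew on, so that reviewers can see where users are running into the “jagged frontier”. It is hypothetical rather than a description of the case-study system; the log format and function names are assumptions.

```python
# Hypothetical sketch of a bias-reporting feedback loop: each user flag is stored
# with the query, the response, and the sources it drew on, for later review.

import json
from datetime import datetime, timezone
from pathlib import Path

FEEDBACK_LOG = Path("bias_feedback.jsonl")  # assumed local log file


def report_bias(query: str, response: str, source_ids: list[str], comment: str) -> None:
    """Append a user's bias report, with full provenance, to a reviewable log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "response": response,
        "source_ids": source_ids,  # which documents the answer was grounded in
        "comment": comment,        # the user's explanation of the suspected bias
    }
    with FEEDBACK_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def summarise_feedback() -> dict[str, int]:
    """Count reports per source, highlighting where bias is flagged most often."""
    counts: dict[str, int] = {}
    if not FEEDBACK_LOG.exists():
        return counts
    for line in FEEDBACK_LOG.read_text(encoding="utf-8").splitlines():
        for source_id in json.loads(line)["source_ids"]:
            counts[source_id] = counts.get(source_id, 0) + 1
    return counts


if __name__ == "__main__":
    report_bias(
        query="Was the cash transfer intervention successful?",
        response="Evaluations report a broad consensus that it was successful.",
        source_ids=["eval-001"],
        comment="Only donor-side evaluations cited; no community perspective.",
    )
    print(summarise_feedback())
```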