A significant problem in international development is an inequality in influence between different actors in the ecosystem.

Major donor agencies and implementers often play a significant role in shaping the priorities and practice of development, while in-country NGOs, grassroots organisations, and the people that programmes are designed to benefit have less power to shape the sector. 

As we’ve explored throughout the module, one of the major challenges with AI is its potential to perpetuate and lock in these inequalities. In many cases, there is a disparity between data-rich and data-poor organisations which mirrors the inequalities of influence in the sector at large. Developing an LLM-based tool requires a lot of data to build a working use case, which means we run the risk of only creating tools which are available to, and based on the perspective of, big international development actors.

Bias

One of the ways the challenge of bias manifested itself in the tool DevelopMetrics built for ITR was that it was trained exclusively on USAID data, and consequently presented only USAID perspectives. A suggestion that a particular intervention was successful or unsuccessful didn’t necessarily indicate that the intervention was or wasn’t worth implementing. Rather, it indicated that a consensus had been reached by a group of professionals with a similar background, working in a similar context, with similar professional experiences.

DevelopMetrics’ approach was a combination of mitigation, education and recognition:

  • Mitigating by identifying where the model could be trained differently to prevent certain perspectives from being sidelined. 

  • Educating users to understand the biases of the tool so that they took them into careful consideration when using the tool to make decisions.

  • Recognising where the data simply wasn’t available to raise the profile of specific perspectives in the tool. 

Some of the approaches to mitigating bias that we’ve explored throughout the module, such as using augmented or synthetic data, weren’t appropriate in this context. A synthesised dataset, which aimed to replicate the perspectives of different actors using the data they did have available, would have created a misinformation challenge and potentially glossed over the challenge of bias rather than addressing it. The misinformation risk arises because they needed a direct connection between the data and the results of users’ searches. With synthetic data, it can become unclear whether the material being drawn on comes from a genuine source or from the synthesised dataset. Moreover, a synthetic dataset does not necessarily contain data which genuinely represents the views of the perspective you’re trying to emphasise, but rather an approximation of that perspective based on the data you already have. Ultimately, it's important to recognise biases and their influence.
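
To make this concrete, here is a minimal, illustrative sketch of what a “direct connection between the data and the results of users’ searches” can look like in practice. It is not DevelopMetrics’ implementation: the document structure, field names, and the simple keyword-overlap ranking are assumptions made purely for illustration. The point is that every result keeps its provenance, and anything synthetic is labelled rather than blended in invisibly.

```python
# Illustrative sketch only (not DevelopMetrics' actual system): search results
# always carry provenance, so a user can trace a claim back to a real document.

from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str         # e.g. an evaluation report identifier (hypothetical)
    text: str
    source: str         # originating organisation
    is_synthetic: bool  # True if the record was generated rather than collected


def search(documents: list[Document], query: str, top_k: int = 3) -> list[dict]:
    """Rank documents by naive keyword overlap and return results with provenance."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_terms & set(doc.text.lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    results = []
    for score, doc in scored[:top_k]:
        results.append({
            "excerpt": doc.text[:120],
            "score": score,
            "source": doc.source,           # provenance stays attached to the answer
            "doc_id": doc.doc_id,
            "synthetic": doc.is_synthetic,  # labelled explicitly, never mixed in silently
        })
    return results


if __name__ == "__main__":
    corpus = [
        Document("eval-001", "Cash transfer intervention judged successful in final evaluation", "USAID", False),
        Document("synth-001", "Community-led sanitation programme perspective (generated)", "synthetic", True),
    ]
    for result in search(corpus, "was the cash transfer intervention successful"):
        print(result)
```

Because the source identifier travels with every excerpt, a user or reviewer can always see whose perspective a claim represents; with unlabelled synthetic records, that link is exactly what gets lost.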

Creating a tool responsibly involves being honest about those biases, and ensuring people who use the tool recognise where that bias lies.  

DevelopMetrics suggested that, ultimately, the best solution to these challenges is to support primary data collection to address our current data gaps. By working with organisations and the people whose perspectives are underrepresented to create data which reflects their priorities and experience, we'll be able to build tools which are more useful, less biased, and fairer.  

Strengthening user experience  

Another related challenge is ensuring that users can recognise and address these biases in how they make use of LLMs. Dell’Acqua et al. raise the challenge that the development of LLM capability has been rapid but uneven; performance on certain tasks has progressed much faster than on other, adjacent tasks, creating what they call the “jagged technological frontier” (Dell'Acqua et al., 2023).

Users need to be able to understand this frontier; otherwise, we run the risk of people “falling asleep at the wheel” and assuming that an LLM will perform a task well because it performed well on a similar task (Dell’Acqua, 2022). This challenge is exacerbated with a fine-tuned model, given that the data it is trained on, and the way it’s been fine-tuned, will create a unique frontier.  

In the context of the case study, DevelopMetrics focused on educating users to understand the technical limitations of the tool, encouraging them to be vigilant in how they used it and to engage their own experience in assessing the quality of responses. They focused on helping users spot potential biases in the assessments of the success of an intervention. By training people to recognise that the tool’s assessment of the success of an intervention didn’t necessarily reflect its potential impact, they encouraged users to take a critical approach to using the tool, drawing on their own expertise to analyse responses. If you combine this training with effective feedback loops, such that users can report bias, then you create a more transparent system which makes the “jagged frontier” visible and easier to navigate.
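
As an illustration of the kind of feedback loop described above, the sketch below shows one way user reports of bias could be captured alongside the query and the sources an answer drew on, so that reviewers can see where users are running into the “jagged frontier”. It is hypothetical rather than a description of the case-study system; the log format and function names are assumptions.

```python
# Hypothetical sketch of a bias-reporting feedback loop: each user flag is stored
# with the query, the response, and the sources it drew on, for later review.

import json
from datetime import datetime, timezone
from pathlib import Path

FEEDBACK_LOG = Path("bias_feedback.jsonl")  # assumed local log file


def report_bias(query: str, response: str, source_ids: list[str], comment: str) -> None:
    """Append a user's bias report, with full provenance, to a reviewable log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "response": response,
        "source_ids": source_ids,  # which documents the answer was grounded in
        "comment": comment,        # the user's explanation of the suspected bias
    }
    with FEEDBACK_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def summarise_feedback() -> dict[str, int]:
    """Count reports per source, highlighting where bias is flagged most often."""
    counts: dict[str, int] = {}
    if not FEEDBACK_LOG.exists():
        return counts
    for line in FEEDBACK_LOG.read_text(encoding="utf-8").splitlines():
        for source_id in json.loads(line)["source_ids"]:
            counts[source_id] = counts.get(source_id, 0) + 1
    return counts


if __name__ == "__main__":
    report_bias(
        query="Was the cash transfer intervention successful?",
        response="Evaluations report a broad consensus that it was successful.",
        source_ids=["eval-001"],
        comment="Only donor-side evaluations cited; no community perspective.",
    )
    print(summarise_feedback())
```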