Assessing the use case
The need for a nuanced understanding of domain-specific language guided the process through which DevelopMetrics built each use case for their tool. Working with ITR, they started with the question: ‘what do we do?’, and, through a series of discussions with the department, co-developed a framework for understanding the work the department carried out, exploring questions like:
What are the most important concepts to our work and how do we define them?
What are the relationships between these concepts, and how does that relate to what we’re trying to achieve?
What’s the best way to categorise different aspects of our work?
This co-creation process draws on everyone’s technical expertise to build something that reflects and includes a range of views and perspectives. As such, the framework gave the team a solid foundation from which to build a tool that was sensitive to the department’s work and the way it used key concepts.
Additionally, the record of this process gave them a useful resource for evaluating the tool and educating users on the best strategies for using it. For example, if users thought the tool wasn’t capturing certain aspects of what they meant by a concept like digital development, they could return to the framework and consider why. This gives the tool a level of transparency, because users can trace the details of how it performs back to a source.
Data collection
The value of a tool like this is that it allows organisations to understand the information they have stored in their databases. The data collection process involves consulting those organisations to understand which databases are most relevant to the use case, and the ways in which they want those resources to be pulled together. For many organisations, understanding the information in just one of their databases can be incredibly useful for guiding their work. Triangulating that data with other sources adds another level of richness to the information the tool provides.
Data processing and training
When considering a tool like this, it is useful to distinguish between the training phase for DevelopMetrics’ base model, DEELM, and the way the team trains a tool for specific use cases.
The process of building DEELM to better grasp the complexities of language used by international development professionals involved five years of consultation with experts at the UN and USAID, and a lot of manual tagging. The team would go through development reports and manually tag specific sections of those reports with relevant concepts. For example, they would look through a report related to the humanitarian sector and highlight specific sections of the document which related to resilience. They would then engage sector experts and academics to verify that their tagging process was accurately capturing the nuances of what the text was discussing. This tagging process was then peer-reviewed by other experts, to ensure that it was comprehensive.
This is a supervised learning approach, which, like many of the examples we’ve explored in this report, involves training an AI system on labelled data so that it learns the patterns linking that data to the categories used to classify it.
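To make the idea concrete, here is a minimal sketch of supervised text classification, not DevelopMetrics’ actual pipeline: a simple classifier is trained on manually tagged report excerpts and then asked to label an unseen one. The excerpts, the concept labels, and the scikit-learn setup are all illustrative assumptions; DEELM itself is a far larger language model built from years of expert-tagged data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Manually tagged training data: (report excerpt, concept label).
tagged_sections = [
    ("Communities rebuilt local water systems after the flood.", "resilience"),
    ("Cash transfers helped households absorb the economic shock.", "resilience"),
    ("Mobile payment platforms expanded access to savings accounts.", "digital development"),
    ("The ministry digitised its land registry records.", "digital development"),
]
texts, labels = map(list, zip(*tagged_sections))

# Fit a simple text-classification pipeline on the labelled examples.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Predict the concept for an unseen section of a report.
print(model.predict(["Drought-tolerant seeds helped farmers recover from crop failure."]))
```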
When developing a new fine-tuned version of DEELM, the team faced a challenge. They have two key datasets. The first is their baseline training data: the manually tagged data used to train the DEELM base model. The second is the use case-specific data they want to categorise according to their co-designed framework. The challenge is to transfer DEELM’s learned ability to relate text to generic categories over to this new data, so that it categorises it according to the framework.
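As a rough illustration of what such a fine-tuning step could look like, the sketch below follows a common Hugging Face-style workflow: load a pretrained base model, attach a classification head sized to the framework’s categories, and train on the use case-specific data. The checkpoint name "deelm-base", the category labels, and the training text are all hypothetical stand-ins; DevelopMetrics’ actual setup is not described in this level of detail.

```python
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Categories from the co-designed framework (invented for illustration).
FRAMEWORK_LABELS = ["digital inclusion", "digital infrastructure", "digital governance"]

class FrameworkDataset(Dataset):
    """Use case-specific text, labelled with the framework's categories."""
    def __init__(self, texts, label_ids, tokenizer):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.label_ids = label_ids

    def __len__(self):
        return len(self.label_ids)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.label_ids[i])
        return item

# "deelm-base" is a hypothetical checkpoint name standing in for the DEELM base model.
tokenizer = AutoTokenizer.from_pretrained("deelm-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "deelm-base", num_labels=len(FRAMEWORK_LABELS))

train_set = FrameworkDataset(
    texts=["The programme subsidised rural broadband roll-out."],
    label_ids=[FRAMEWORK_LABELS.index("digital infrastructure")],
    tokenizer=tokenizer)

# Fine-tune the base model so its outputs follow the framework's categories.
Trainer(model=model,
        args=TrainingArguments(output_dir="deelm-itr", num_train_epochs=3),
        train_dataset=train_set).train()
```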
Building the model
The main challenge is getting the first phases right. With a representative, well-curated dataset grounded in a clear conceptual framework, it is much easier to build the other parts of the solution. Having done this with ITR’s programme data, the team was able to design an interface that let users search through the information categorised by the LLM, and to add further functions, like the auto-generation of briefs on specific topics.
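The search side of such an interface can be pictured as a lookup over model-categorised records. The sketch below is a toy illustration under that assumption; the document structure and field names are invented, and a production system would sit on top of a real database and the model’s actual outputs.

```python
from dataclasses import dataclass

@dataclass
class CategorisedDocument:
    title: str
    categories: list[str]  # labels assigned by the fine-tuned model
    text: str

def search(corpus: list[CategorisedDocument],
           category: str, keyword: str = "") -> list[CategorisedDocument]:
    """Return documents tagged with `category`, optionally filtered by keyword."""
    return [doc for doc in corpus
            if category in doc.categories
            and keyword.lower() in doc.text.lower()]

corpus = [
    CategorisedDocument("Annual review", ["digital inclusion"],
                        "Expanded mobile internet access for rural women."),
    CategorisedDocument("Pilot report", ["digital governance"],
                        "Open data standards for municipal procurement."),
]
print([doc.title for doc in search(corpus, "digital inclusion", keyword="mobile")])
```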
Testing and iterating
One of the use cases for the tool developed for ITR was assessing the success of different interventions, an essential part of what international development professionals do. With limited budgets, it is incredibly important to get this right and ensure resources are being used to sustainably impact communities for the better.
One challenge with the testing process in this context was that there was no specific answer they expected the tool to give. When asking the tool for a brief on where blockchain had been shown to have the most impact in the Southeast Asian context, there is no single correct answer to use as a baseline against which to compare the LLM’s response. The testing process therefore has to leverage the expertise of professionals in the sector to understand the tool’s blind spots and biases (we’ll explore the question of bias in more depth later in the case study).
To test the use case, the team gave the tool to digital development specialists and had them ask it various questions. They would then record the responses and share qualitative feedback on how effectively the tool was responding to their requests. DevelopMetrics then used this feedback to iterate on the tool and retrain it to address the problems the experts identified. Suppose the tool was putting more weight on the idea of innovation as a technological process and ignoring the people-related side of innovation. The team might identify that this was down to a lack of data on that side, and look for more data to fill the gap. In other cases, they needed to revisit their training process.
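One way to make that kind of diagnosis concrete is to audit how often each facet of a concept appears in the training data. The sketch below assumes hypothetical facet labels like innovation:technology and innovation:people; a heavy skew in the counts would point to the data gap described above.

```python
from collections import Counter

# Labels attached to the training examples (hypothetical facet labels).
training_labels = ["innovation:technology", "innovation:technology",
                   "innovation:technology", "innovation:people"]

counts = Counter(training_labels)
total = sum(counts.values())
for facet, n in counts.most_common():
    print(f"{facet}: {n} examples ({n / total:.0%})")
# A heavy skew towards innovation:technology would suggest gathering more
# examples of the people-centred side before retraining.
```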
Impact
Through these iterations, the team developed a tool able to provide ITR with a contextually relevant, information-rich overview of their work in digital development. This overview saved them an immense amount of time finding and preparing evidence documents, such as the briefs to Congress. With that time saved, the different teams within ITR could attend to the higher-level task of evaluating how to use that evidence to inform their work, and better meet the challenge of building technology use cases that improve people’s lives.