LLMs to improve decision making in development - what do future users want?
A blog by Robert Phillips, a Frontier Tech Pioneer
Pilot: Using LLMs as a tool for International Development professionals
I was surprised to hear Tony Blair on a niche AI podcast, given that its guests are mostly tech experts; but the former PM seemed keen to land this message:
“I am constantly saying to my own party, the Labour Party… you've got to focus on this technology revolution. It's not an afterthought. It's the single biggest thing that's happening in the world today. It’s of a real world nature and is going to change everything.”
Behind this optimistic boosterism there’s the obvious question - how can AI practically help? This was the question that Sebastian Mhatre and I were keen to answer as part of our Frontier Technology pilot “Using LLMs as a tool for International Development professionals.”
Straight out of the box these tools are impressive - last month Claude helped me fix my shower. I took a few photos, then it stepped me through the solution - no plumber needed. Moreover, AI tools are constantly improving and it’s too early to tell where their ceiling will be. Notwithstanding all of this, even the most advanced models can provide only limited support for our day-to-day needs.
This doesn’t mean that these models are useless, but they need to be adapted, at least for now. How best to adapt them depends on exactly what you want the model to do, which brings us to user testing. Throughout this pilot we are working closely with FCDO staff, to bring together an understanding of what functionality is needed and what the technology can do.
This blog post is a summary of the first round of user engagement (by the time this blog post is published, we’ll have conducted a second round focused on hands-on testing of our LLM-based tool - stay tuned for this on the Frontier Tech website).
Our user needs group comprised 12 participants from the FCDO who work on development. Here's what we learned:
The problems today
When it comes to using evidence to make decisions, walking the walk is a lot harder than talking the talk. Our conversations with practitioners (and review of research conducted by FCDO’s Research and Evidence Directorate) revealed some common challenges:
Problem 1 - information overload
There are two challenges here. Firstly, it’s difficult to find the right information. Even with progress in information management, identifying which documents have the information you want is hard - and often comes down to luck. Secondly, if you do manage to identify all the relevant documents, there can be so many that it’s impractical to read them all. Unless someone has already answered the exact question you’re interested in - through a protracted and costly research piece which i) isn’t the same as looking at the original source and ii) you may not be able to find - you end up relying on the sources you know.
Problem 2 - the evidence culture
Organisational incentives often conflict with evidence-based decision-making, and this tension emerged as a significant theme in participant discussions. Time pressure to deliver results at pace comes at the expense of thorough evidence analysis, exacerbates information overload and groupthink, and can hinder a critical examination of evidence. This can stifle the exploration of new approaches and can inadvertently encourage "policy-based evidence" seeking, where evidence is selectively used to support predetermined conclusions. These norms can make it difficult for staff to buy the time they need to challenge received wisdom or contradict established views, creating an environment where evidence-based decision-making is often compromised, despite its recognised importance.
Wishlist for an AI Assistant
After talking through the problems we spent some time discussing how AI could help. We ended up with a long list of functions that people thought could help them address the challenges above and deliver more effectively. These divide into:
Portfolio analysis: Users are keen to quickly paint a picture of what FCDO programming looks like in a particular thematic area or geography - to understand programming across themes like climate or localisation, geographies, delivery mechanisms, or any other dimension. There was interest both in a tool that would identify a long list of all relevant FCDO programmes and in one that was more discerning, identifying key programmes and summarising lessons.
Information extraction and synthesis: Users wanted a tool that would extract key information from a set of programmes - for example lessons learnt, best practices, or what works. It could also extract programme effectiveness information, including identifying indicators and outcomes, and identify potential synergies between programme partners, facilitating collaboration opportunities.
Tailored Communication: Functionality could produce summaries and descriptions of programmes tailored to different audiences. It might adjust content and length based on the target audience's needs and background. It could also help to reframe problems and support teams to think more creatively about solutions.
Critical friend/quality assurance: A tool could act as a critical friend, offering feedback. For example providing comments on a theory of change or a section of a business case. It could provide constructive criticism, identify potential improvements, and challenge assumptions, thereby enhancing the quality and effectiveness of programmes or strategies.
While we’re keen to address all four of these areas, we’ve initially focussed on the first two and are working with FCDO colleagues to develop a tool that addresses actual day-to-day problems they have faced.
Beyond what it does - how it does it
Perhaps the most important feedback was that AI-powered tools should empower staff. Users want AI assistants that can handle the heavy lifting of information processing, freeing their time to focus on analysis, interpretation, and decision-making. They also want to know the outputs from AI models are trustworthy. We’ve covered the steps we’ve taken to make the model accurate and robust in our technical blog series, but just as important is that people feel confident in using it: that they understand why it has produced the results it has, the provenance of the information, how they can be confident in the quality of outputs, and how they should interpret the results. User groups also warned of the risk that LLMs could exacerbate some of the cultural problems - making it too easy to rely on tools to find evidence that justifies a policy position or that corroborates entrenched views.
Understanding how we can improve user interactions, testing solutions and sharing our learning is a central part of the Frontier Technologies programme. It’s why we’re publishing these blogs! Our progress to date has made transparency a key aspect of the user interface - for example ensuring that responses cite and link to original (and trusted) sources. We’re also designing the user interface to help users build higher quality prompts that get better results, and are exploring options that highlight how relevant a response may be to a question. Further down the line we’re keen to investigate ways to improve model evaluation, as well as how best to develop models that challenge users, to stave off the risks outlined above.
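To make the idea of source-linked responses concrete, here is a minimal sketch of how a generated answer can be paired with numbered citations back to the documents it drew on. All names, the data structure, and the output format are illustrative assumptions for this blog, not the pilot tool’s actual implementation:

```python
# Minimal sketch: append a numbered, linked source list to an LLM
# answer so that users can trace each response back to original,
# trusted documents.

def cite_answer(answer: str, sources: list[dict]) -> str:
    """Return the answer followed by a numbered list of sources.

    `sources` is a list of dicts with 'title' and 'url' keys,
    e.g. the documents retrieved to ground the answer.
    """
    lines = [answer, "", "Sources:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src['title']} - {src['url']}")
    return "\n".join(lines)

# Example usage with placeholder documents:
docs = [
    {"title": "Annual Review 2023", "url": "https://example.org/review-2023"},
    {"title": "Programme Completion Report", "url": "https://example.org/pcr"},
]
print(cite_answer("Programme X met 4 of 5 outcome targets.", docs))
```

The point of a pattern like this is that the citation list travels with the answer itself, so a reader never sees a claim without a route back to the underlying evidence.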
The Road Ahead
As we move forward with our pilot, these user insights will be our north star. We'll be testing, iterating, and most importantly, continuing to listen to our users every step of the way.
Ultimately, we want what we discover to be useful to a wider community of practice for international development. If you want to get in touch to share learnings or collaborate, reach out to jenny.prosser@dt-global.com.
If you’d like to dig in further…
🚀 Explore this pilot’s profile page
📚 Learn more about the idea behind the pilot here
📚 Begin reading our technical journey here, about the challenges behind building the platform