The final piece in our technical learning journey: LLMs for international development

This is the final piece in a three-part blog by Olivier Mills of Baobab Tech, a Frontier Tech Implementing Partner

Pilot: Using LLMs as a tool for International Development professionals

 

Part 3: UI Development, Future Directions, and Applications

In the previous part, we discussed retrieval systems, models and prompt engineering. In this final article, we'll explore UI development, consider future directions for optimization and scaling, and examine potential applications of our LLM system in international development.

6. UI development with generative frameworks

Our journey in UI development began with Streamlit for proof of concept, implementing chat RAG over aggregated content. However, we quickly ran into that framework's limitations for building a more dynamic, customisable and user-friendly interface with features such as authentication, caching, rate limiting and edge functions, so we transitioned to Next.js with Vercel's AI SDK.

We're now exploring cutting-edge technologies, particularly React Server Components with streaming UI. This approach lets LLM-generated data drive user interface components (like follow-up questions or information cards) that can be streamed to the client in much the same way as generated text. The result is a more responsive interface, which is crucial for a good user experience (UX).
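As a rough illustration of this pattern, here is a minimal sketch using the AI SDK's React Server Components support (assuming a recent AI SDK with the `streamUI` helper; the `FollowUpChips` component, tool name and prompt are hypothetical, not our production code):

```tsx
// Simplified server-side sketch; in a Next.js app this would live in a
// server action ('use server') and the component in its own file.
import { streamUI } from 'ai/rsc';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Hypothetical presentational component for follow-up question chips
function FollowUpChips({ questions }: { questions: string[] }) {
  return (
    <ul>
      {questions.map((q) => (
        <li key={q}>{q}</li>
      ))}
    </ul>
  );
}

export async function askQuestion(query: string) {
  const result = await streamUI({
    model: openai('gpt-4o-mini'),
    prompt: query,
    // Plain text chunks stream to the client as they arrive
    text: ({ content }) => <p>{content}</p>,
    tools: {
      suggestFollowUps: {
        description: 'Render follow-up questions as clickable chips',
        parameters: z.object({ questions: z.array(z.string()) }),
        // The returned React node is streamed to the client just like text
        generate: async function* ({ questions }) {
          yield <p>Generating follow-up questions…</p>;
          return <FollowUpChips questions={questions} />;
        },
      },
    },
  });
  return result.value;
}
```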

UI component showing a search and selection step in the user flow

Building interactive features

Developing interactive features for our LLM application involved several key considerations:

  1. Designing intuitive interfaces for various user tasks and complex data

  2. Implementing dynamic visualizations for portfolio analysis

  3. Balancing AI-generated and user-controlled UI elements

  4. Implementing streaming responses and dynamic UI updates

  5. Addressing challenges in state management for AI interactions

One of the biggest challenges we face is deciding what to keep in UI state, how to manage the flow of data from retrieval to the UI, and which data should be persisted. These decisions are heavily influenced by the use case.

Most LLM applications follow a conversation thread storage approach, typically an array of alternating user and assistant messages, plus tool messages that store the results of function calls. While this works well for fact-finding, it's less suitable for portfolio development or for managing large result sets: storing those results in the message thread creates significant data duplication and processing overhead. We can still use the thread approach, but careful state management is crucial.

Consider a portfolio-type use case where a user wants to find all FCDO programs with digital development or humanitarian components. A hybrid search could easily retrieve 140 to 200 activities/programmes, each with at least 500 words of metadata and 5 to 12 associated documents. In this scenario, we're not doing naive RAG to extract a single fact, so dumping the top 5 results into an LLM context for summarization doesn’t meet user needs.

Instead, we need to store basic information from results (like title and ID) in our persistent storage. This allows us to retrieve data for the UI when that specific instance is accessed, and to process the larger text chunks for a result only when it is needed.
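To make this concrete, here is a sketch of how that persisted state might be shaped, keeping only lightweight references in the thread and loading full records on demand (the type names and API route are illustrative, not our actual schema):

```ts
// Lightweight reference stored in the conversation thread / persistent state
interface ActivityRef {
  id: string;    // IATI activity identifier
  title: string; // enough to render a result list without re-fetching
}

// Full record, fetched only when a specific result is opened
interface ActivityDetail extends ActivityRef {
  metadata: Record<string, string>;
  documents: { title: string; url: string }[];
}

// A message in the thread; tool results hold references, not full records
interface ThreadMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  results?: ActivityRef[];
}

// Illustrative on-demand loader; in our stack this would query PostgreSQL
async function loadActivityDetail(id: string): Promise<ActivityDetail> {
  const res = await fetch(`/api/activities/${id}`); // hypothetical endpoint
  return res.json();
}
```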

7. Future directions and performance optimization

As we continue to build our LLM application, several key areas emerge as focal points for improvement and expansion. Central to these considerations are performance optimization and scaling strategies, particularly as we anticipate handling increasingly large-scale IATI datasets and thousands of monthly users.

7.1 Scaling strategies for large-scale datasets

The expansion to larger datasets brings with it a host of challenges, primarily centered around cost and infrastructure. As we scale, we must carefully consider the following aspects:

1. Cost management:

  • Every use of a large language model carries a cost, and an agentic system may invoke the model multiple times for a single request.

  • We need to develop strategies to minimize the frequency and scope of LLM generations without compromising the quality of insights.

2. Infrastructure optimization:

  • Our current use of PostgreSQL's full-text search (FTS) with tsvector, while effective, might not scale to larger datasets.

  • We’re exploring stored (pre-computed) FTS fields to improve performance, but this increases storage requirements, which need to be balanced against compute time at query time (see the sketch after this list).

  • A potential solution is migrating to a search-optimized database like Elasticsearch. This could allow us to scale to millions of records effectively, with optimizations for both sparse and dense vector searches, but would affect some of our current customizations.

  • Embeddings for dense vector search should shift to binary and quantized formats, which can reduce storage needs by as much as 50x and speed up search by up to 100x.
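To illustrate the stored-FTS idea mentioned above, the sketch below adds a generated tsvector column plus a GIN index so the vector is computed once at write time rather than on every search (shown with the node-postgres client; the table and column names are assumptions, not our actual schema):

```ts
import { Client } from 'pg';

// Pre-compute the FTS vector at write time instead of on every query.
async function addStoredFtsColumn() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();

  await client.query(`
    ALTER TABLE activities
      ADD COLUMN IF NOT EXISTS fts tsvector
      GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))
      ) STORED;
  `);

  // A GIN index makes searches over the stored vector fast
  await client.query(
    `CREATE INDEX IF NOT EXISTS idx_activities_fts ON activities USING GIN (fts);`
  );

  await client.end();
}

// Queries then rank against the stored column instead of recomputing to_tsvector:
//   SELECT id, title FROM activities
//   WHERE fts @@ websearch_to_tsquery('english', $1)
//   ORDER BY ts_rank(fts, websearch_to_tsquery('english', $1)) DESC
//   LIMIT 50;
```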

3. Caching mechanisms:

For tasks requiring large result-set analysis and contextualized summaries, we face a significant challenge. These tasks often necessitate running the language model against large summaries or sub-retrieval systems for 40 to 100 activities per query. At scale, this approach becomes cost-prohibitive and time-consuming.

To address this, we're developing an advanced caching mechanism:

1. Query embedding caching:

  • We generate and store dense vector embeddings for each unique query.

  • These cached embeddings allow for quick similarity comparisons with future queries.

2. Activity summary caching:

  • We pre-generate and cache comprehensive summaries for each activity in our dataset.

3. Contextualized summary caching:

  • When processing a new query, we first check for similar cached queries using vector embedding comparisons.

  • If a high-similarity match is found, we retrieve pre-generated contextualized summaries for relevant activities.

  • For new queries, we generate and cache new contextualized summaries.

This approach significantly reduces the need for repeated, costly LLM operations. Instead of performing 40 to 100 RAG operations for each new query, we can often retrieve pre-generated summaries instantly, with significant cost savings and improved scalability. However, we continue to refine this system to balance cache size, data freshness, and efficient resource utilization.
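A minimal sketch of the query-level cache check might look like the following, comparing a new query's embedding against stored ones with cosine similarity (the threshold and data shapes are illustrative):

```ts
interface CachedQuery {
  query: string;
  embedding: number[];  // dense vector for the original query
  summaryIds: string[]; // pointers to cached contextualized summaries
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return cached summary IDs if a sufficiently similar query has been seen before
function findCachedSummaries(
  queryEmbedding: number[],
  cache: CachedQuery[],
  threshold = 0.92, // illustrative cut-off, tuned empirically
): string[] | null {
  let best: CachedQuery | null = null;
  let bestScore = -1;
  for (const entry of cache) {
    const score = cosineSimilarity(queryEmbedding, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best && bestScore >= threshold ? best.summaryIds : null;
}
```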

7.2 Advanced retrieval and processing techniques

As we scale, we're also exploring more sophisticated retrieval and processing techniques:

1. Hierarchical retrieval:

  • Implementing a tiered approach where initial broad searches are refined through subsequent, more focused queries.

  • This could help manage the computational load while maintaining result quality (a simple sketch follows this list).

2. Adaptive query processing:

  • Developing systems that can dynamically adjust query strategies based on the nature of the request and the current system load.

  • This could involve switching between different RAG approaches or adjusting the depth of searches based on query complexity.

3. Distributed processing:

  • As dataset sizes grow, we're considering distributed processing architectures, such as parallel LLM inference and clustered databases, to handle large-scale data operations more efficiently.
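As a rough sketch of the tiered idea, a broad, cheap first pass could feed a smaller, more expensive refinement step; the same structure could also adapt its limits to query complexity (the endpoints and limits below are placeholders, not a real API):

```ts
interface Candidate {
  id: string;
  title: string;
  score: number;
}

// Stage 1: cheap, broad search over stored FTS / compact embeddings (placeholder endpoint)
async function broadSearch(query: string, limit: number): Promise<Candidate[]> {
  const res = await fetch(`/api/search/broad?q=${encodeURIComponent(query)}&limit=${limit}`);
  return res.json();
}

// Stage 2: more expensive re-ranking or dense-vector refinement (placeholder endpoint)
async function refine(query: string, candidates: Candidate[], limit: number): Promise<Candidate[]> {
  const res = await fetch('/api/search/refine', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, ids: candidates.map((c) => c.id), limit }),
  });
  return res.json();
}

// Tiered retrieval: cast a wide, cheap net first, then spend compute only on the shortlist
export async function hierarchicalRetrieve(query: string): Promise<Candidate[]> {
  const broad = await broadSearch(query, 500); // wide first pass
  return refine(query, broad, 50);             // focused second pass
}
```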

7.3 Towards an agentic system

Rebuilding our LLM application around an agentic flow could significantly enhance its flexibility and scalability, accommodating a wider range of use cases. Here's a conceptual overview of how we might restructure the system:

  1. Agent orchestration: Instead of a fixed RAG pipeline, we'd implement an agent orchestrator (like LangGraph) to manage the flow of operations. This would allow for dynamic decision-making based on user queries and data characteristics.

  2. Task decomposition: An initial agent would break down complex queries into subtasks. For instance, a query about "nutrition programs in drought-affected regions of East Africa" might be decomposed into separate tasks for identifying drought-affected regions, finding nutrition programs, and analyzing their effectiveness.

  3. Specialized agents: We'd create a suite of specialized agents for different tasks:

    • Data Retrieval Agent: Handles efficient database queries and caching.

    • Analysis Agent: Performs in-depth analysis on retrieved data, including batch analysis.

    • Summary Agent: Generates contextualized summaries.

    • Visualization Agent: Creates dynamic data visualizations.

  4. Tool integration: Agents would have access to a toolkit including database operations, statistical analysis functions, and visualization tools. They could dynamically select and use these tools based on task requirements.

  5. Memory and state management: A shared memory system would allow agents to store and retrieve intermediate results, reducing redundant operations and enabling more complex, multi-step analyses, with persistent storage for revisiting previous user sessions.

  6. Feedback loop: The system would incorporate user feedback to refine its responses and adjust its approach in real-time, making it more adaptable to user needs.

This agentic approach would allow our system to handle a much broader range of queries with greater flexibility. It could autonomously navigate complex analytical pathways, potentially uncovering insights that a more rigid system might miss. However, it would require careful design to manage increased complexity and ensure coherent, reliable outputs.
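Without committing to a particular framework, a skeleton of that orchestration flow might look like the sketch below, where a decomposition step produces a plan and each task is dispatched to a specialized agent, with shared memory in between (all names, endpoints and stubs here are illustrative):

```ts
type AgentTask =
  | { kind: 'retrieve'; query: string }
  | { kind: 'analyze'; activityIds: string[] }
  | { kind: 'summarize'; activityIds: string[]; context: string }
  | { kind: 'visualize'; data: unknown };

interface AgentResult {
  task: AgentTask;
  output: unknown;
}

// Shared memory so agents can reuse intermediate results across steps
const memory = new Map<string, unknown>();

// Illustrative decomposition step; in practice an LLM call would produce this plan
async function decompose(userQuery: string): Promise<AgentTask[]> {
  return [
    { kind: 'retrieve', query: userQuery },
    { kind: 'summarize', activityIds: [], context: userQuery },
  ];
}

// Dispatch each task to the matching specialized agent (stubbed here)
async function runTask(task: AgentTask): Promise<AgentResult> {
  switch (task.kind) {
    case 'retrieve':
      return {
        task,
        output: await fetch(`/api/search?q=${encodeURIComponent(task.query)}`).then((r) => r.json()),
      };
    case 'analyze':
    case 'summarize':
    case 'visualize':
      return { task, output: null }; // placeholders for the specialized agents
  }
}

export async function orchestrate(userQuery: string): Promise<AgentResult[]> {
  const plan = await decompose(userQuery);
  const results: AgentResult[] = [];
  for (const task of plan) {
    const result = await runTask(task);
    memory.set(task.kind, result.output); // make intermediate output available to later agents
    results.push(result);
  }
  return results;
}
```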

8. Potential applications and future vision

The learnings and technologies we've developed have broad applicability, not just within IATI data and international development but across various sectors dealing with large-scale document analysis and decision support.

Our work lays the foundation for a future where AI significantly enhances decision-making in international development:

  1. Data-driven policy formulation:

    • AI systems could analyze vast datasets to identify trends and patterns, informing more effective policy decisions.

  2. Cross-sector, cross-funder collaboration:

    • AI systems could identify synergies between different development sectors and funders, fostering more integrated and holistic approaches to complex challenges.

  3. Personalized development strategies:

    • Tailoring development approaches to the specific needs and contexts of different regions or communities based on comprehensive data analysis.

  4. Enhanced transparency and accountability:

    • AI-driven systems could provide clearer, more accessible insights into the use and impact of development funds, enhancing trust and accountability in the sector.

Building a community of practice

As we continue to develop and refine our LLM application, we recognize the immense value of collaboration and knowledge sharing within the international development and tech communities. We are eager to connect with others working on similar applications, to exchange ideas, share learnings, and collectively advance the field of AI-assisted decision-making in development.

Follow our Frontier Tech blog to get updates and email Jenny at jenny.prosser@dt-global.com


If you’d like to dig in further…

🚀 Explore this pilot’s profile page

📚 Learn more about the idea behind the pilot here

📚 Read part one of the blog here

📚 Read part two of the blog here

Frontier Tech Hub
The Frontier Technologies Hub works with UK Foreign, Commonwealth and Development Office (FCDO) staff and global partners to understand the potential for innovative tech in the development context, and then test and scale their ideas.