Getting the right data  

The first step is getting the right data to train the system. Your training data is the data fed into the system so that it can learn the patterns of pixels in the images you want it to classify.

Online, through resources such as ImageNet, you can download huge databases of images to train these kinds of systems. If you want your training data to create a system which is effective in the real world, this dataset needs to be representative. This means that the data in the training set needs to reflect the complexities and nuances of the real-world phenomena which the system is designed to classify.  

For LUMS, the challenge was that many available images of fire and smoke were taken from people’s phones at ground level, but the cameras the team were deploying were mounted on towers. This gave the cameras a completely different perspective from the existing ground-level images. As such, training the system on ground-level fire images would not have produced a system suited to the real-world use case they were exploring.

The challenge of unrepresentative datasets is common when applying AI in international development. By its nature, data only provides a resemblance to the real-world systems it is curated to represent. The real world is messy, complex, and full of variation. As Michel explains, this means that the data which trains AI systems needs to be “closely aligned with the purpose for which the AI system is being developed” such that it has the same properties as the real-world phenomena it models (Michel, 2023, p.17).

In international development contexts, this challenge is particularly acute. As Sekara et al. note, many of the international datasets collating information about global health and livelihoods, which could be useful in an international development context, are curated for decision-making, not for training AI. This means that a lot of work must be done to make them processable by an algorithm. Moreover, these datasets are spread across multiple national and international organisations with complicated licensing rules, which limits access (Sekara et al., 2023).

The challenge is not insurmountable – in many cases, there may be the option to collect more data or to adapt existing datasets to make them better suited to the use case you’re developing. The solution the pilot team found was to look for images drawn not from real life but from videogames. Red Dead Redemption and Grand Theft Auto allow players to create forest fires and to position a camera view of those fires similar to that of the cameras LUMS was deploying in real life. As such, they could use these games to gather a large number of images, similar enough to real-life fires, to effectively train the computer vision system to classify images and detect fires.

This is known as creating a synthetic dataset. It’s a common strategy for addressing the challenge of an unrepresentative dataset. You can find out more about synthetic data by reading Synthetic data for speed, security and scale (Lucini, 2021).
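To make the idea of a synthetic dataset concrete, here is a minimal sketch in Python. It is not the team’s pipeline: instead of capturing game footage, it generates toy “camera frames” as arrays, painting a bright patch into half of them to stand in for fire, and pairs each frame with a label. The point is only to show how synthetic generation gives you as many labelled examples as you need.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_frame(with_fire: bool, size: int = 32) -> np.ndarray:
    """Generate a toy greyscale 'camera frame'; optionally paint a bright patch standing in for fire."""
    frame = rng.uniform(0.0, 0.3, (size, size))      # dim forest background
    if with_fire:
        r, c = rng.integers(4, size - 4, 2)          # random location for the "fire"
        frame[r - 3:r + 3, c - 3:c + 3] += 0.6       # bright blob, clearly above background
    return np.clip(frame, 0.0, 1.0)

# Build a balanced synthetic training set with labels (1 = fire, 0 = no fire).
X = np.stack([synthetic_frame(i % 2 == 1) for i in range(200)])
y = np.array([i % 2 for i in range(200)])
```

In the real pilot the same principle applies, except the frames come from game engines that render fire far more faithfully than a bright square, and the labels come from how each scene was set up.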

Training the system

With an initial dataset, the team set about training the system. They used an off-the-shelf convolutional neural network (CNN) called YOLO (You Only Look Once), which achieves state-of-the-art results on many computer vision tasks. Artificial neural networks (ANNs) are processing systems consisting of multiple nodes, organised in layers, which learn from input data to output a desired value. Each node performs a certain function on its input, producing an output which is then passed through subsequent layers to the final output layer. Think of an ANN as a very complicated algorithm whose multiple steps are encoded in the structure of the nodes, taking input data and producing a desired output. Deep learning refers to a neural network which has multiple hidden layers between the input and the output. CNNs are a specific type of ANN used mostly for computer vision tasks (O'Shea and Nash, 2015).
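To give a feel for what those layers of nodes actually compute, here is a heavily simplified sketch of one convolutional layer in plain NumPy. It is nothing like the scale of YOLO: a single hand-written kernel stands in for weights that a real CNN would learn, and the final “fire score” is just an average rather than a trained output layer.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a kernel over the image; each output node applies the same weights to one patch."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

relu = lambda x: np.maximum(x, 0)          # node activation: keep positive responses only
image = np.random.rand(8, 8)               # toy greyscale input
edge_kernel = np.array([[1.0, -1.0]])      # in a real CNN, these weights would be learned
hidden = relu(conv2d(image, edge_kernel))  # one hidden layer of feature-detecting nodes
score = hidden.mean()                      # crude stand-in for an output layer
```

A real CNN stacks many such layers, each with many learned kernels, so that early layers detect edges and textures while later layers respond to whole objects such as flames or smoke plumes.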

However, as we’ve explained previously, there is still a challenge around tailoring the algorithm to the unique requirements of the real-world use case. Tailoring YOLO to a specific use case involves feeding the model your training set and using it to adjust the model’s parameters so that it is better suited to the use case you’re exploring.

This fine-tuned version of YOLO allowed the LUMS team to identify both the presence of fire in an image and the specific region of the image containing it.
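The principle behind fine-tuning can be sketched with a toy model. In this illustration (which is a simplification: fine-tuning YOLO in practice typically updates many of the network’s layers, not just one) a fixed random “backbone” stands in for a pretrained feature extractor, and gradient descent adjusts only a small task-specific head on new labelled data.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Stand-in for a pretrained backbone: a fixed feature extractor whose weights stay frozen.
backbone = 0.2 * rng.normal(size=(16, 4))
extract = lambda x: np.tanh(x @ backbone)

def loss(w, X, y):
    """Cross-entropy of the task head's predictions."""
    p = sigmoid(extract(X) @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fine_tune(w, X, y, lr=0.5, epochs=200):
    """Gradient descent on the task head only; the backbone never changes."""
    feats = extract(X)
    for _ in range(epochs):
        p = sigmoid(feats @ w)
        w = w - lr * feats.T @ (p - y) / len(y)
    return w

X = rng.normal(size=(100, 16))
y = (X[:, 0] > 0).astype(float)   # toy label standing in for "contains fire"
w0 = np.zeros(4)
w1 = fine_tune(w0, X, y)          # loss on the new task falls as the head adapts
```

The key idea the sketch preserves is that the expensive general-purpose representation is reused, and the new training data only has to teach the model the specifics of the task at hand.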

Deploying, testing, and adjusting to the real world

Fine-tuning is not a one-off step, but a continuous process of gradual improvements to a model over time. After deploying the cameras in the forest, the team found that several challenges were still confounding the system. Edge cases are instances where an AI system is presented with a distinction or detection challenge which it hasn’t encountered in training (iMerit, 2022). This is a common issue with self-driving cars, where the computer vision system doesn’t know how to identify a strange object on the road and cannot determine how to act in response to it. In the context of the pilot, the team identified the following challenges:

During sunset, the computer vision system misidentified the orange sky as fire

Fog and haze were often misidentified as smoke 

During relatively dry periods, the system would misidentify dusty and barren terrain as smoke

The system had not been trained to make nuanced distinctions such as that between morning fog and smoke from a fire. As such, it needed to be refined to handle the real-world distinctions between different features of the environment. To do this, the team retrained their model on each edge case. They curated datasets with extensive images of fires alongside sunsets, smoke alongside haze, and smoke alongside dusty terrain, and fed these into the algorithm. Through this retraining, the parameters of the model were adjusted, and the system became better able to distinguish fires and smoke from other features in the environment.

Future directions

Since the breakthrough in deep learning in 2012, it has been common practice to use models trained on ImageNet, adapted as appropriate. In recent years, however, large language models (LLMs), and specifically large multimodal models (LMMs), have shown better performance on a wide variety of tasks compared with classical ImageNet-based methods. The team is now harnessing LLMs to further improve the reliability of the system (we’ll provide a technical introduction to LLMs later in the module).

For example, the team is employing LLMs to improve the binary classification of images into ‘normal’ or ‘containing smoke or fire’. At the initial classification step, three LLM-based binary classifications are performed to improve accuracy, such that only images classified as containing fire or smoke are passed to the object detector. Both the classification and the localisation information are then updated on the dashboard. Even if the fire or smoke is not localised, the classification is still recorded on the dashboard as an event of interest. This has helped reduce false events to almost zero during the daytime.
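The flow of that screening cascade can be sketched as follows. This is a hypothetical illustration, not the team’s code: the classifiers, detector, and dashboard are all stubs, and the majority-vote rule for combining the three classifications is an assumption on our part (the source says only that three classifications are performed).

```python
# Hypothetical sketch of the screening cascade: three independent binary
# classifiers vote on each frame, and only frames judged to contain fire or
# smoke are passed on to the (more expensive) object detector.
def cascade(frame, classifiers, detector, dashboard):
    votes = [clf(frame) for clf in classifiers]   # each stub returns True/False
    if sum(votes) < 2:                            # assumed policy: majority vote
        return                                    # normal frame: nothing to report
    event = {"frame": frame, "alert": "fire/smoke"}
    boxes = detector(frame)                       # may fail to localise the fire
    if boxes:
        event["boxes"] = boxes                    # add localisation when available
    dashboard.append(event)                       # event logged either way

# Toy run with stub components standing in for the LLMs and the detector.
classifiers = [lambda f: "smoke" in f,
               lambda f: "smoke" in f or "haze" in f,
               lambda f: "smoke" in f]
detector = lambda f: [(10, 10, 50, 50)] if "smoke" in f else []
dashboard = []
cascade("frame with smoke", classifiers, detector, dashboard)
cascade("clear frame", classifiers, detector, dashboard)
```

Gating the detector behind cheap classifications is a common design: most frames are normal, so the expensive localisation step runs only on the small fraction of frames likely to contain an event.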


Keep reading to discover more about some of the non-technical challenges this team faced.