Machine learning systems can identify patterns in data and use those patterns to make predictions.
There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning
In 2019, one of our FCDO colleagues saw that, despite South African miners being entitled to recompense if they had developed TB or silicosis, very few people were receiving payments. The issue was a massive backlog of X-ray scans, and too few doctors available to read them. In collaboration with the University of British Columbia, we developed an AI tool which was capable of distinguishing healthy X-rays from those that needed to be checked for respiratory diseases.
In supervised machine learning, you start with a training set of labelled data: in this case, a set of X-rays that a doctor had correctly labelled as silicosis present, TB present, or neither present. The system is fed this data and trained to identify the patterns that link each input to its label. As it learns these patterns, it adjusts its internal parameters so that its outputs match the correct labels more and more closely. Once the model is trained, users can feed in new data which was not part of the training set, and the model predicts the most likely label for it.
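The train-then-predict cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two numeric features per example and all of the values are invented, and the nearest-centroid method was chosen because it is short, not because it resembles the X-ray tool. A real system would work on image data with a far more capable model.

```python
from collections import defaultdict


def train(examples):
    """Learn one average feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical labelled training set: (feature values, label given by a doctor).
training_set = [
    ((0.1, 0.2), "neither present"),
    ((0.2, 0.1), "neither present"),
    ((0.8, 0.2), "TB present"),
    ((0.9, 0.1), "TB present"),
    ((0.2, 0.9), "silicosis present"),
    ((0.1, 0.8), "silicosis present"),
]

model = train(training_set)

# A new example that was not part of the training set.
prediction = predict(model, (0.75, 0.25))
print(prediction)
```

The labels are what make this supervised: the model can only ever answer with one of the categories it saw during training.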
Unsupervised learning
Unsupervised learning involves a machine learning application identifying patterns within a dataset and using those patterns to uncover the structures within the data. It does this without being explicitly trained by humans to identify what those structures will be. The key difference is that the training data in unsupervised learning is not labelled, and as such the system does not have a preset range of expected outputs to organise the data.
One use-case of unsupervised learning is clustering. Clustering involves grouping data based on the similarities and differences present in it. Suppose you have a database of information about different customers' spending habits (e.g. the frequency of their purchases, the kinds of products they buy, and the amount they spend). The unsupervised machine learning application will look for clusters of similarity between the datapoints and use them to create organic groupings. The system might, for example, separate customers who make frequent, large purchases from those who make infrequent, large purchases. These clusters can then be used by the company to classify customers into specific groups, without explicitly training the system to categorise data into predefined categories (IBM, nd).
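As a rough sketch of the customer example, the code below groups customers with k-means, one common clustering method (the text above does not name a specific algorithm, so this choice is an assumption). The customer figures are invented: each pair is purchases per month and average spend in hundreds of pounds. Starting the centroids from the first k points is a simplification that keeps the example deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    """Group points into k clusters; no labels are supplied."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic starting guess
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centroids


# Hypothetical customers: (purchases per month, average spend in hundreds).
customers = [
    (12, 2.0), (1, 2.1), (10, 2.2),
    (2, 1.9), (11, 2.1), (1, 2.0),
]

labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

The algorithm is never told what the groups mean. It only reports that the customers fall into two clusters, and it is left to a person to interpret them as frequent and infrequent buyers.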
Reinforcement learning
One of the most famous examples of reinforcement learning comes from DeepMind’s AlphaGo. In 2016, AlphaGo shocked computer scientists by beating a world champion in the ancient game of ‘Go’ – a territory-based board game. In ‘Go’, players make sequential decisions in a dynamic environment (the choice of optimal moves changes based on where the other player goes) and under uncertainty (players don’t know where the other player is going).
In reinforcement learning, an AI system is placed in an environment – in our example, the starting position for a game of ‘Go’. It then makes a decision in that environment which is either rewarded or punished. The rewards are attached to outcomes that the developers define as the goal of the system, such as making an optimal move. The system is trained to try to receive as many rewards as possible. Over time, as it receives punishments and rewards, it gradually learns to pick sequences of decisions that maximise its rewards and minimise its punishments (Bhatt, 2018). AlphaGo simulated numerous games of Go, following various decisions until it ultimately identified strategies by which it could get the highest score on its reward function.
AlphaGo’s success was not due to reinforcement learning alone. It combined reinforcement learning with Monte Carlo Tree Search (MCTS), a method that evaluates potential future moves by simulating multiple possible game outcomes. This allowed AlphaGo to strike a balance between exploring new strategies and exploiting known high-reward moves. Through millions of simulated games, AlphaGo discovered innovative, non-human strategies, demonstrating the power of reinforcement learning in complex decision-making environments.
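The loop of acting, receiving a reward or punishment, and adjusting can be illustrated on a much smaller problem than Go. The sketch below uses tabular Q-learning, a standard textbook method, on an invented five-position corridor where the goal is the right-hand end. It is not AlphaGo's method and includes no tree search; the environment, reward values and settings are all assumptions made for the example.

```python
import random

N_STATES = 5  # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, -0.01, False    # small punishment for any other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Estimated long-term value of each action in each position.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _episode in range(episodes):
        state = 0
        for _move in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try something new
            else:
                # Exploit: pick the action currently believed to be best.
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in ACTIONS
            )
            # Nudge the estimate towards reward plus discounted future value.
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


q_table = train()
policy = [
    max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)
]
print(policy)
```

The epsilon setting controls the same trade-off mentioned above: with a small probability the agent explores a random move, and otherwise it exploits the move it currently rates highest.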
Overfitting and underfitting
Underfitting and overfitting are two common challenges in AI model training that affect accuracy and generalisation. Underfitting occurs when a model is too simplistic, failing to capture the underlying patterns in the training data. This results in poor performance on both the training data and new, unseen data. Overfitting, on the other hand, happens when a model learns patterns that are too specific to the training data, including noise and irrelevant details, making it perform well on the training set but poorly on new data. For example, an overfitted model might recognise the specific images of dogs it was trained on but fail to generalise to other images or other breeds.
Striking a balance, often through techniques like regularisation, cross-validation, and using diverse training data, is crucial to developing AI models that generalise well and provide reliable outputs in real-world applications.
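The difference between the two failure modes shows up when error on the training data is compared with error on new data. The sketch below does this for three deliberately simple models on invented data that follows a straight line plus noise: one that always predicts the average (too simple), one that memorises the training set and copies the nearest stored answer (too specific), and a straight-line fit. The data, the model names and the choice of mean squared error are assumptions for illustration only.

```python
def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented data: the true pattern is y = 2x, with noise of +1 or -1 added.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_noise = [1, -1, 1, -1, 1, -1, 1, -1]
train_y = [2 * x + n for x, n in zip(train_x, train_noise)]

test_x = [0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 6.4]
test_noise = [-1, 1, -1, 1, -1, 1, -1]
test_y = [2 * x + n for x, n in zip(test_x, test_noise)]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def too_simple(x):
    """Underfits: ignores the input and always predicts the average."""
    return mean_y


def memoriser(x):
    """Overfits: returns the stored answer of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Straight-line fit by ordinary least squares.
slope = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)
) / sum((x - mean_x) ** 2 for x in train_x)
intercept = mean_y - slope * mean_x


def straight_line(x):
    return intercept + slope * x


results = {
    model.__name__: (mse(model, train_x, train_y), mse(model, test_x, test_y))
    for model in (too_simple, memoriser, straight_line)
}
for name, (train_error, test_error) in results.items():
    print(f"{name}: train={train_error:.2f} test={test_error:.2f}")
```

On this data the too-simple model has high error on both sets, the memoriser has zero error on the training set but a much higher error on the new points, and the straight-line fit has similar, low error on both.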