How is Natural Language Processing different from Generative AI and Large Language Models?
Much of the excitement (and concern) around the potential of AI in international development grew in response to the release of ChatGPT in late 2022. The sophistication of its comprehension, and its ability to generate many kinds of human-like text, put the question of what AI could do front and centre for governments, companies and citizens around the world.
In international development, this spurred action in the policy sphere to begin creating effective governance structures around the technology. Many development organisations began developing use cases that leveraged the technology to improve the efficiency of their delivery and to directly address the needs of people around the world.
But what is ChatGPT? ChatGPT is one example of a generative large language model (LLM). Generative AI (GenAI) refers to the subset of AI technologies that focus on generating content, whether text, audio, or images. This contrasts with the applications we have looked at so far, which centred on recognising patterns within data and using those patterns to make predictions. GenAI systems both learn the patterns in existing datasets and use that understanding to generate novel content (Toner, 2023).
Early LLMs were trained with a masking process: one word in a sentence is hidden, and the model predicts what the hidden word is most likely to be from the surrounding words (Yang et al., 2023). These models use linguistic patterns learnt from their training data to infer what the hidden word is likely to be. For example, suppose a model is given the sentence “Dogs chase cats and cats chase X”. The training data might contain many sentences expressing the idea that cats commonly chase mice, alongside others suggesting they also chase lasers, squirrels, birds, and so on. By learning these patterns between words, the model generalises to identify which word is most likely correct in context. Through this kind of training, LLMs gain a strong contextual understanding of language and have achieved state-of-the-art results across a range of NLP tasks. Autoregressive language models (such as those behind ChatGPT), on the other hand, work by predicting the next word in a sequence based on the preceding words (Yang et al., 2023).
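The intuition behind the masked-word example can be sketched in a few lines of code. This is a deliberately minimal sketch, not how a real LLM works: a neural model learns statistical representations rather than counting word sequences, and the toy corpus below is invented purely to mirror the “cats chase X” example.

```python
from collections import Counter

# A toy corpus standing in for training data (invented for illustration).
corpus = [
    "cats chase mice",
    "cats chase mice",
    "cats chase lasers",
    "cats chase squirrels",
    "cats chase birds",
]

def predict_masked_word(context, corpus):
    """Guess a hidden word by counting which word most often follows
    the given context words in the corpus -- a vastly simplified
    stand-in for masked-word prediction."""
    counts = Counter()
    n = len(context)
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - n):
            if words[i:i + n] == context:
                counts[words[i + n]] += 1
    return counts.most_common(1)[0][0] if counts else None

# "Dogs chase cats and cats chase X": what follows "cats chase"?
print(predict_masked_word(["cats", "chase"], corpus))  # -> mice
```

Because “mice” follows “cats chase” more often than any other word in this toy corpus, it wins; a real model draws the same kind of inference from patterns across billions of sentences.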
Large language models are one example of generative AI: they are trained on a huge corpus of text to form a statistical model of human language, which they use to process and generate human-like text. Because generative AI can be focused on generating things other than text, LLMs are just one form of GenAI (Toloka, 2023).
Large language models are part of, and represent a huge advance in, the field of NLP. They can perform simple NLP tasks as well as many more complex ones. Recently, a fine-tuned LLM was trained to pass the bar exam, a demanding test involving a wide range of complex natural language processing tasks (Katz et al., 2024).
One common misconception about LLMs, and chatbots in particular, is that the model retrieves information from some knowledge base or dataset. Unless the chatbot application adds a web- or database-search layer, the model itself retrieves nothing: it simply predicts the next token (roughly, the next word) again and again, based on patterns learnt during training.
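The “predict the next token, again and again” loop can be made concrete with a toy autoregressive model. The sketch below uses simple bigram counts in place of a neural network, and the training text is invented for illustration; the point is that generation is repeated prediction from learnt statistics, with no lookup into a knowledge base.

```python
from collections import Counter, defaultdict

# Invented toy training text (a stand-in for a real corpus).
text = "dogs chase cats and cats chase mice and mice eat cheese"

# Build bigram counts: how often each word follows another.
bigrams = defaultdict(Counter)
words = text.split()
for prev, nxt in zip(words, words[1:]):
    bigrams[prev][nxt] += 1

def generate(prompt_word, n_tokens):
    """Repeatedly predict the most likely next word -- the same loop an
    autoregressive LLM runs, minus the neural network. Nothing is
    retrieved; each word is predicted from the one before it."""
    out = [prompt_word]
    for _ in range(n_tokens):
        candidates = bigrams[out[-1]]
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(generate("dogs", 4))
```

A real LLM conditions each prediction on the whole preceding context rather than just the previous word, but the generation loop itself has this same shape.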
This is only a brief introduction to a complicated technology; interested readers can find a more comprehensive technical introduction in Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond (Yang et al., 2023).