How the large language model ChatGPT may take over tech

ChatGPT is the newest hot topic in the tech industry, taking the world by storm.

But it’s important to know precisely what ChatGPT is and how it works in order to be best prepared for your next project, LinkedIn post, or family get-together with older relatives.

So, we’ll explore how ChatGPT works.

ChatGPT is a generative large language model.

Let’s look at each of those terms in turn, starting with “language model”.

A language model is an algorithm that attempts, in some way, to manipulate language – creating language, summarising language, or anything else of that sort.

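A “large language model” is a computerised language model that runs on an artificial neural network – meaning a series of algorithms inspired by how the neurones in our brains fit together and work – to achieve this goal of manipulating language. The “large” refers to scale: the network has a very large number of adjustable parameters and is trained on a very large amount of text. [1]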

Finally, “generative” means that ChatGPT’s specific way of manipulating language works by attempting to predict the text that best follows a sequence of provided text or a prompt.

So, there we have it: ChatGPT is a computer system that uses specific algorithms to predict text.

Now that we have a deeper understanding of the what, let’s examine the how.

To begin with, ChatGPT works with language tokens. This might seem like a novel concept, but we just used language tokens to describe ChatGPT.

Language tokens are the constituent parts of text-based communication, chosen because they are easier for a chatbot to deal with than raw text. These could be single words, groups of words, or smaller parts of words like prefixes and suffixes.
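To make the idea of splitting text into words and word parts concrete, here is a minimal sketch of a toy tokeniser. The vocabulary and the greedy longest-match rule are assumptions made for illustration; this is not how ChatGPT's own tokeniser is implemented.

```python
# Toy vocabulary (hypothetical): a few whole words plus some word pieces.
VOCAB = {"un", "believ", "able", "token", "s", "the", "are"}

def tokenize(text):
    """Split text into tokens by greedy longest match against VOCAB."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining piece first, then shrink it.
            for end in range(len(word), start, -1):
                if word[start:end] in VOCAB:
                    tokens.append(word[start:end])
                    start = end
                    break
            else:
                # No known piece matched: fall back to a single character.
                tokens.append(word[start])
                start += 1
    return tokens

print(tokenize("The tokens are unbelievable"))
```

Here "tokens" is split into a stem and a suffix, and "unbelievable" into a prefix, a stem, and a suffix, which is the kind of breakdown described above.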

In our examination above, we looked at the tokens “language model”, “large”, and “generative”. Together these tokens conveyed a specific meaning, with each token interacting with the others to shape it.

ChatGPT was trained by the people at OpenAI on a number of corpora, which are large collections of text. As an example of a corpus, Cornell researchers compiled the Cornell Movie-Dialogs Corpus, which contains thousands of lines of movie dialogue.

The training corpora included the Wikipedia body of text, among others, and were broken down into sequences of tokens. During training, the neural network inside ChatGPT picked up on the patterns that occur in these sequences of tokens, and it uses statistics to apply those patterns when creating new text.
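The idea of learning patterns from lines of tokens can be sketched with a far simpler model than a neural network: one that just counts which token follows which. The corpus below is made up, and the counting approach (a "bigram" model) is a stand-in chosen for illustration, not ChatGPT's actual training method.

```python
from collections import Counter, defaultdict

# A tiny stand-in corpus (made up for illustration).
corpus = "the cat sat on the mat . the cat ate the fish ."

def train_bigrams(text):
    """Count how often each token is followed by each other token."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]

model = train_bigrams(corpus)
print(model["the"])               # follower counts for "the"
print(predict_next(model, "the"))
```

In this corpus "the" is followed by "cat" more often than by anything else, so "cat" is what the model guesses. A neural network replaces the counting table with something far more flexible, but the goal of predicting the next token from observed patterns is the same.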

And that’s it!

Essentially, ChatGPT is a very large guesser, using lots of fancy algorithms to guess what it thinks should follow whatever prompt you typed in based on what it has already read.
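The guessing happens one token at a time: guess the next token, add it to the text, and guess again. Below is a minimal sketch of that loop. The probability table is written by hand and is purely hypothetical, and the sketch looks only at the last token and always picks the likeliest follower, whereas a real model considers the whole prompt and usually adds some randomness.

```python
# Hypothetical next-token probabilities, written by hand for illustration.
NEXT_TOKEN_PROBS = {
    "once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.95, "the": 0.05},
    "a": {"time": 0.7, "hill": 0.3},
    "time": {"<end>": 1.0},
}

def generate(prompt, max_new_tokens=10):
    """Extend the prompt by repeatedly appending the likeliest next token."""
    tokens = prompt.split()
    if not tokens:
        return prompt
    for _ in range(max_new_tokens):
        choices = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not choices:
            break  # nothing known about this token, so stop guessing
        best = max(choices, key=choices.get)
        if best == "<end>":
            break  # the table says the text most likely ends here
        tokens.append(best)
    return " ".join(tokens)

print(generate("once"))
```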

An interesting consequence of this shows up in maths. Whilst computers are famously excellent at calculations, if you ask ChatGPT to perform operations on any reasonably big numbers, it will often struggle to get the right answer.

This is because ChatGPT doesn’t actually calculate anything from the prompt. There is nothing that understands what has been written. It is just trying to guess what will appear next. So, when operating on large numbers, the huge variability (change one digit in the middle of a number, and the result will change wildly) means that ChatGPT’s predictive model simply breaks down, and it provides a plausible-looking but incorrect answer.
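The sensitivity to a single digit is easy to see with exact arithmetic. The sketch below uses two arbitrary example numbers, changes one digit in the middle of the first, and counts how many digits of the product change as a result.

```python
a = 123456789
b = 987654321
a_changed = 123466789  # same as a, but the middle digit 5 is now 6

def differing_digits(x, y):
    """Count the positions where the decimal digits of x and y differ."""
    sx, sy = str(x), str(y)
    width = max(len(sx), len(sy))
    sx, sy = sx.zfill(width), sy.zfill(width)
    return sum(1 for dx, dy in zip(sx, sy) if dx != dy)

print(a * b)
print(a_changed * b)
print(differing_digits(a, a_changed))          # digits changed in the input
print(differing_digits(a * b, a_changed * b))  # digits changed in the result
```

One changed digit in the input alters many digits of the output, so a text pattern that fits one multiplication tells a guesser very little about a nearly identical one.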

It should be noted that this issue can be overcome with external functions and arbitrary algorithms, e.g. through the recently introduced plugins and function calling capabilities.
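The general shape of that workaround can be sketched as follows: instead of guessing the digits, the model writes out a structured request, and ordinary code performs the calculation. The request format, the tool names, and the `run_tool_call` helper are all hypothetical and chosen for illustration; they are not OpenAI's actual plugin or function calling interface.

```python
import json
import operator

# Hypothetical tool registry: operations the host application agrees to run.
TOOLS = {
    "add": operator.add,
    "multiply": operator.mul,
}

def run_tool_call(message):
    """Run a request of the assumed form {"tool": name, "args": [x, y]}."""
    request = json.loads(message)
    func = TOOLS.get(request["tool"])
    if func is None:
        raise ValueError(f"unknown tool: {request['tool']}")
    return func(*request["args"])

# Pretend the model produced this text rather than guessing the answer.
model_output = '{"tool": "multiply", "args": [123456789, 987654321]}'
print(run_tool_call(model_output))
```

The model only has to predict a short, regular piece of text naming the operation and its inputs, which is the kind of task it is good at, while the exact arithmetic is left to the computer.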

But this isn’t to say that ChatGPT isn’t powerful!

ChatGPT is an incredibly advanced system that has uses now and in the future. This kind of technology is already disrupting several industries, including software development, music, photography, video, and content production, creating controversy and alarm but also substantial creative momentum.

Some even argue that it’s more powerful than we currently think.

Please just remember this explainer so that you can make the best use of ChatGPT when you need it (or finally produce a coherent answer to Grandpa Dave ;)).

[1] For the more technical readership: language models use specific methods of representing language, its constituents, and the relationships between them, with some examples being deep neural networks, Bayesian inference, and other probabilistic methods. ChatGPT works in geometric terms: words are represented as vectors, sentences are manipulated via large matrices, and the result is a probability distribution over language.
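As a rough illustration of the footnote, the sketch below turns a word vector into a probability distribution over a three-word vocabulary by multiplying it with a matrix and normalising the scores with a softmax. The vocabulary, the vectors, and the matrix values are all made up, and a real model has vastly more dimensions and many layers of such operations.

```python
import math

VOCAB = ["cat", "dog", "mat"]

# Hypothetical 2-dimensional word vectors.
EMBEDDINGS = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "mat": [0.0, 1.0]}

# Hypothetical weight matrix: one row of weights per vocabulary entry.
OUTPUT_WEIGHTS = [[0.2, 2.0], [0.1, 1.5], [2.0, 0.1]]

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(word):
    """Matrix-vector product followed by softmax over the vocabulary."""
    vec = EMBEDDINGS[word]
    scores = [sum(w * x for w, x in zip(row, vec)) for row in OUTPUT_WEIGHTS]
    return dict(zip(VOCAB, softmax(scores)))

print(next_token_distribution("cat"))
```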