How ChatGPT is trained – generating responses from data

How language models like ChatGPT are created

OpenAI’s chatbot, ChatGPT, has made waves within the world of AI generation tools. Whether you are a seasoned machine learning engineer or just a curious ChatGPT fan, you may be wondering how ChatGPT is trained. The AI language model has become wildly popular in the short period since its launch, arguably because it can converse on such a broad range of topics. Users can discuss a large scope of subjects with the chatbot, making it an incredibly versatile platform. In this article, we’ll dive into the training process of ChatGPT and take a look at the vast dataset behind it.

Quick Answer

ChatGPT is a large language model trained on information from publicly available internet pages, data licensed by OpenAI from third parties, and data provided by users or human trainers. It is first refined through supervised fine-tuning, in which human trainers write example conversations, and then through reinforcement learning from human feedback (RLHF), which steers the chatbot toward human-like responses.

How ChatGPT is trained – generating responses from data

First things first, ChatGPT uses an AI model called GPT (Generative Pre-trained Transformer). Specifically, free users currently get GPT-3.5, whereas paid accounts have access to the more powerful GPT-4 model. Both versions are large language models (LLMs), with the latter trained on a more recent dataset of labeled and unlabeled data from the internet. As the name suggests, LLMs are huge, with billions of parameters. They are a type of deep learning model that can process and generate text based on the data on which they were trained. These models have a wide range of uses, from powering chatbots and search engines to generating creative content such as lyrics and stories.

The data that ChatGPT learns from is known as its training data, and human trainers help curate parts of it. Without going into the bias inherent in that process, this knowledge base allows layers of context when responding to user queries such as language translation, which often doesn’t call for the most literal interpretation. Indeed, use cases like this benefit a great deal from the flexibility of the natural language processing (NLP) approach seen in ChatGPT. ChatGPT’s training data allows it to predict the next word in a sequence of words. Repeating that prediction word after word produces human-like text and relevant responses to user prompts.
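To make next-word prediction concrete, here is a minimal sketch in Python. It is a toy illustration only: it counts which word follows which in a tiny made-up corpus, whereas ChatGPT uses a transformer neural network with billions of parameters. The function names and the corpus are assumptions invented for this example.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts

def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical miniature "training data"
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)

print(predict_next_word(model, "the"))    # "cat" follows "the" twice, "mat" once
print(predict_next_word(model, "zebra"))  # unseen word, so no prediction
```

The idea that carries over to a real LLM is that the prediction comes from statistics learned during training, not from looking up a stored answer.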

Machine learning models like ChatGPT all require a large dataset of examples to learn from. Once trained, a model can respond sensibly to new data that it has never seen before. Because the LLM takes input prompts in natural human language, many users ask whether those same prompts are used to train the neural network. The answer is yes, but you can choose to opt out.

ChatGPT’s training process

The GPT model, in all its iterations, is based on a neural network architecture called the transformer. GPT-3.5 was then fine-tuned so it could interact with its users in a conversational format. Let’s explore what this fine-tuning looked like!

According to OpenAI, ChatGPT is trained using reinforcement learning from human feedback (RLHF). Initially, the model went through a process called supervised fine-tuning, where OpenAI trainers played the role of both a human user and an AI bot. Through this, the trainers created a dialogue sequence to emulate how humans communicate, which was then added to the model’s dataset to fine-tune it for conversational uses. You could argue that humans learn in quite a similar way.

OpenAI states:

“We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses. We mixed this new dialogue dataset with the InstructGPT dataset, which we transformed into a dialogue format”

OpenAI – Introducing ChatGPT
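As a rough sketch of what a supervised fine-tuning example could look like, the Python below turns one trainer-written conversation into (context, target) pairs, one for each assistant reply. The "User:" and "Assistant:" text layout, the function name, and the sample dialogue are all assumptions made for illustration; OpenAI has not published its exact data format.

```python
def build_training_pairs(turns):
    """Turn one conversation into (context, target) pairs, one per assistant reply.

    `turns` is a list of (role, text) tuples, where role is "user" or "assistant".
    """
    pairs = []
    history = []
    for role, text in turns:
        if role == "assistant":
            # The model sees everything said so far and must produce this reply.
            context = "\n".join(history + ["Assistant:"])
            pairs.append((context, text))
        history.append(f"{role.capitalize()}: {text}")
    return pairs

# Hypothetical conversation in which a trainer wrote both sides
conversation = [
    ("user", "Hi"),
    ("assistant", "Hello!"),
    ("user", "Who made you?"),
    ("assistant", "OpenAI."),
]

for context, target in build_training_pairs(conversation):
    print(repr(context), "->", repr(target))
```

During fine-tuning, the model would be adjusted so that, given each context, the trainer's reply becomes more likely.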

ChatGPT was later improved by creating a reward model to be used for the next step: reinforcement learning. This involved AI trainers interacting with the tool to generate responses. Alternative outputs were then ranked from best to worst, based on quality. With this information, OpenAI could further fine-tune the model using Proximal Policy Optimization (PPO), a reinforcement learning algorithm it developed. If you are looking for details on this process, OpenAI covers it on its blog.
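The ranking step can be sketched in Python as follows. A best-to-worst ranking is expanded into (preferred, rejected) pairs, and a pairwise loss penalizes the reward model whenever it scores the rejected response higher. This is a common formulation for ranking-based reward models rather than a confirmed detail of ChatGPT's training, and the response labels and scores below are made up.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair ranks higher.
    return list(combinations(ranked_responses, 2))

def pairwise_loss(score_preferred, score_rejected):
    """-log(sigmoid(difference)): small when the preferred response scores higher."""
    diff = score_preferred - score_rejected
    return math.log(1.0 + math.exp(-diff))

# Hypothetical ranking from a trainer, best first
ranking = ["response_a", "response_b", "response_c"]
# Hypothetical scores from a partly trained reward model
scores = {"response_a": 1.5, "response_b": 0.2, "response_c": 0.9}

for preferred, rejected in ranking_to_pairs(ranking):
    loss = pairwise_loss(scores[preferred], scores[rejected])
    print(f"{preferred} over {rejected}: loss = {loss:.3f}")
```

In this made-up case the pair (response_b, response_c) gets the largest loss, because the model scored the lower-ranked response higher. Training would nudge the scores to agree with the human ranking, and the resulting reward model would then supply the signal that PPO optimizes.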

ChatGPT’s dataset

As mentioned above, ChatGPT is an LLM trained on a vast dataset. But where does this data come from? Knowing where ChatGPT’s data comes from is an important step in understanding the potential for bias in the chatbot’s responses. The data comes from three sources: publicly available information from the internet, information licensed by OpenAI from third parties, and information that users or human trainers provide.

The information gathered from the internet can be in any format, including websites, books, news articles, and journals. OpenAI only uses publicly available information and does not seek data behind paywalls or from the dark web. You may be wondering about the language restrictions found on ChatGPT, and how they relate to the data it’s trained on. OpenAI makes sure that the data it gathers abides by its guidelines and regulations by applying filters during collection. Additionally, any information deemed unsuitable is removed, for example hate speech, adult content, spam, and sites that rely mainly on personal information.
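A filtering step of this kind could be sketched as below. This is a deliberately simple, hypothetical example: the blocked phrases, the email threshold, and the function name are invented here, and production pipelines typically rely on trained classifiers rather than keyword lists. OpenAI's actual filters are not public.

```python
import re

# Hypothetical spam markers, stand-ins for a real filtering policy
BLOCKED_TERMS = {"buy now", "click here"}
# Rough pattern for email addresses, used as a proxy for personal information
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def is_suitable(document, max_emails=2):
    """Return False for documents that look like spam or are heavy on personal data."""
    text = document.lower()
    if any(term in text for term in BLOCKED_TERMS):
        return False
    if len(EMAIL_PATTERN.findall(document)) > max_emails:
        return False
    return True

documents = [
    "Transformers process text in parallel.",
    "Click here to BUY NOW!!!",
    "Contact: a@x.com, b@y.org, c@z.net",
]

kept = [doc for doc in documents if is_suitable(doc)]
print(kept)  # only the first document passes both checks
```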

During training, ChatGPT uses this information to learn associations between words and phrases, rather than storing the text itself in a database. The model then uses these associations to predict and generate new words in response to user prompts.

An informational dialogue box explaining the data sources for Chat GPT's knowledge base, showcasing how it was trained.

Can ChatGPT access data in real time?

In addition to the dataset that ChatGPT is trained on, the GPT-4 model can now access information in real time through its browser tool. In September 2023, OpenAI added the Browse with Bing feature to GPT-4, meaning internet access is available for ChatGPT Plus, Team, and Enterprise members. This does not change the data the model was trained on, but it lets ChatGPT supplement that knowledge with recent information. If you’d like to find out more about this topic, check out our ChatGPT internet access guide.

Final thoughts

Grasping the training process of AI language models like ChatGPT can be perplexing. But knowing how your favorite chatbot acquired the information it responds with is an important step in understanding how ChatGPT works. ChatGPT achieves its human-like responses through a process known as reinforcement learning from human feedback. This process helps ChatGPT use what it has learned to respond to users in a way that is easily digestible. This is just one of the many factors that make this chatbot so popular and easy to use.

We hope this article has given you an insight into how OpenAI created this famous language model. If you’re interested in the full capabilities of the world’s most popular AI chatbot, why not read about who created it, and who owns OpenAI?

Funmi joined PC Guide in November 2022 and has a knowledge of AI apps, gaming and consumer technology.