Claude 2.1 vs GPT-4 – which AI model is best in 2024?

Two of the most powerful language models of 2024

Claude 2.1 and GPT-4 are two of the most powerful AI models on the market today. Anthropic’s Claude, the underdog with very little marketing push by comparison, still ranks near the top of most benchmark results alongside OpenAI’s flagship model. With many similarities and many differences between these large language models (LLMs), which is best for you in 2024? Let’s compare Claude 2.1 vs GPT-4.

Claude 2.1 vs GPT-4 compared

Claude 2.1 is the latest AI model to power the Claude chatbot, a large language model created by Anthropic. It promises hugely increased capabilities in terms of input and output compared to its predecessor.

GPT-4, created by OpenAI, is the flagship model behind the hugely successful ChatGPT. Both models rely on natural language processing and reinforcement learning. Both can solve complex problems and respond to written prompts with human-like text, with the form of the output depending on the input. Furthermore, both are available via API for personal or business use.

The chatbot and the model are easy to confuse because Anthropic has elected to use the same name for both. Anthropic’s AI chatbot, called Claude, uses the AI model Claude 2.1. By comparison, OpenAI’s chatbot, called ChatGPT, uses the AI model GPT-4. To avoid this confusion, the Claude model is often stylized as claude-2 or claude-2.1, hyphenated and without capitalization. ChatGPT also uses other AI models depending on your payment plan, but let’s not overcomplicate things for now.

So how do these valuable tools differ? Keep reading for our comparison of these two generative AI options.

Key similarities

Both of these models are top-tier NLP (natural language processing) models. This means that they’re the same kind of artificial intelligence. Furthermore, they both represent a significant leap from their previous versions.

The LMSYS Chatbot Arena Leaderboard is a public resource that ranks all of the most popular AI models against each other in a public vote. On the list of 58 models in total, Claude 2.1 ranks #10, whereas the most recent iterations of GPT-4 rank #1 and #2. In this respect, there is a clear winner, but in benchmark data released by Elon Musk’s xAI (which includes data on both Claude 2 and GPT-4), they are absolutely in the same ballpark.

To clear up any confusion when looking at this table yourself, GPT-4 has many slight variations. When looking into the model in greater detail, as in the OpenAI API, you’ll find several different versions of what most people will collectively refer to as GPT-4, and all of them are in active use at the same time. A new one being announced does not automatically make the last immediately redundant. Some previous versions are, for example, lower cost to run (completion costs), but less powerful.

As a result, you’ll find gpt-4-0125-preview, gpt-4-1106-preview, gpt-4-1106-vision-preview, gpt-4 and gpt-4-32k available via the OpenAI API. This also explains why GPT-4-0125-preview and GPT-4-1106-preview appear as separate entries at the top of the user-voted leaderboard.
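
To make the naming pattern above easier to see, here is a minimal Python sketch that splits each identifier into a model family, a snapshot stamp and any remaining tags. The `ModelId` type and `parse_model_id` helper are hypothetical names made up for this illustration, not part of any OpenAI library, and the rule that a four-digit segment is a snapshot stamp is an assumption based only on the names listed here.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ModelId(NamedTuple):
    family: str                 # e.g. "gpt-4"
    snapshot: Optional[str]     # e.g. "1106", or None if the name has no stamp
    tags: Tuple[str, ...]       # e.g. ("vision", "preview")


def parse_model_id(model_id: str) -> ModelId:
    """Split an identifier such as 'gpt-4-1106-vision-preview' into parts.

    Assumption: the first two hyphen-separated segments form the family,
    and the first four-digit segment after that is a snapshot stamp.
    """
    parts = model_id.lower().split("-")
    family = "-".join(parts[:2])
    snapshot = None
    tags = []
    for part in parts[2:]:
        if snapshot is None and re.fullmatch(r"\d{4}", part):
            snapshot = part
        else:
            tags.append(part)
    return ModelId(family, snapshot, tuple(tags))


# The identifiers mentioned in the article.
names = [
    "gpt-4-0125-preview",
    "gpt-4-1106-preview",
    "gpt-4-1106-vision-preview",
    "gpt-4",
    "gpt-4-32k",
]
for name in names:
    print(name, "->", parse_model_id(name))
```

Every name resolves to the same family, gpt-4, which is why most people refer to all of them collectively as GPT-4.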

Notable differences

Claude 2.1 vs GPT-4

Claude 2.1 | GPT-4
AI Chatbot | AI model
Focus on ethics and safety | The most powerful foundation model as voted by users
No plugins or integrations | Hundreds of plugins and integrations
Free to use | Free to use via Microsoft Copilot. Requires a paid subscription to use via ChatGPT.

Quality of output

GPT-4 is reportedly a model with 1.76 trillion parameters, using the Mixture of Experts architecture, although OpenAI has not officially confirmed these figures. It is said to combine multiple models, each with 220 billion parameters, and to improve its output by generating 16 iterations, each improving on the last. The advanced capabilities and plugins that come with GPT-4, accessible via ChatGPT Plus, make it suitable for various applications that Claude 2.1 is not.
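
As a quick sanity check on those numbers, the short Python sketch below works out how many 220-billion-parameter models would add up to 1.76 trillion. It takes the article's figures at face value, which is an assumption since they are not official, and the helper name `implied_expert_count` is made up for this illustration.

```python
def implied_expert_count(total_params: int, params_per_expert: int) -> int:
    """Return how many equally sized experts make up the total.

    Raises ValueError if the total is not an exact multiple.
    """
    count, remainder = divmod(total_params, params_per_expert)
    if remainder != 0:
        raise ValueError("total is not an exact multiple of the expert size")
    return count


# Figures quoted in the article (unconfirmed by OpenAI).
TOTAL_PARAMS = 1_760_000_000_000       # 1.76 trillion
PARAMS_PER_EXPERT = 220_000_000_000    # 220 billion

print(implied_expert_count(TOTAL_PARAMS, PARAMS_PER_EXPERT))  # 8
```

In other words, the two quoted figures are consistent with each other only if the model combines eight such experts.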

Detailed information about the architecture of Claude 2.1 is limited. Regardless, the output quality of GPT-4 will on average exceed that of other AI models like Claude 2.1, because almost no other firm can match the billions of dollars in funding OpenAI has for R&D. This is especially true of the GPT-4 Turbo variant.

Anthropic uses the term ‘constitutional AI’ to describe the safety principles the model runs on. The model follows a set of ‘principles’ that prioritize AI safety, which have roots in the UN’s Universal Declaration of Human Rights and platform guidelines from companies like Apple. In theory, this means a higher degree of safety and accuracy in the output of the chatbot.

Claude 2.1 reportedly scores higher in the bar exam, as well as in GRE writing and the Python coding test. 

However, there are some downsides to this new language model. It hasn’t been in use as long as OpenAI’s ChatGPT and the GPT-4 system. This means it scores lower in some tests and exams. Furthermore, it has been reported to make some clear factual errors on occasion.

Pricing and conversational context

Claude 2.1 is comparatively cheaper than GPT-4, costing $11 per million tokens. It has a context window of 200k tokens, far higher than the 32k of the largest standard GPT-4 variant, although GPT-4 Turbo narrows the gap with 128k. The new model’s strengths lie in its ability to take in and understand very large amounts of text, roughly 150,000 words. This means that it can summarize entire books. It has also been praised for its superior performance with maths and coding.
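
To put the token and pricing figures into practical terms, the following Python sketch estimates how many tokens a document of a given length needs, whether it fits in a given context window, and what it would cost at a flat per-token price. It is a rough illustration only: the ratio of 0.75 words per token is an assumption implied by the token and word counts quoted above, real tokenizers vary with the text, and the function names are made up for this example.

```python
import math

# Assumption: about 0.75 words per token, the ratio implied by the
# article's token and word figures. Real tokenizers will differ.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Roughly estimate the number of tokens for a given word count."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_window_tokens: int) -> bool:
    """Check whether the estimated token count fits in a context window."""
    return estimate_tokens(word_count) <= context_window_tokens


def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Estimate the cost of processing `tokens` at a flat per-million price."""
    return tokens * usd_per_million_tokens / 1_000_000


# Hypothetical example: a 60,000-word manuscript.
words = 60_000
tokens = estimate_tokens(words)
print(tokens)                                   # 80000
print(fits_in_context(words, 32_000))           # False: too long for a 32k window
print(round(estimate_cost_usd(tokens, 11.0), 2))
```

A manuscript of that size would overflow a 32k window, which is the practical reason a larger context window matters for tasks like summarizing a whole book.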

The interfaces through which you’ll use them (Claude and ChatGPT) both offer free-forever options. However, should you choose to upgrade, US subscribers will pay $20/month either way.

GPT-4 vs Claude 2.1 – The verdict

GPT-4 is the established reigning champion of the AI language model world. It performs excellently in response to written or image-based prompts. Due to the large amounts of data and communication it draws upon, it shines when answering questions, telling stories or solving complex problems – it can write complicated essays, jokes, code and more. 

Although it was not as strong as Claude 2.1 when it came to GRE writing, GPT-4 outperformed it in the verbal and quantitative tests. Bear in mind, however, that GPT-4 does not always check the accuracy of the information it uses in its responses.

Gloria is a tech and AI writer for PC Guide. She is interested in what new technology means for the future of digital and broadcast journalism.