Grok vs Claude – similarities and differences explained

They have different areas of focus, but what's similar?
Last Updated on March 19, 2024
March has already been a whirlwind month for AI chatbot news, with the release of Anthropic’s new Claude 3.0 models and xAI’s release of Grok’s model weights and architecture as open source. Either event would be significant on its own, but coming only weeks apart, they underline the relentless pace of change in the AI industry. So what does this mean for Grok vs Claude?

Claude 3.0 was released to almost entirely positive reviews and hailed as perhaps the first genuine contender to GPT-4. Its current capabilities are considerably ahead of Grok’s, but Grok’s arrival as an open-source download changes the shape of the landscape, placing the largest open-source model yet available into the hands of developers. In this article, we’ll delve into the features and capabilities of both chatbots and compare their strengths and weaknesses to see how they stack up.

Grok vs Claude – the background

Grok AI


Grok is developed by xAI, the company founded by Elon Musk. Its development has been almost as tumultuous as its founder’s recent relationship with OpenAI, a business Musk helped create in 2015 before leaving in 2018.

A public falling-out has since occurred between OpenAI and Musk, with Musk currently suing his former business colleagues. Some have suggested that Grok’s current open-source status is a reaction by Musk to his perception that OpenAI is no longer an open-source research foundation but very much a closed for-profit corporation.

Launched on November 4th, 2023, Grok was integrated into X’s Premium+ tier, offering a mix of conversational chat, writing and translation ability, research assistance, and real-time access to X data. Now that its weights and architecture are openly available, it is embarking on a new journey as perhaps the most advanced open-source AI in existence.

Key Features

  • Powered by Grok-1.0, a 314-billion-parameter Mixture-of-Experts large language model
  • Trained from scratch by xAI using a custom stack on top of JAX and Rust
  • Real-time knowledge of the world via the X platform
  • Unique personality with a sense of humor and a rebellious streak
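
To make the "Mixture-of-Experts" term above more concrete, here is a minimal sketch of top-k expert routing in plain Python. It illustrates the general technique only and is not xAI's implementation: the number of experts, the value of `k`, the gate scores, and the toy expert functions are all assumptions chosen for readability.

```python
import math

def softmax(logits):
    """Convert raw gate scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalize so the selected experts' weights sum to 1
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, gate_logits, k=2):
    """Combine the outputs of only the selected experts; the rest are skipped."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_logits, k))

# Four toy "experts" that simply scale their input.
# These are hypothetical stand-ins for the feed-forward networks in a real model.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical gate scores; in a real model a learned router computes them from x.
gate_logits = [0.1, 2.0, 0.3, 2.0]

selected = route_top_k(gate_logits)
output = moe_layer(10.0, experts, gate_logits)
print(selected, output)
```

The point of the design is that only the selected experts run for a given input, so a model can hold a very large total parameter count while using a fraction of it per token.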

Grok’s Open-Sourcing

  • Base model checkpoint released under the Apache 2.0 license on Sunday, March 17, 2024
  • Architecture and weights are available, but not fine-tuned for specific tasks
  • Potential for collaboration and innovation within the AI community

Claude 3.0


Developed by Anthropic, Claude 3.0 was released on March 4th, 2024, to near-universal acclaim. Anthropic’s benchmark data shows its most advanced (subscription-based) model, ‘Opus’, outperforming Google’s Gemini Ultra in all metrics and edging out GPT-4. It should be noted that the model is benchmarked against GPT-4, not GPT-4 Turbo.

Key Features

  • Three models: Haiku, Sonnet, and Opus.
  • Claude Sonnet is free to try via the website. Opus is available as a paid monthly subscription.
  • Vision capabilities: the ability to process photos, charts, and graphs
  • Code generation and language translation
  • 200K token context window, 1M tokens available for specific uses
  • Accurate over long documents
  • Enterprise-grade security and data handling 
Graphical comparison of various AI Chatbots and machine learning models across different benchmarks and skill areas.

Source: Anthropic.com

Grok vs Claude benchmarks comparison

It isn’t easy to compare Grok vs Claude directly since they have not been benchmarked against one another. However, by looking at the tests they have both undertaken, it’s clear that Claude is considerably ahead of Grok in every metric. This is unsurprising since Grok’s initial training was designed to compete against GPT-3.5.

Looking at the benchmarks they share we find the following:

GSM8k: This is a benchmark based on common middle school maths problems. It doesn’t sound like a difficult challenge for these vast neural networks, but it is. LLMs learn from examples rather than performing explicit arithmetic computation, so multi-step calculations are easy to get wrong. Despite this, significant advances are occurring, and Claude Opus is now scoring a remarkable 95%. Grok lags well behind, with xAI’s published score for Grok-1 at 62.9%.
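
As a rough illustration of how a percentage score on a maths benchmark of this kind can be computed, the sketch below grades free-text answers by comparing the last number in each answer with a reference value. The problems, model answers, and the `extract_final_number` helper are hypothetical, and real evaluation harnesses use their own prompting and answer-parsing rules.

```python
import re

def extract_final_number(text):
    """Return the last number that appears in an answer, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

def exact_match_accuracy(model_answers, reference_answers):
    """Fraction of answers whose final number equals the reference answer."""
    correct = sum(
        1
        for answer, ref in zip(model_answers, reference_answers)
        if extract_final_number(answer) == ref
    )
    return correct / len(reference_answers)

# Hypothetical model outputs for four word problems
model_answers = [
    "She has 3 boxes of 12 pencils, so 3 * 12 = 36 pencils.",
    "The train travels at 60 km/h for 2 hours, so the answer is 100 km.",
    "Half of 50 is 25.",
    "I am not sure.",
]
# Hypothetical reference answers for the same problems
reference_answers = [36.0, 120.0, 25.0, 7.0]

accuracy = exact_match_accuracy(model_answers, reference_answers)
print(f"{accuracy:.1%}")
```

Under this scheme an answer with sound reasoning but a wrong final figure, like the second one, still counts as a miss, which is part of why these benchmarks are harder than they sound.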

MMLU: This multiple-choice test challenges how well an LLM can connect knowledge across various fields. You’d expect an LLM to handle this skill easily, but not all do.

HumanEval: This benchmark tests programming skills in Python, and the results are useful in evaluating an LLM’s overall programming prowess. Grok did respectably against the base model of GPT-4 but is comfortably beaten by every Claude model, with Opus posting an impressive 84.9%.
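
Benchmarks of this type judge generated code by whether it passes tests, not by how it looks. The sketch below shows that idea in miniature; the `passes_tests` helper, the candidate solutions, and the test cases are hypothetical, and a real harness would run untrusted code in a sandbox rather than calling `exec` directly.

```python
def passes_tests(candidate_source, entry_point, test_cases):
    """Run generated code and check it against (args, expected) pairs."""
    namespace = {}
    try:
        # Assumption for illustration only: real harnesses isolate this step.
        exec(candidate_source, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures
        return False

# Hypothetical task: implement add(a, b)
test_cases = [((2, 3), 5), ((-1, 1), 0)]

# Three hypothetical model completions for the same task
candidates = [
    "def add(a, b):\n    return a + b\n",  # correct
    "def add(a, b):\n    return a - b\n",  # wrong logic
    "def add(a, b)\n    return a + b\n",   # syntax error
]

results = [passes_tests(src, "add", test_cases) for src in candidates]
pass_rate = sum(results) / len(results)
print(results, f"{pass_rate:.1%}")
```

A reported score such as 84.9% can be read in the same spirit: the share of problems for which the model's generated solution passed the tests.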

MATH: The MATH benchmark is a collection of math-based word problems that help evaluate the mathematical problem-solving abilities of LLMs. It requires LLMs to translate word problems into math formulas and solve them accurately. Again, Claude comes in well ahead of Grok, with Opus scoring 60.1% compared to Grok’s 23.9%.

Benchmark | Claude 3.0: Opus | Claude 3.0: Sonnet | Claude 3.0: Haiku | Grok 1.0
The MATH benchmark, featuring Claude 3.0 scores

Is Claude Better Than Grok?

Based on current evaluations, it’s hard to argue with that conclusion. But the two bots exist for entirely different purposes. Claude is a highly capable general-purpose LLM and among the most powerful commercially available. Grok, on the other hand, is currently built to act as a real-time add-on to Musk’s X service. Now that it is also open source, it is likely to be repurposed in entirely new ways and to gain additional functionality and modifications that will change the way it operates significantly.

Musk’s move to liberate Grok puts him on a similar path to Meta, whose open-source models, like Llama 2, are popular precisely because they can be reconfigured. By adopting a similar strategy, Musk is aligning himself with the open-source community at a time when the debate over access to powerful AI models is set to intensify.

Claude, on the other hand, is an exceptionally powerful and capable model that is beginning to prove a worthy competitor to OpenAI’s GPT-4. Competition and open access to these technologies can only be beneficial in ensuring that safe development and open access remain firmly at the forefront of the growing debate.

