
Google Gemini vs GPT-4 (ChatGPT 4)

Which is better: Gemini or GPT-4?

Reviewed By: Steve Hook

Last Updated on January 31, 2024
Google and OpenAI go head-to-head with their Gemini and GPT-4 AI models.

Google has released Gemini, its long-awaited answer to OpenAI’s GPT-4 large language model (LLM). Gemini now powers Google Bard, taking over from PaLM 2, another of Google’s proprietary AI models. Could this rival to OpenAI’s GPT-4 surpass the competition? We look at the three versions of the new multimodal AI model – Gemini Nano, Gemini Pro, and Gemini Ultra – and how their benchmarks place Google Gemini vs GPT-4.

Is Gemini better than GPT-4?

According to Google, Gemini “represents a significant leap forward in how AI capabilities can help improve our daily lives”.

The new model also represents a significant leap in performance over previous models, as demonstrated by the benchmark results released at launch. Even so, Gemini is off to a rocky start: parts of the launch event were revealed to be pre-recorded and edited rather than the real-time demonstrations they appeared to be. Despite this, the objective strength of the Gemini model may yet win out over the marketing misstep.

Welcome to the Gemini era – Google Bard’s new AI model.

The first of these benchmark tests is MMLU, short for Massive Multitask Language Understanding. This test evaluates a model across 57 tasks “including elementary mathematics, US history, computer science, law, and more”. Dan Hendrycks, one of the test’s authors, noted that OpenAI’s GPT-3 model scored an impressive 20 percentage points above random chance, with the caveat that GPT-3 needed “substantial improvements before [it] can reach expert-level accuracy”.

However, as that research paper was last revised on January 21st, 2021, the model it discusses is no longer SOTA (state-of-the-art). GPT-4 and its newer GPT-4 Turbo variant far outperform it: more recent testing shows that GPT-4, OpenAI’s foundation model, scored 86.4% on MMLU in a 5-shot setting, meaning five worked examples are included in the prompt before each question.
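For context on what a “5-shot” MMLU score means in practice, here is a minimal Python sketch of how such an evaluation can be scored. The names here (model_answer, the sample question) are illustrative stand-ins, not the actual harness Google or OpenAI used.

```python
# Minimal sketch of an MMLU-style 5-shot evaluation loop.
# Hypothetical: `model_answer` stands in for whatever API call
# returns the model's chosen option letter.

FEW_SHOT_EXAMPLES = [
    # (question, choices, correct letter) - five solved examples are
    # prepended to every prompt, which is what "5-shot" means.
    ("What is 7 * 8?", ["54", "56", "58", "64"], "B"),
    # ...four more worked examples would go here...
]

def build_prompt(question: str, choices: list[str]) -> str:
    """Prepend the few-shot examples, then ask the new question."""
    lines = []
    for q, opts, answer in FEW_SHOT_EXAMPLES:
        lines.append(q)
        lines += [f"{letter}. {opt}" for letter, opt in zip("ABCD", opts)]
        lines.append(f"Answer: {answer}\n")
    lines.append(question)
    lines += [f"{letter}. {opt}" for letter, opt in zip("ABCD", choices)]
    lines.append("Answer:")
    return "\n".join(lines)

def mmlu_accuracy(test_set, model_answer) -> float:
    """Fraction of multiple-choice questions answered correctly."""
    correct = 0
    for question, choices, gold_letter in test_set:
        prediction = model_answer(build_prompt(question, choices))
        correct += (prediction.strip().upper() == gold_letter)
    return correct / len(test_set)
```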

By contrast, Gemini Ultra exceeds expert-level accuracy on text, scoring 90% on the MMLU benchmark compared to 89.8% for a human expert. This is significant because, in Google’s words, “Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem-solving abilities of AI models.”


Google Gemini vs GPT-4 benchmark results

Benchmark       Gemini   GPT-4
MMLU            90%      86.4%
Big-Bench Hard  83.6%    83.1%
DROP            82.4     80.9
HellaSwag       87.8%    95.3%
GSM8K           94.4%    92.0%
MATH            53.2%    52.9%
HumanEval       74.4%    67.0%
Natural2Code    74.9%    73.9%
Gemini vs GPT-4 text-based benchmark results (DROP is scored as F1 rather than a percentage).

As the benchmark results show, GPT-4 beats Gemini in only one test: HellaSwag, a common-sense reasoning benchmark. Google’s Gemini is better than GPT-4 in the other seven text-based evaluations.
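Reading the table as data makes the margins explicit. This short snippet simply tallies, from the published figures above, who wins each benchmark and by how much (DROP is an F1 score, so its gap is in raw points rather than percentage points):

```python
# The benchmark table above, as data: (Gemini score, GPT-4 score).
scores = {
    "MMLU":           (90.0, 86.4),
    "Big-Bench Hard": (83.6, 83.1),
    "DROP":           (82.4, 80.9),
    "HellaSwag":      (87.8, 95.3),
    "GSM8K":          (94.4, 92.0),
    "MATH":           (53.2, 52.9),
    "HumanEval":      (74.4, 67.0),
    "Natural2Code":   (74.9, 73.9),
}

for benchmark, (gemini, gpt4) in scores.items():
    winner = "Gemini" if gemini > gpt4 else "GPT-4"
    print(f"{benchmark:<15} {winner} by {abs(gemini - gpt4):.1f} points")
```

Running it confirms the tally: seven wins for Gemini, one (HellaSwag) for GPT-4.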

Comparing Gemini with OpenAI’s multimodal variant, GPT-4V, extends the list of wins substantially: Google DeepMind’s latest model also scores above the competition on complex audio and vision tasks.

Is Gemini better than GPT-4 at coding?

Yes, based on benchmark data, Gemini beats GPT-4 at writing code. Gemini holds a significant 7.4 percentage point lead on HumanEval, which tests Python code generation.
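For the curious, HumanEval scores a model by executing its generated Python against unit tests rather than by comparing text. Here is a minimal sketch of that idea, with an illustrative toy task standing in for the real benchmark (which also sandboxes execution, unlike this simplified version):

```python
# Minimal sketch of a HumanEval-style check: a solution "passes" only if
# the generated Python runs the task's unit tests without error.
# `generated_code` and `test_code` below are illustrative stand-ins.

def passes_unit_tests(generated_code: str, test_code: str) -> bool:
    namespace: dict = {}
    try:
        exec(generated_code, namespace)   # define the candidate function
        exec(test_code, namespace)        # run the task's assertions
        return True
    except Exception:
        return False

# Toy task: the model was asked to implement add(a, b).
generated_code = "def add(a, b):\n    return a + b"
test_code = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(passes_unit_tests(generated_code, test_code))  # True

# The headline score (pass@1) is then the fraction of tasks whose
# first generated sample passes its tests.
```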

Does Google Bard use Gemini?

Yes, the Gemini model is already integrated into Google Bard, replacing PaLM 2, its predecessor, also developed by Google. Both are capable multimodal foundation models, but Gemini is the first to beat human experts on the MMLU benchmark, and thereby poses a serious threat to GPT-4. With this new AI model, Google Bard out-benchmarks ChatGPT across a range of tasks. Whether that will be reflected in the traffic and attention these two rival technologies attract is another question, as OpenAI has already carved out a substantial lead in AI chatbot market share.

Steve is the AI Content Writer for PC Guide, writing about all things artificial intelligence. He currently leads the AI reviews on the website.