
Grok AI vs Copilot (formerly Bing Chat) — Chatbots compared

Elon Musk takes on Satya Nadella in this AI showdown

Steve Hook

Last Updated on December 14, 2023

Reviewed By: Kevin Pocock




Grok, the AI chatbot from Elon Musk’s xAI, has been in early access since November 4th, 2023. With public access now rolled out across the US to X Premium+ subscribers, its capabilities will be put to the test against key rivals in big tech. One such competitor is Copilot (formerly Bing Chat), Microsoft’s own AI chatbot powered by OpenAI’s GPT-4. Both AI tools boast powerful generative AI capabilities from proprietary large language models (LLMs). Crucially, however, Elon Musk gave xAI just two months to train Grok, meaning Microsoft Copilot is more mature both technically and in the AI assistant marketplace. Can xAI catch up? Let’s compare Grok AI vs Copilot.

Grok AI vs Microsoft Copilot (formerly Bing Chat)

Between Grok AI and Copilot, Microsoft comes out on top! Copilot is powered by GPT-4, OpenAI’s most powerful large language model (LLM). It is free, comes in a single version, and its underlying model outperforms Grok-1 on all four benchmarks that xAI itself reported.

In addition, Copilot benefits from internet access via Microsoft’s very own search engine, Bing. This gives it access to real-time data and information about current events that even ChatGPT didn’t have until quite recently. On top of all of that, Copilot has DALL·E 3 integration, known as Bing Image Creator, allowing image generation and image file output from the comfort of the ‘Bing Chat’ interface. If you can describe an image, the “new Bing” (or really DALL·E 3, credit where credit’s due) can make it!

The Grok AI tool is different from Copilot, or really any other well-known flagship chatbot, in personality. Prompts will be answered with more humor, sarcasm, even sass than Copilot, brandishing a “rebellious streak” unlike the other artificial intelligence systems we know and love(?).

Both chatbots are built on competent AI models. The most important distinction between the two platforms, right now, is accessibility. With Grok AI restricted to X Premium+ subscribers in the US, it hardly matters how good Elon Musk’s AI assistant is until you can use it!

AI Chatbot benchmarks and features comparison

Each of the following benchmarks records how accurately an LLM performs a given task.

GSM8k is based on “middle school math word problems”, which are pretty easy for a human, but not so easy for a machine. Of course, a calculator would score 100% on the arithmetic in these problems, but an LLM is not a calculator. A language model does not run an explicit arithmetic routine on the numbers in a prompt; it generates its answer one token at a time, and so ‘figures out’ these problems in a much more human way than a calculator.

MMLU (Massive Multitask Language Understanding) focuses on multidisciplinary multiple-choice questions.

HumanEval is a test designed for programming aptitude (specifically Python, but the problem-solving involved translates well to other programming languages).
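To make the idea concrete, here is a minimal sketch of how a HumanEval-style problem can be scored: the model is shown a function signature and docstring, writes the body, and the result is run against unit tests. The task `running_max`, the sample completion, and the helper `passes_tests` are all hypothetical illustrations, not items from the real benchmark or its official harness.

```python
# Hypothetical HumanEval-style task: the model sees only the signature and docstring.
PROMPT = '''
def running_max(values):
    """Return a list where element i is the largest of values[0..i]."""
'''

# A hypothetical model completion (the function body an LLM might generate).
COMPLETION = '''
    result, best = [], None
    for v in values:
        best = v if best is None or v > best else best
        result.append(best)
    return result
'''


def passes_tests(prompt: str, completion: str) -> bool:
    """Run prompt + completion and check it against a few unit tests."""
    namespace = {}
    try:
        exec(prompt + completion, namespace)  # define the candidate function
        fn = namespace["running_max"]
        assert fn([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
        assert fn([]) == []
        assert fn([-2, -5]) == [-2, -2]
    except Exception:
        # Any crash or failed assertion counts as a failed problem.
        return False
    return True


def pass_rate(results: list[bool]) -> float:
    """Share of problems solved, as a percentage."""
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    wrong = "\n    return values\n"  # a plausible but incorrect completion
    results = [passes_tests(PROMPT, COMPLETION), passes_tests(PROMPT, wrong)]
    print(results, pass_rate(results))
```

A percentage in the HumanEval row is, in this simplified picture, the share of problems where the generated code passes every test. Real evaluations run untrusted code in a sandbox rather than calling `exec` directly.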

MATH involves “middle school and high school mathematics problems”. Slightly more advanced for humans, but demonstrably harder for natural language processing (NLP) systems.

Benchmark	Grok-0	LLaMA 2	Inflection-1	GPT-3.5	Grok-1	PaLM 2	Claude-2	GPT-4
GSM8k	56.8%	56.8%	62.9%	57.1%	62.9%	80.7%	88%	92%
MMLU	65.7%	68.9%	72.7%	70.0%	73.0%	78%	75%	86.4%
HumanEval	39.7%	29.9%	35.4%	48.1%	63.2%	N/A	70%	67%
MATH	15.7%	13.5%	16.0%	23.5%	23.9%	34.6%	N/A	42.5%

The large language models of big tech, as benchmarked by xAI.
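For readers who want to check the head-to-head numbers, the short sketch below computes the gap between GPT-4 and Grok-1 on each benchmark. The scores are copied from the table above; the names `SCORES` and `gap` are just illustrative choices.

```python
# Scores (in percent) for the two models being compared, taken from xAI's table.
SCORES = {
    "GSM8k": {"Grok-1": 62.9, "GPT-4": 92.0},
    "MMLU": {"Grok-1": 73.0, "GPT-4": 86.4},
    "HumanEval": {"Grok-1": 63.2, "GPT-4": 67.0},
    "MATH": {"Grok-1": 23.9, "GPT-4": 42.5},
}


def gap(benchmark: str) -> float:
    """GPT-4's lead over Grok-1 in percentage points, rounded to one decimal."""
    row = SCORES[benchmark]
    return round(row["GPT-4"] - row["Grok-1"], 1)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: GPT-4 leads by {gap(name)} points")
```

The lead is widest on GSM8k and narrowest on HumanEval, where Grok-1 comes within a few points of GPT-4.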

✓ Steve says

Keep an eye on Google Gemini!

The multimodality exhibited by these models doesn’t quite match up to the everything-everything model of Gemini in Google Bard. Despite the controversy over faked real-time footage in the launch demo, Gemini is the one to beat in terms of multimodal capability.

xAI vs Microsoft — AI chatbots comparison

Another top-down way to compare all these AI chatbots is to put all the names side-by-side. There’s already enough confusion about the distinction between chatbots and language models, which will only get more confusing with each new release. We hope you find this neat little no-nonsense table helpful in that respect!

Company	CEO	AI Chatbot	LLM	API	Open-source
xAI	Elon Musk	Grok	Grok-1	No	No
OpenAI	Sam Altman	ChatGPT	GPT-3.5, GPT-4, GPT-4V, or GPT-4 Turbo	Yes	No
Google	Sundar Pichai	Bard	Gemini (succeeded PaLM 2)	Yes	No
Microsoft	Satya Nadella	Copilot (formerly Bing Chat)	GPT-4	No	No
Meta	Mark Zuckerberg	Meta AI	LLaMA 2	No	Yes
Anthropic	Dario Amodei	Claude	Claude-2	Yes	No
Amazon	Andy Jassy	Olympus (rumored)	Olympus (rumored)	No	No

The AI chatbots of big tech.

Some of the chatbots above are accessible via a browser-based website, such as Google Bard. Others are accessible via a mobile app, such as ‘Bing Chat’ (now Copilot). Others still are accessible in both previous ways as well as via an API, such as ChatGPT.

However, on the restrictive side we have Meta AI, which is currently only accessible via existing non-AI-native social media apps: Instagram, WhatsApp, and Facebook Messenger. This leads us to Grok, the most restricted of all, with access limited to an early access program for X Premium+ subscribers in the US. With the new year just around the corner, Grok-2 looks likely to be a new top 3 AI bot in 2024.
