
Grok AI vs Copilot (formerly Bing Chat) — Chatbots compared

Elon Musk takes on Satya Nadella in this AI showdown

Reviewed By: Kevin Pocock

Last Updated on December 14, 2023
Grok vs Copilot (formerly Bing Chat), leading AI chatbots powered by Grok-1 and GPT-4, compared.

Grok, the AI chatbot from Elon Musk’s xAI, has been in early access since November 4th, 2023. With public access now rolled out across the US to X Premium+ subscribers, its capabilities will be put to the test against key rivals in big tech. One such competitor is Copilot (formerly Bing Chat), Microsoft’s own AI chatbot powered by OpenAI’s GPT-4. Both AI tools boast powerful generative AI capabilities from proprietary large language models (LLMs). Crucially, however, Elon Musk gave xAI just two months to train Grok, meaning Microsoft Copilot is more mature both technically and in the AI assistant marketplace. Can xAI catch up? Let’s compare Grok AI vs Copilot.

Grok AI vs Microsoft Copilot (formerly Bing Chat)

Between Grok AI and Copilot, Microsoft comes out on top! Powered by GPT-4, OpenAI’s most powerful large language model (LLM), the free and only version of Copilot is built on a model that outperforms Grok-1 on all four benchmarks reported by xAI.

In addition, Copilot benefits from internet access via Microsoft’s very own search engine, Bing. This gives it access to real-time data and information on current events, something even ChatGPT didn’t have until quite recently. On top of all of that, Copilot has DALL·E 3 integration, known as Bing Image Creator, allowing image generation and image file output from the comfort of the Copilot chat interface. If you can describe an image, the “new Bing” (or really DALL·E 3, credit where credit’s due) can make it!

The Grok AI tool is different from Copilot, or really any other well-known flagship chatbot, in personality. Prompts will be answered with more humor, sarcasm, even sass than Copilot, brandishing a “rebellious streak” unlike the other artificial intelligence systems we know and love(?).

Both of these chatbots are built on competent AI models. The most important distinction between the two platforms, right now, is accessibility. With Grok AI hidden behind a waitlist, it hardly matters how good Elon Musk’s AI assistant is until you can use it!

AI Chatbot benchmarks and features comparison

The following benchmarks each measure the accuracy of an LLM on a given type of task, reported as the percentage of problems it gets right.

GSM8k is based on “middle school math word problems”, which are pretty easy for a human, but not so easy for a machine. Of course, a calculator would score 100% on the arithmetic in these problems, but an LLM is not a calculator. In fact, a neural network has no built-in arithmetic routine to call on, and instead ‘figures out’ these problems in a much more human way than a calculator.

MMLU (Massive Multitask Language Understanding) focuses on multidisciplinary multiple-choice questions.

HumanEval is a test designed for programming aptitude (specifically Python, but the problem-solving involved translates well to other programming languages).

MATH involves “middle school and high school mathematics problems”. Slightly more advanced for humans, but demonstrably harder for natural language processing (NLP) systems.
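To make the percentages concrete, here is a minimal sketch of how scores like these are typically computed. It is not xAI’s evaluation code: the function names, the toy answers, and the toy unit tests are all assumptions made up for illustration. Math and multiple-choice benchmarks generally count an answer as correct when it matches the reference, while a HumanEval-style check runs the generated code against unit tests.

```python
def exact_match_accuracy(predictions, references):
    """Share of answers that match the reference exactly (hypothetical helper)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    if not references:
        raise ValueError("need at least one reference answer")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def passes_unit_tests(candidate_source, test_source):
    """HumanEval-style check: define the generated function, then run assertions."""
    namespace = {}
    try:
        exec(candidate_source, namespace)  # defines the candidate function
        exec(test_source, namespace)       # assertions raise if an answer is wrong
    except Exception:
        return False
    return True


# Toy data, invented for illustration only.
model_answers = ["18", "7", "42", "3"]
reference_answers = ["18", "5", "42", "3"]
print(f"Math accuracy: {exact_match_accuracy(model_answers, reference_answers):.1%}")

candidate = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
print("Code sample passes:", passes_unit_tests(candidate, tests))
```

A real evaluation harness would run untrusted model output in a sandbox rather than calling `exec` directly; the shortcut here only keeps the sketch short.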

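| Benchmark | Grok-0 | LLaMA 2 | Inflection-1 | GPT-3.5 | Grok-1 | PaLM 2 | Claude-2 | GPT-4 |
|---|---|---|---|---|---|---|---|---|
| GSM8k | 56.8% | 56.8% | 62.9% | 57.1% | 62.9% | 80.7% | 88% | 92% |
| MMLU | 65.7% | 68.9% | 72.7% | 70.0% | 73.0% | 78% | 75% | 86.4% |
| HumanEval | 39.7% | 29.9% | 35.4% | 48.1% | 63.2% | N/A | 70% | 67% |
| MATH | 15.7% | 13.5% | 16.0% | 23.5% | 23.9% | 34.6% | N/A | 42.5% |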
The large language models of big tech, as benchmarked by xAI.
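For a quick read on how far apart the two models behind Grok and Copilot are, the short sketch below takes the Grok-1 and GPT-4 columns from the table above and works out GPT-4’s lead in percentage points. The dictionary layout and the `gap_in_points` helper are assumptions for illustration; only the scores themselves come from the table.

```python
# Scores in percent, copied from the xAI-reported table above.
scores = {
    "GSM8k": {"Grok-1": 62.9, "GPT-4": 92.0},
    "MMLU": {"Grok-1": 73.0, "GPT-4": 86.4},
    "HumanEval": {"Grok-1": 63.2, "GPT-4": 67.0},
    "MATH": {"Grok-1": 23.9, "GPT-4": 42.5},
}


def gap_in_points(table, leader, challenger):
    """Percentage-point lead of `leader` over `challenger` on each benchmark."""
    return {
        name: round(row[leader] - row[challenger], 1)
        for name, row in table.items()
    }


gaps = gap_in_points(scores, "GPT-4", "Grok-1")

# List the benchmarks from the narrowest gap to the widest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:<10} GPT-4 leads by {gap:.1f} points")
```

On these numbers the gap is narrowest on HumanEval (3.8 points) and widest on GSM8k (29.1 points), so Grok-1 comes closest to GPT-4 on coding tasks.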
✓ Steve says

Keep an eye on Google Gemini!

The multimodality exhibited by these models doesn’t quite match up to the everything-everything model of Gemini in Google Bard. Despite the controversy of faked real-time footage in the launch demo, Gemini is the one to beat in terms of multimodal capability.

xAI vs Microsoft — AI chatbots comparison

Another top-down way to compare all these AI chatbots is to put all the names side-by-side. There’s already enough confusion about the distinction between chatbots and language models, which will only get more confusing with each new release. We hope you find this neat little no-nonsense table helpful in that respect!

| Company | CEO | AI Chatbot | LLM | API | Open-source |
|---|---|---|---|---|---|
| xAI | Elon Musk | Grok | Grok-1 | No | No |
| OpenAI | Sam Altman | ChatGPT | GPT-3.5, GPT-4, GPT-4V, or GPT-4 Turbo | Yes | No |
| Google | Sundar Pichai | Bard | Gemini (succeeded PaLM 2) | Yes | No |
| Microsoft | Satya Nadella | Copilot (formerly Bing Chat) | GPT-4 | No | No |
| Meta | Mark Zuckerberg | Meta AI | LLaMA 2 | No | Yes |
| Anthropic | Dario Amodei | Claude | Claude-2 | Yes | No |
| Amazon | Andy Jassy | Olympus (rumored) | Olympus (rumored) | No | No |
The AI chat bots of big tech.

Some of the chatbots above are accessible via a browser-based website, such as Google Bard. Others are accessible via a mobile app, such as ‘Bing Chat’ (now Copilot). Others still are accessible in both previous ways as well as via an API, such as ChatGPT.

However, on the restrictive side we have Meta AI, which is only accessible via existing non-AI-native apps. Meta AI is currently only accessible via social media apps Instagram, WhatsApp, and Facebook Messenger. This leads us to Grok, the most restricted of all, with access limited to an early access program of X Premium+ subscribers, currently limited to the US. With the new year just around the corner, Grok-2 looks likely to be a new top 3 AI bot in 2024.

Steve is the AI Content Writer for PC Guide, writing about all things artificial intelligence. He currently leads the AI reviews on the website.