Grok AI vs Copilot (formerly Bing Chat) — Chatbots compared

Grok, the AI chatbot from Elon Musk’s xAI, has been in early access since November 4th, 2023. With public access now rolled out across the US to X Premium+ subscribers, its capabilities will be put to the test against key rivals in big tech. One such competitor is Copilot (formerly Bing Chat), Microsoft’s own AI chatbot powered by OpenAI’s GPT-4. Both AI tools boast powerful generative AI capabilities from the large language models (LLMs) behind them. Crucially, however, Elon Musk gave xAI just two months to train Grok, meaning Microsoft Copilot is more mature both in a technical sense and in the AI assistant marketplace. Can xAI catch up? Let’s compare Grok AI vs Copilot.
Grok AI vs Microsoft Copilot (formerly Bing Chat)
Between Grok AI and Copilot, Microsoft comes out on top! Powered by GPT-4, OpenAI’s most powerful large language model (LLM), Copilot is free to use, and GPT-4 outperforms Grok-1 on all four benchmarks compared in this article.
In addition, Copilot benefits from internet access via Microsoft’s very own search engine, Bing. This gives it access to real-time data and information on current events, something even ChatGPT didn’t have until quite recently. On top of all of that, Copilot has DALL·E 3 integration, known as Bing Image Creator, allowing image generation and image file output from the comfort of the ‘Bing Chat’ interface. If you can describe an image, the “new Bing” (or really DALL·E 3, credit where credit’s due) can make it!
The Grok AI tool is different from Copilot, or really any other well-known flagship chatbot, in personality. Prompts will be answered with more humor, sarcasm, even sass than Copilot, brandishing a “rebellious streak” unlike the other artificial intelligence systems we know and love(?).
Both of these chatbots are built on competent AI models. The most important distinction between the two platforms, right now, is accessibility. With Grok limited to X Premium+ subscribers in the US, it hardly matters how good Elon Musk’s AI assistant is unless you can use it!
AI Chatbot benchmarks and features comparison
The following benchmarks each record the accuracy of an LLM in performing a given task, reported as the percentage of problems answered correctly.
GSM8k is based on “middle school math word problems”, which are pretty easy for a human, but not so easy for a machine. Of course, a calculator would score 100% on the arithmetic in these types of problems, but an LLM is not a calculator. In fact, an LLM does not inherently perform exact arithmetic on the numbers in a prompt. It predicts the answer as text, and so ‘figures out’ these problems in a much more human way than a calculator.
MMLU (Massive Multitask Language Understanding) focuses on multidisciplinary multiple-choice questions.
HumanEval is a test designed for programming aptitude (specifically Python, but the problem-solving involved translates well to other programming languages).
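To make that concrete, here is a minimal sketch of how a HumanEval-style check can work. The task, the model completion, and the helper name `passes` are all hypothetical, made up for illustration; the real benchmark uses its own problem set and runs code in a sandbox.

```python
# Hypothetical task prompt: the model sees only a signature and a docstring.
PROMPT = '''
def running_max(numbers):
    """Return a list where each element is the largest value seen so far."""
'''

# Hypothetical model output: the function body the LLM generated.
COMPLETION = '''
    result = []
    current = None
    for n in numbers:
        current = n if current is None or n > current else current
        result.append(current)
    return result
'''

# Unit tests (input, expected output) decide whether the completion is correct.
TESTS = [
    ([1, 3, 2, 5, 4], [1, 3, 3, 5, 5]),
    ([], []),
    ([-2, -5, -1], [-2, -2, -1]),
]


def passes(prompt, completion, tests, func_name):
    """Return True only if the assembled function passes every test."""
    namespace = {}
    try:
        # Running generated code like this is unsafe outside a sandbox.
        exec(prompt + completion, namespace)
        func = namespace[func_name]
        return all(func(list(args)) == expected for args, expected in tests)
    except Exception:
        # Syntax errors and crashes count as a failed task.
        return False


solved = passes(PROMPT, COMPLETION, TESTS, "running_max")
print(f"Task solved: {solved}")
```

A benchmark score is then just the share of tasks solved, so a model that gets 63.2% on HumanEval produced passing code for roughly 63 of every 100 tasks.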
MATH involves “middle school and high school mathematics problems”. These are slightly more advanced for humans, but demonstrably harder for natural language processing (NLP) systems.
Benchmark | Grok-0 | LLaMA 2 | Inflection-1 | GPT-3.5 | Grok-1 | PaLM 2 | Claude-2 | GPT-4 |
---|---|---|---|---|---|---|---|---|
GSM8k | 56.8% | 56.8% | 62.9% | 57.1% | 62.9% | 80.7% | 88% | 92% |
MMLU | 65.7% | 68.9% | 72.7% | 70.0% | 73.0% | 78% | 75% | 86.4% |
HumanEval | 39.7% | 29.9% | 35.4% | 48.1% | 63.2% | N/A | 70% | 67% |
MATH | 15.7% | 13.5% | 16.0% | 23.5% | 23.9% | 34.6% | N/A | 42.5% |
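For a quick read of the table above, the short sketch below computes how far GPT-4 is ahead of Grok-1 on each benchmark. The figures are copied from the table; the variable names are our own.

```python
# Scores copied from the table above (percent of problems answered correctly).
scores = {
    "GSM8k":     {"Grok-1": 62.9, "GPT-4": 92.0},
    "MMLU":      {"Grok-1": 73.0, "GPT-4": 86.4},
    "HumanEval": {"Grok-1": 63.2, "GPT-4": 67.0},
    "MATH":      {"Grok-1": 23.9, "GPT-4": 42.5},
}

# Gap in percentage points, rounded to one decimal place.
gaps = {
    name: round(row["GPT-4"] - row["Grok-1"], 1)
    for name, row in scores.items()
}

for name, gap in gaps.items():
    print(f"{name}: GPT-4 leads by {gap} points")

# Number of benchmarks on which GPT-4 scores higher than Grok-1.
gpt4_wins = sum(gap > 0 for gap in gaps.values())
print(f"GPT-4 ahead on {gpt4_wins} of {len(gaps)} benchmarks")
```

The gap is narrowest on HumanEval (3.8 points) and widest on GSM8k (29.1 points), so Grok-1 is closest to GPT-4 on coding and furthest behind on math word problems.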
Keep an eye on Google Gemini!
The multimodality exhibited by these models doesn’t quite match up to the do-everything model of Gemini in Google Bard. Despite the controversy over faked real-time footage in the launch demo, Gemini is the one to beat in terms of multimodal capability.
xAI vs Microsoft — AI chatbots comparison
Another top-down way to compare all these AI chatbots is to put all the names side-by-side. There’s already enough confusion about the distinction between chatbots and language models, which will only get more confusing with each new release. We hope you find this neat little no-nonsense table helpful in that respect!
Company | CEO | AI Chatbot | LLM | API | Open-source |
---|---|---|---|---|---|
xAI | Elon Musk | Grok | Grok-1 | No | No |
OpenAI | Sam Altman | ChatGPT | GPT-3.5, GPT-4, GPT-4V, or GPT-4 Turbo | Yes | No |
Google | Sundar Pichai | Bard | Gemini (succeeded PaLM 2) | Yes | No |
Microsoft | Satya Nadella | Copilot (formerly Bing Chat) | GPT-4 | No | No |
Meta | Mark Zuckerberg | Meta AI | LLaMA 2 | No | Yes |
Anthropic | Dario Amodei | Claude | Claude-2 | Yes | No |
Amazon | Andy Jassy | Olympus (rumored) | Olympus (rumored) | No | No |
Some of the chatbots above are accessible via a browser-based website, such as Google Bard. Others are accessible via a mobile app, such as ‘Bing Chat’ (now Copilot). Others still, such as ChatGPT, are accessible in both of those ways as well as via an API.
However, on the restrictive side we have Meta AI, which is currently only accessible via existing non-AI-native apps: the social media apps Instagram, WhatsApp, and Facebook Messenger. This leads us to Grok, the most restricted of all, with access restricted to an early access program for X Premium+ subscribers, currently only in the US. With the new year just around the corner, Grok-2 looks likely to be a new top 3 AI bot in 2025.