Grok AI is the latest AI chatbot from big tech, and it’s already looking impressive. The xAI model Grok-1 ranks fourth overall in technical performance across four key artificial intelligence benchmarks, beating the free version of ChatGPT (GPT-3.5) in every single one. But how does Grok-1 compare to Google’s PaLM 2? Let’s compare these large language models (LLMs) head-to-head. Here are the results for Google Bard vs Grok AI from Elon Musk’s xAI.
Google Bard vs Grok AI
Both of these AI chatbots are relatively new, but that description understates the lead Google has over xAI. Google Bard was announced on February 6th 2023 and released on March 21st. By comparison, xAI’s Grok was announced on November 5th and is only now rolling out to X Premium+ subscribers in the US. Self-described as a “very early Beta product” and the result of just “2 months of training”, this newcomer from Elon Musk-founded xAI still has a long way to go.
However, xAI’s Grok is off to a strong start. In the benchmark table below it ranks fourth among foundation models from big tech, behind only GPT-4, Claude-2, and PaLM 2, the three models more powerful than Grok-1, in descending order. Compared with GPT-3.5, the Grok-1 AI model scored between 0.4 and 15.1 percentage points higher across four key benchmarks, namely GSM8k, MMLU, HumanEval, and MATH.
Benchmark | Grok-0 | LLaMa 2 | Inflection-1 | GPT-3.5 | Grok-1 | PaLM 2 | Claude-2 | GPT-4 |
---|---|---|---|---|---|---|---|---|
GSM8k | 56.8% | 56.8% | 62.9% | 57.1% | 62.9% | 80.7% | 88% | 92% |
MMLU | 65.7% | 68.9% | 72.7% | 70.0% | 73.0% | 78% | 75% | 86.4% |
HumanEval | 39.7% | 29.9% | 35.4% | 48.1% | 63.2% | N/A | 70% | 67% |
MATH | 15.7% | 13.5% | 16.0% | 23.5% | 23.9% | 34.6% | N/A | 42.5% |
In short, GSM8k evaluates grade-school mathematics, and MATH is (unsurprisingly) also mathematics-based, but harder. The other two benchmarks are entirely different, with MMLU (Massive Multitask Language Understanding) testing general knowledge in a multiple-choice format, and HumanEval testing computer programming proficiency. Based on these results, Google’s PaLM 2 model scores between 5 and 17.8 percentage points higher than Grok-1 on the three benchmarks where both models have a score (the table lists no HumanEval result for PaLM 2). On these measures, we can conclude that Google Bard is better than Grok.
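The score gaps quoted above can be recomputed directly from the benchmark table. The following is a minimal Python sketch, with the scores copied from the table; the `SCORES` dictionary and the `point_gaps` helper are illustrative names introduced here, not part of any xAI or Google tooling. It assumes the gaps are plain differences in percentage points and skips any benchmark where a model has no score (N/A).

```python
# Scores (in percent) copied from the benchmark table above.
# None marks a benchmark with no score listed (N/A).
SCORES = {
    "GPT-3.5": {"GSM8k": 57.1, "MMLU": 70.0, "HumanEval": 48.1, "MATH": 23.5},
    "Grok-1": {"GSM8k": 62.9, "MMLU": 73.0, "HumanEval": 63.2, "MATH": 23.9},
    "PaLM 2": {"GSM8k": 80.7, "MMLU": 78.0, "HumanEval": None, "MATH": 34.6},
}


def point_gaps(model_a, model_b):
    """Return model_a minus model_b, in percentage points, per benchmark.

    Benchmarks where either model has no score are left out.
    """
    gaps = {}
    for benchmark, score_a in SCORES[model_a].items():
        score_b = SCORES[model_b][benchmark]
        if score_a is None or score_b is None:
            continue
        # Round to one decimal to hide floating-point noise.
        gaps[benchmark] = round(score_a - score_b, 1)
    return gaps


if __name__ == "__main__":
    for a, b in [("Grok-1", "GPT-3.5"), ("PaLM 2", "Grok-1")]:
        gaps = point_gaps(a, b)
        print(f"{a} vs {b}: {gaps}")
        print(f"  range: {min(gaps.values())} to {max(gaps.values())} points")
```

Run as a script, this reports a range of 0.4 to 15.1 points for Grok-1 over GPT-3.5, and 5.0 to 17.8 points for PaLM 2 over Grok-1 across the three benchmarks both models share.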
Google vs xAI – AI chatbots compared
Company | CEO | AI Chatbot | LLM | API | Open-source |
---|---|---|---|---|---|
xAI | Elon Musk | Grok | Grok-1 | No | No |
OpenAI | Sam Altman | ChatGPT | GPT-3.5, GPT-4, GPT-4V, or GPT-4 Turbo | Yes | No |
Google | Sundar Pichai | Bard | Gemini (succeeded PaLM 2) | Yes | No |
Microsoft | Satya Nadella | Copilot (formerly Bing Chat) | GPT-4 | No | No |
Meta | Mark Zuckerberg | Meta AI | LLaMA 2 | No | Yes |
Anthropic | Dario Amodei | Claude | Claude-2 | Yes | No |
Amazon | Andy Jassy | Olympus (rumored) | Olympus (rumored) | No | No |
It’s only fair to note that these four benchmarks, while objective and fairly comprehensive, are not the only important test of an AI chatbot. A chatbot’s features define what it can actually do, and while ChatGPT currently has more features than any other chatbot, Google Bard at least has more than Grok.
Internet access to real-time information, plugin support, and image analysis with computer vision are all useful functions of a conversational AI. None of these can be assessed just by looking at benchmark scores or training data. In this respect, Google wins again.
Data is king. These generative AI services need high-quality data, and lots of it, to train their models. Google’s parent company Alphabet oversees the largest free cloud software service on earth, comprising all Google products such as Google Docs, Gmail, and of course the world’s largest search engine, Google Search. By comparison, xAI founder Elon Musk now owns X (formerly Twitter). The platform is one of the world’s longest-running social media sites, and an excellent data source for training xAI models including Grok. But it’s still not Google.