
Elon Musk’s xAI chatbot Grok is powered by a supercomputer with over 100,000 Nvidia GPUs

The brains behind Elon Musk's AI chatbot Grok
PC Guide is reader-supported. When you buy through links on our site, we may earn an affiliate commission.

Founded by Elon Musk, xAI first released Grok in 2023 and followed it with Grok-2 about a year later. The generative AI chatbot works much like other chatbots, but it was marketed as having a sense of humor and direct access to X (formerly Twitter). Just as fascinating as Grok itself is the infrastructure behind it. Thanks to VentureBeat’s detailed video tour, we now have a closer look at the technology powering Grok.

xAI, with the help of Supermicro and NVIDIA, is building the world’s largest liquid-cooled GPU cluster deployment to train and power the chatbot. This AI supercomputer, called Colossus, features over 100,000 NVIDIA HGX H100 GPUs, exabytes of storage, and very fast networking. It costs billions of dollars and is based in Memphis, TN. Even more remarkable, the entire facility was turned into a production-grade AI supercomputer in just 122 days, a build-out that typically takes years of planning for facilities half its size.

A closer look at Colossus and its building blocks

The overall structure of this multi-billion-dollar facility is relatively standard: a raised-floor data hall with power delivered overhead and liquid cooling pipes running below to the facility’s chillers. In total, there are four compute halls, each housing approximately 25,000 NVIDIA GPUs along with integrated storage, high-speed fiber optic networking, and power systems. The details get more technical at the rack level.

The basic building block of the cluster is a Supermicro liquid-cooled rack. Each rack holds eight Supermicro 4U Universal GPU systems, and each system carries eight liquid-cooled NVIDIA HGX H100 GPUs plus two liquid-cooled x86 CPUs. Alongside these eight GPU servers, every rack includes a Supermicro coolant distribution unit (CDU) and coolant distribution manifolds (CDMs).
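The rack layout lends itself to some quick arithmetic. The Python sketch below multiplies out the per-rack counts from the article (eight servers per rack, eight GPUs and two CPUs per server) and estimates how many racks roughly 25,000 and 100,000 GPUs would need. The `RackConfig` class and `racks_needed` helper are hypothetical names for illustration, and the rack totals are lower bounds that assume every rack is fully populated with GPU servers; the article does not give xAI's actual rack count.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    # Per-rack layout as described in the article.
    servers_per_rack: int = 8   # eight 4U Universal GPU systems
    gpus_per_server: int = 8    # one HGX H100 8-GPU board per system
    cpus_per_server: int = 2    # dual-socket x86

    @property
    def gpus_per_rack(self) -> int:
        return self.servers_per_rack * self.gpus_per_server

    @property
    def cpus_per_rack(self) -> int:
        return self.servers_per_rack * self.cpus_per_server


def racks_needed(total_gpus: int, rack: RackConfig) -> int:
    """Hypothetical helper: fewest fully populated racks that hold total_gpus."""
    return math.ceil(total_gpus / rack.gpus_per_rack)


rack = RackConfig()
print(rack.gpus_per_rack)          # 64
print(rack.cpus_per_rack)          # 16
print(racks_needed(25_000, rack))  # 391 racks for one ~25,000-GPU hall
print(racks_needed(100_000, rack)) # 1563 racks for ~100,000 GPUs
```

Rounding up with `math.ceil` reflects that a partially filled rack still occupies a full rack position; real deployments would also need space for storage and networking racks, which this sketch ignores.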

Colossus also relies heavily on liquid cooling. The coolant distribution manifolds (CDMs) above each server deliver cool liquid and carry away warmed liquid, and quick disconnects make it fast and simple to remove or reinstall liquid cooling equipment. In each system, the top tray holds the NVIDIA HGX H100 8-GPU complex, which uses cold plates to cool the GPUs and the NVIDIA HGX board. The bottom tray houses the motherboard, CPUs, RAM, PCIe switches, and cold plates for the dual-socket CPUs.
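To get a feel for how much liquid the manifolds have to move, the basic heat-transport relation Q = ṁ · c_p · ΔT (heat removed equals mass flow times specific heat times temperature rise) can be rearranged into a flow estimate. The Python sketch below does this for the GPUs in one rack. It rests on several assumptions that are not from the article: each of the 64 GPUs draws the H100 SXM's 700 W maximum, the coolant behaves like pure water, and the supply-to-return temperature rise is a hypothetical 10 K. CPUs, memory, and other components are ignored.

```python
# Specific heat of water in J/(kg*K). Assumption: real loops often use a
# water/glycol mix, which has a somewhat lower specific heat.
WATER_CP = 4186.0


def required_flow_kg_per_s(heat_watts: float, delta_t_kelvin: float,
                           cp: float = WATER_CP) -> float:
    """Mass flow that carries heat_watts away with a delta_t rise (Q = m_dot * cp * dT)."""
    if delta_t_kelvin <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_watts / (cp * delta_t_kelvin)


GPU_TDP_W = 700      # assumption: every H100 SXM runs at its 700 W maximum
GPUS_PER_RACK = 64   # 8 servers x 8 GPUs, per the article
DELTA_T_K = 10.0     # hypothetical supply/return temperature difference

rack_gpu_heat = GPU_TDP_W * GPUS_PER_RACK  # 44,800 W from the GPUs alone
flow = required_flow_kg_per_s(rack_gpu_heat, DELTA_T_K)
# 1 kg of water is treated as roughly 1 litre.
print(f"{flow:.2f} kg/s (~{flow * 60:.0f} L/min of water)")
# prints "1.07 kg/s (~64 L/min of water)"
```

Because flow scales inversely with ΔT, allowing a larger temperature rise across the cold plates reduces the flow the CDU has to push, at the cost of warmer components.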

What sets Supermicro's servers apart from other AI servers in the industry is that they are purpose-built for liquid cooling. Instead of retrofitting liquid cooling onto an air-cooled design, these servers use custom liquid-cooling blocks from the start. That design, together with the quick disconnects, makes the servers easy to service and straightforward to scale. Additionally, the coolant distribution unit (CDU) includes a built-in management system that monitors critical functions such as flow rate and temperature.
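The article says only that the CDU's management system monitors flow rate and temperature. As a purely hypothetical illustration of that kind of monitoring, the Python sketch below checks one reading against alert thresholds. The `CduReading` and `Limits` types, their field names, and every threshold value are invented for this example and do not reflect Supermicro's software or defaults.

```python
from dataclasses import dataclass


@dataclass
class CduReading:
    flow_lpm: float        # coolant flow rate, litres per minute
    supply_temp_c: float   # coolant temperature going out to the servers
    return_temp_c: float   # coolant temperature coming back from the servers


@dataclass(frozen=True)
class Limits:
    # Hypothetical thresholds chosen for illustration only.
    min_flow_lpm: float = 50.0
    max_supply_temp_c: float = 35.0
    max_delta_t_c: float = 15.0


def check_reading(r: CduReading, limits: Limits = Limits()) -> list[str]:
    """Return a list of alert messages; an empty list means the reading is OK."""
    alerts = []
    if r.flow_lpm < limits.min_flow_lpm:
        alerts.append(f"low flow: {r.flow_lpm:.1f} L/min")
    if r.supply_temp_c > limits.max_supply_temp_c:
        alerts.append(f"high supply temperature: {r.supply_temp_c:.1f} C")
    # A large supply/return gap suggests the loop is absorbing more heat than expected.
    delta = r.return_temp_c - r.supply_temp_c
    if delta > limits.max_delta_t_c:
        alerts.append(f"high temperature rise: {delta:.1f} C")
    return alerts


print(check_reading(CduReading(64.0, 30.0, 40.0)))  # []
print(check_reading(CduReading(40.0, 36.0, 55.0)))  # three alerts
```

`Limits` is frozen so the shared default instance cannot be mutated between calls.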

Source: Supermicro

While it might seem that a liquid-cooled server wouldn't need air cooling, Supermicro servers still use system fans to cool smaller components such as DIMMs, power supplies, low-power baseboard management controllers, NICs, and other electronics. However, these fans draw significantly less power than those in traditional air-cooled servers, which lowers the overall power requirement of each server.

What does this mean for Grok?

There's a lot more intricate technology behind Colossus, including a high-performance networking platform built for fast, reliable data transfer. We recommend VentureBeat’s video tour for a deeper dive into the details. From what we've gathered, xAI is pushing supercomputing into a new era, and with this much compute behind it, Grok has plenty of room to become considerably more capable.

About the Author

Hassam boasts over seven years of professional experience as a dedicated PC hardware reviewer and writer.