Krafton launches Orak AI gaming benchmark with support for 12 games across six genres
When it comes to benchmarking AI models, there are few options available. Sure, there are the likes of Geekbench AI to run various tests, but what about gaming? Well, Krafton, the publisher behind inZOI, has announced the release of the Orak benchmark, a piece of software designed to evaluate how large language models (LLMs) and vision-language models (VLMs) perform in various games.
With 12 games across six genres, Orak aims for broad test coverage. The categories span action, adventure, RPG, simulation, strategy, and puzzle games, each with different designs and functions, which makes for a surprisingly wide range of games to put in front of any sort of AI.
Using well-known games tests each model’s capabilities and how it reasons about outcomes. Each LLM is scored out of three in seven different categories, and the resulting total determines its ranking and comparison on the Orak leaderboard. All of this is outlined in a paper co-authored by 16 people in a collaboration between Krafton, Seoul National University, Nvidia, and the University of Wisconsin-Madison.

Integration and utilization
Krafton has made it fairly easy to use its AI benchmark: the GitHub page has the download and instructions on how to use it. Setup amounts to installing each game with the included executables and running the provided scripts; if you’re using a commercial model, you can connect to it with an API key instead.
The games it supports are a mix of paid and free games, split across six genres.
| Game | Genre |
|---|---|
| Street Fighter III | Action |
| Super Mario | Action |
| Ace Attorney | Adventure |
| Her Story | Adventure |
| Pokémon Red | RPG |
| Darkest Dungeon | RPG |
| Minecraft | Simulation |
| Stardew Valley | Simulation |
| StarCraft II | Strategy |
| Slay the Spire | Strategy |
| Baba Is You | Puzzle |
| 2048 | Puzzle |
Each of these games has a target baseline, outlined in the paper’s supplementary material. For example, in Stardew Valley, the goal is to maximise the profit earned over a 13-day in-game period, and the result is compared to the expert human score.
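To make the idea of comparing against an expert human score concrete, here is a minimal sketch in Python. It assumes the comparison is a simple percentage of the expert result; the article does not state the exact formula Orak uses, and the function name and the numbers are hypothetical.

```python
def percent_of_expert(model_profit: float, expert_profit: float) -> float:
    """Express a model's in-game profit as a percentage of the expert human profit.

    Assumption: a plain ratio. Orak's actual per-game scoring may differ.
    """
    if expert_profit <= 0:
        raise ValueError("expert_profit must be positive")
    return 100.0 * model_profit / expert_profit


# Hypothetical figures, for illustration only (not taken from the paper).
example = percent_of_expert(model_profit=1250.0, expert_profit=5000.0)
print(f"Model reached {example:.1f}% of the expert score")
```

A ratio like this puts every game on the same scale, which is what allows results from very different genres to sit side by side.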
With every game and model evaluated this way, the benchmark provides a standard comparison of how well AI models reason in games, which is where the leaderboard comes in. Among the models tested so far, the number one spot is taken by Gemini 2.5 Pro, with GPT-4o in second place. But that’s for average ranks, as the performance in each game varies a lot between models.
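An average rank can be worked out by ranking the models within each game and then averaging those positions per model. The sketch below shows one way to do that in Python; it is not Orak's actual leaderboard code, the model and game names are placeholders, the scores are made up, and ties are broken arbitrarily for simplicity.

```python
from statistics import mean


def average_ranks(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Compute each model's mean rank across games.

    scores[game][model] is a score where higher is better.
    Rank 1 is the best model in a given game; ties are not handled specially.
    """
    ranks: dict[str, list[int]] = {}
    for per_model in scores.values():
        # Order models from best to worst for this game.
        ordered = sorted(per_model, key=per_model.get, reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(position)
    return {model: mean(positions) for model, positions in ranks.items()}


# Made-up scores for three placeholder models and games.
scores = {
    "game_1": {"model_a": 80.0, "model_b": 60.0, "model_c": 40.0},
    "game_2": {"model_a": 10.0, "model_b": 30.0, "model_c": 20.0},
    "game_3": {"model_a": 70.0, "model_b": 50.0, "model_c": 55.0},
}

result = average_ranks(scores)
for model in sorted(result, key=result.get):
    print(model, round(result[model], 2))
```

In this toy data, model_a comes last in one game but still has the best average rank, which mirrors the point that per-game performance can vary a lot even for the overall leader.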