
Krafton launches Orak AI gaming benchmark with support for 12 games across six genres

AI is now playing games for us as Krafton tests its capability in them

When it comes to benchmarking AI models, there are only a few options available. Sure, there are the likes of Geekbench AI to run various tests, but what about gaming? Well, Krafton, the publisher behind inZOI, has announced the release of the Orak benchmark, a piece of software designed to evaluate how large language models (LLMs) and vision-language models (VLMs) perform in a variety of games.

The benchmark covers 12 games across six genres: action, adventure, RPG, simulation, strategy, and puzzle. Each game has its own design and mechanics, so it's a surprisingly broad set of games to put in front of any sort of AI.

Using a range of well-known games tests the AI's capabilities and how it reasons about outcomes. Each LLM is then scored across seven different categories, each out of three, and receives a total score that determines its ranking on the Orak leaderboard. All of this is outlined in the paper, co-authored by 16 people in a collaboration between Krafton, Seoul National University, Nvidia, and the University of Wisconsin-Madison.

ORAK benchmark overview: the 12 games and the LLM agents both connect through the MCP interface and meet in the evaluator, whose results are added to the leaderboard. Source: Krafton

Integration and utilization

Krafton has made its AI benchmark fairly easy to use: its GitHub page hosts the download along with instructions on how to use it. All you need to do is install each game with the included executables and run the scripts provided. If you're using a commercial model, you can use an API key instead.

The games it supports are a mix of paid and free games, split across six genres.




Game                  Genre
Street Fighter III    Action
Super Mario           Action
Ace Attorney          Adventure
Her Story             Adventure
Pokémon Red           RPG
Darkest Dungeon       RPG
Minecraft             Simulation
Stardew Valley        Simulation
StarCraft II          Strategy
Slay the Spire        Strategy
Baba Is You           Puzzle
2048                  Puzzle

Each of these games has a target baseline, outlined in the supplementary material of the paper. In Stardew Valley, for example, the goal is to maximise the profit earned over a 13-day in-game period, which is then compared to the expert human score.
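To make the idea of comparing a model against the expert human score concrete, here is a minimal Python sketch. It assumes the comparison is a simple percentage of the human result; the article does not confirm the exact formula Orak uses, and the function name `relative_score` and the profit figures are made up for illustration.

```python
def relative_score(model_profit: float, expert_profit: float) -> float:
    """Return the model's profit as a percentage of the expert human's profit.

    Assumption: a plain ratio. Orak's real scoring may differ.
    """
    if expert_profit <= 0:
        raise ValueError("expert_profit must be positive")
    return 100.0 * model_profit / expert_profit


# Hypothetical numbers, not taken from the Orak paper or leaderboard.
example = relative_score(model_profit=1200.0, expert_profit=4800.0)
print(f"{example:.1f}% of the expert human baseline")
```

Expressing every game's result relative to a human reference is one way to put very different games, such as a farming sim and a fighting game, on a comparable scale.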

With every game and model going through this method, the benchmark can provide a standardised comparison of how the AI models reason in games, which is where the leaderboard comes in. Among the models tested so far, the number one spot is taken by Gemini 2.5 Pro, with GPT 4th in second place. That's based on average rank, though, as performance in each game varies a lot between models.
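As a rough illustration of how an average-rank leaderboard can hide large per-game differences, the Python sketch below ranks models within each game and then averages those ranks. The model names and scores are hypothetical, ties are not handled, and this is not presented as Krafton's actual implementation.

```python
from statistics import mean


def average_ranks(scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank models per game (1 = best score), then average the ranks.

    `scores` maps game -> model -> score, where higher is better.
    Returns (model, average rank) pairs sorted from best to worst.
    Ties are broken by dictionary order, which a real leaderboard
    would need to handle more carefully.
    """
    ranks: dict[str, list[int]] = {}
    for per_model in scores.values():
        ordered = sorted(per_model, key=per_model.get, reverse=True)
        for position, model in enumerate(ordered, start=1):
            ranks.setdefault(model, []).append(position)
    return sorted(((m, mean(r)) for m, r in ranks.items()), key=lambda item: item[1])


# Hypothetical scores, not taken from the Orak leaderboard.
scores = {
    "2048": {"model_a": 80, "model_b": 60, "model_c": 40},
    "Stardew Valley": {"model_a": 30, "model_b": 50, "model_c": 10},
    "Baba Is You": {"model_a": 70, "model_b": 20, "model_c": 90},
}

leaderboard = average_ranks(scores)
for model, avg_rank in leaderboard:
    print(f"{model}: average rank {avg_rank:.2f}")
```

In this made-up data, model_a tops the table on average rank even though it wins only one of the three games, which is the kind of per-game variation the leaderboard's single ranking smooths over.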


About the Author

With a fascination for technology and games, Seb is a tech writer with a focus on hardware, news, and deals. He is also a tester and reviewer for the site. Contact him @ [email protected]