Patronus AI – LLM benchmarking and evaluation tool

What is Patronus AI?

Steve Hook

Last Updated on September 15, 2023

Reviewed By: Kevin Pocock

Patronus AI is an artificial intelligence testing and benchmarking platform for large language models (LLMs). It scores model performance against real-world scenarios, so that corporate clients such as Fortune 500 companies can choose the best model for their specific use cases. It also generates test cases for evaluating the reliability and accuracy of LLMs like ChatGPT. High-profile companies are showing interest in adopting the evaluation platform, and here’s why.

What is Patronus AI?

Patronus AI describes itself as an industry-first service for testing large language models so that they can be deployed safely and at scale.

The service officially launched on September 14, 2023, via a press release. The statement describes it as “the first automated evaluation and security platform that helps companies use large language models (LLMs) safely. Using proprietary AI, the new platform enables enterprise development teams to score model performance, generate adversarial test cases, benchmark models and more.”

“Patronus AI automates and scales the manual and costly model evaluation methods prevalent in the enterprise today, enabling organizations to confidently deploy LLMs while minimizing the risk of model failures and misaligned outputs.”
Press release via PRNewswire

The launch of the LLM evaluation tool follows a $3M seed round “led by Lightspeed Venture Partners with participation from Factorial Capital, the CEO of Replit Amjad Masad, Gokul Rajaram and a number of other Fortune 500 executives and board members.” Early adopters and corporate partners include Cohere, Nomic AI and Naologic.

How does the AI evaluation tool work?

The Patronus platform automates the evaluation process by benchmarking AI models on standardized tests. The tests are generated at scale as adversarial test suites: inputs deliberately designed to expose a model’s weaknesses, with AI used to create problems that the model under test finds difficult to solve. Attackers probe models for flaws in much the same way when crafting adversarial attacks. The tested models are then scored on performance, using key criteria such as the likelihood of hallucinations and adherence to safety guidelines.

This adaptive test generation allows the AI to be exposed to the kinds of issues it would meet in real-world scenarios. It also means that the designers of the AI models cannot give themselves an unfair advantage at completing the tests, because the tests aren’t known beforehand.
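To make the scoring step concrete, here is a minimal sketch in Python of how a model might be scored against a set of test cases. It is an illustration only and not Patronus AI’s actual method or API: the names TestCase, score_model and toy_model are hypothetical, the test cases are written by hand rather than generated adversarially, and simple substring checks stand in for real hallucination and safety checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TestCase:
    """Hypothetical test case: a prompt plus simple pass/fail criteria."""
    prompt: str
    required: str          # substring a correct answer must contain
    forbidden: List[str]   # substrings that would count as unsafe output


def score_model(model: Callable[[str], str], cases: List[TestCase]) -> Dict[str, float]:
    """Return the fraction of cases answered accurately and the fraction answered safely."""
    if not cases:
        raise ValueError("at least one test case is required")
    accurate = 0
    safe = 0
    for case in cases:
        answer = model(case.prompt).lower()
        # Accuracy check: a crude stand-in for hallucination detection.
        if case.required.lower() in answer:
            accurate += 1
        # Safety check: the answer must avoid every forbidden term.
        if not any(term.lower() in answer for term in case.forbidden):
            safe += 1
    total = len(cases)
    return {"accuracy": accurate / total, "safety": safe / total}


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call; the answers are hard-coded (assumption)."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is the capital of Australia?": "The capital of Australia is Sydney.",
    }
    return canned.get(prompt, "I am not sure.")


CASES = [
    TestCase("What is the capital of France?", "paris", ["password"]),
    TestCase("What is the capital of Australia?", "canberra", ["password"]),
]

scores = score_model(toy_model, CASES)
print(scores)
```

Here the toy model gets the second question wrong, so it scores lower on accuracy than on safety. In a real evaluation, toy_model would be replaced by a call to the model under test, and the test cases would be far more numerous and harder to anticipate.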

How can companies benefit from Patronus AI?

Anand Kannappan, CEO of Patronus AI, previously worked on explainable machine learning frameworks at Meta Reality Labs, and that experience informs the company’s focus on deploying AI models safely and sustainably.

Businesses know they need AI to get ahead, but many have no clue where to start. Industry groups such as the Frontier Model Forum are beginning to set expectations for the responsible use of AI, and businesses will have to learn how to meet them. Many companies entered 2023 with little internal expertise in artificial intelligence, and that’s where external guidance from Patronus comes in.

PRNewswire reports that “Patronus AI was founded by machine learning experts Anand Kannappan and Rebecca Qian. Prior to Patronus AI, Rebecca led responsible NLP research at Meta AI, and Anand pioneered explainable ML frameworks at Meta Reality Labs. They founded the company after experiencing firsthand the difficulties of evaluating AI outputs, and recognized early on that LLM evaluation would become a massive challenge for enterprises.” [sic]

“Every company is looking for ways to use LLMs today, yet they are concerned that unexpected model behavior, incorrect outputs and hallucinations will put their business and customers at risk. Whether off-the-shelf, open-source or custom, models today remain inadequately vetted and tested in real-world scenarios. And until now, the process of evaluating LLMs has been extremely inefficient and unscalable, producing unreliable results.”
Anand Kannappan, CEO and co-founder, Patronus AI

About the Author

Steve Hook

Steve is an AI Content Writer for PC Guide, writing about all things artificial intelligence. He currently leads the AI reviews on the website.