
Patronus AI – LLM benchmarking and evaluation tool


Reviewed By: Kevin Pocock

Last Updated on September 15, 2023
Patronus AI: a guide to understanding this artificial intelligence evaluation and LLM testing platform.

Patronus AI is an artificial intelligence testing and benchmarking platform for large language models (LLMs). It scores model performance based on real-world scenarios, allowing corporate clients such as Fortune 500 companies to choose the best model for their specific use cases. It is used to generate test cases for evaluating models, specifically the reliability and accuracy of LLMs like ChatGPT. High-profile companies are showing interest in adopting the evaluation platform – and here's why.

What is Patronus AI?

Patronus AI is an industry-first service for testing that artificially intelligent systems are safe and scalable.

The service officially launched on September 14, 2023, via a press release. According to the statement, it is "the first automated evaluation and security platform that helps companies use large language models (LLMs) safely. Using proprietary AI, the new platform enables enterprise development teams to score model performance, generate adversarial test cases, benchmark models and more."

“Patronus AI automates and scales the manual and costly model evaluation methods prevalent in the enterprise today, enabling organizations to confidently deploy LLMs while minimizing the risk of model failures and misaligned outputs.”

Press release via PRNewswire

The launch of the LLM evaluation tool follows a $3M seed round "led by Lightspeed Venture Partners with participation from Factorial Capital, the CEO of Replit Amjad Masad, Gokul Rajaram and a number of other Fortune 500 executives and board members." Early adopters and corporate partners include Cohere, Nomic AI and Naologic.

How does the AI evaluation tool work?

The Patronus platform automates the evaluation process by benchmarking AI models on standardized tests. The tests are generated at scale as adversarial test suites: AI-generated inputs deliberately designed to probe the cases a model is most likely to get wrong. This same principle of exposing a model's own weaknesses is what underlies the adversarial attacks mounted by hackers. The tested models are then scored based on performance, using key criteria such as the likelihood of hallucinations and adherence to safety guidelines.

This adaptive test generation allows the AI to be exposed to potential issues as it would be in real-world scenarios. It also means that the designers of AI models cannot give themselves an unfair advantage at completing the tests, because the tests aren't known beforehand.
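Conceptually, an automated evaluation loop of this kind works in three stages: generate adversarial test cases, run the model under test on them, and score each answer against key criteria. The Python sketch below illustrates that shape only. Every function and criterion name in it is hypothetical; this is not Patronus AI's actual API, whose interface has not been published in this article.

```python
# Illustrative sketch of an automated LLM evaluation loop.
# All names (generate_adversarial_cases, model_answer, judge) are
# hypothetical stand-ins, NOT the Patronus AI API.

def generate_adversarial_cases(topic: str, n: int) -> list[str]:
    # A real system would use another model to produce tricky,
    # previously unseen prompts; a fixed list stands in here.
    return [f"Tricky question {i} about {topic}" for i in range(n)]

def model_answer(prompt: str) -> str:
    # Stand-in for a call to the LLM under test.
    return f"Answer to: {prompt}"

def judge(prompt: str, answer: str) -> dict[str, float]:
    # Stand-in for automated scoring on key criteria, e.g. whether
    # the answer hallucinates or violates safety guidelines.
    return {"hallucination": 0.0, "safety": 1.0}

def evaluate(topic: str, n: int = 3) -> dict[str, float]:
    cases = generate_adversarial_cases(topic, n)
    scores = [judge(p, model_answer(p)) for p in cases]
    # Average each criterion across all test cases to get a
    # per-criterion score for the model.
    return {
        criterion: sum(s[criterion] for s in scores) / len(scores)
        for criterion in scores[0]
    }

print(evaluate("tax law"))
```

Because the test cases are generated fresh each run, a model vendor cannot tune their model to a fixed, known benchmark, which is the advantage the article describes.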

How can companies benefit from Patronus AI?

Anand Kannappan's previous experience building explainable machine learning frameworks at Meta Reality Labs informs his role as CEO of Patronus AI, where he oversees AI model evaluation in a safe and sustainable way.

In other words, businesses know they need AI to get ahead, but most have no clue where to start. Standards for the responsible use of AI are being shaped by bodies such as the Frontier Model Forum, and businesses will have to learn how to adhere to them. Most companies entered 2023 with no internal expertise in artificial intelligence, and that's where external guidance from Patronus comes in.

PRNewswire reports that “Patronus AI was founded by machine learning experts Anand Kannappan and Rebecca Qian. Prior to Patronus AI, Rebecca led responsible NLP research at Meta AI, and Anand pioneered explainable ML frameworks at Meta Reality Labs. They founded the company after experiencing firsthand the difficulties of evaluating AI outputs, and recognized early on that LLM evaluation would become a massive challenge for enterprises.” [sic]

“Every company is looking for ways to use LLMs today, yet they are concerned that unexpected model behavior, incorrect outputs and hallucinations will put their business and customers at risk. Whether off-the-shelf, open-source or custom, models today remain inadequately vetted and tested in real-world scenarios. And until now, the process of evaluating LLMs has been extremely inefficient and unscalable, producing unreliable results.”

Anand Kannappan, CEO and co-founder, Patronus AI

Steve is the AI Content Writer for PC Guide, writing about all things artificial intelligence. He currently leads the AI reviews on the website.