On Wednesday, November 22nd, OpenAI CTO Mira Murati sent a letter to employees. The letter detailed a project known internally as Q* (pronounced Q-Star), linked to a technique called Q-Learning. This project was purported to be “one factor among a longer list of grievances by the board leading to Altman’s firing”, and could reportedly help accelerate progress towards AGI (Artificial General Intelligence) by improving how models learn mathematics. So, how does Q-Learning work, and what controversy (reportedly) led to the firing of OpenAI CEO Sam Altman?
OpenAI CTO Mira Murati and the internal letter to staff
Q* and Q-Learning are trending today due to references made by OpenAI’s Chief Technology Officer, Mira Murati, on Wednesday, November 22nd. This technology is expected to be an ingredient for achieving AGI. As a result, a “lack of consistently candid communication” about such a world-changing development played a part in the board’s decision to fire OpenAI CEO Sam Altman, according to an internal letter sent out by Murati to OpenAI employees.
What are project Q* and its Q-Learning algorithm?
To date, Q* and Q-Learning have been used almost synonymously. With very little documentation and few official references to these terms, we’re unable to definitively differentiate them. However, it’s possible that Q* is an internal project name, referring to the optimal solution of a Bellman equation (which we’ll return to later). Q* may also be the name, or at least a working title, of a corresponding AI model yet to be announced by OpenAI. Q-Learning, by contrast, is an established mathematical concept; the Q-Learning algorithm would presumably be the formula underpinning this project and any resulting AI model.
Names aside, Q-Learning refers to a machine learning algorithm reportedly capable of “grade-school” level mathematics, which is hoped to surpass OpenAI’s GPT-4 model in that field. It approaches math problems using a technique called reinforcement learning, wherein rewards are given for correct or optimal actions, and penalties for incorrect or suboptimal actions. Through trial and error, a machine can explore the possible paths to an expected reward, gradually discover shorter routes, and converge on an optimized state over time, making better decisions each time.
But how does this all relate to Q*? Q-values, also known as action values, put a number on the effectiveness of a given action taken at a given time. By storing these values in a Q-table alongside all other Q-values, a machine can objectively compare the effectiveness of its available actions: the highest number marks the most optimal solution found (so far, or at a given time) by the algorithm.
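To make the Q-table idea concrete, here is a minimal sketch of tabular Q-learning in Python. The environment (a five-cell corridor with a reward at one end) and all hyperparameter values are invented for illustration; this is not OpenAI’s actual method.

```python
# Toy tabular Q-learning sketch (illustrative only; not OpenAI's method).
# Environment: a 1-D corridor of 5 cells; the agent starts at cell 0 and
# earns a reward of +1 for reaching cell 4. Actions: 0 = left, 1 = right.
import random

N_STATES, N_ACTIONS = 5, 2
GOAL = 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

# The Q-table: one row per state, one column per action.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Move left or right; reward 1 only on reaching the goal cell."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

random.seed(0)
for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = max(Q[next_state])
        Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
        state = next_state

# After training, "right" has the higher Q-value in every non-goal state.
print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)])
# → [1, 1, 1, 1]
```

Note how the learned Q-values fall off with distance from the reward: the discount factor gamma is what makes shorter routes score higher than longer ones.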
The Bellman equation – OpenAI’s reinforcement learning algorithm for artificial intelligence
In mathematics, Q is used to denote a rational number, or “a number that can be expressed as the quotient or fraction of two integers”. OpenAI’s use of Q* may refer to the Optimal Value Function in the Bellman Optimality Equation. In other words, Q* is the optimal solution (by definition) of an efficiency optimization algorithm. It’s not hard to see how efficiency optimization relates to the work of OpenAI.
The Bellman equation is a formula that allows us (or a machine) to make the best-informed decision at each stage of a multi-stage process. Named after Richard E. Bellman, the award-winning Brooklyn-born mathematician, it helps to find a solution to a complex, multi-stage problem by making the best decision at each stage, given what is known at that stage. The person (or computer) running the algorithm can plug in a priority, called the objective function, such as minimizing travel time, minimizing cost, maximizing profit, or maximizing utility. The algorithm will then dictate the best possible actions to take to achieve the desired result.
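As a toy example of “the best decision at each stage”, the sketch below minimizes travel time across a small road network using Bellman’s recursion. All city names and travel times are made up for illustration.

```python
# Minimal sketch of Bellman's recursion with "minimize travel time" as the
# objective function. The road network below is entirely hypothetical.
# Recursion: best_time(city) = min over next stops n of time(city, n) + best_time(n)
from functools import lru_cache

travel_time = {  # directed edges: (from, to): minutes
    ("A", "B"): 30, ("A", "C"): 20,
    ("B", "D"): 25, ("C", "D"): 40,
    ("C", "B"): 5,
}
DEST = "D"

@lru_cache(maxsize=None)
def best_time(city):
    """Optimal remaining travel time from `city` to DEST."""
    if city == DEST:
        return 0
    return min(t + best_time(nxt)
               for (src, nxt), t in travel_time.items() if src == city)

print(best_time("A"))  # → 50  (A -> C -> B -> D = 20 + 5 + 25 minutes)
```

The direct-looking route A → B → D costs 55 minutes; by making the best decision at each stage, the recursion finds the 50-minute detour through C instead.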
What is a Bellman equation?
For a fixed policy π, a Bellman equation may be written as:

Vπ(s) = R(s, a, s′) + γ ∑ P(s′|s, a) Vπ(s′)

Q* plays a role in the optimality version of this equation, where q* is the optimal action-value function, s stands for ‘State’ and a refers to ‘Action’:

q*(s, a) = ∑ P(s′|s, a) [ R(s, a, s′) + γ max q*(s′, a′) ]

(summing over next states s′ and maximizing over next actions a′). Here (s, a) is a state-action pair, fundamental to any Bellman equation, and (s′, a′) is the next state-action pair when moving from a current state to the next state in a process.
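The Bellman optimality equation can be solved for q* by repeatedly applying it as an update rule, a method known as value iteration. Below is a minimal sketch on a tiny two-state MDP; every state, action, probability, and reward is made up for illustration.

```python
# Sketch of solving the Bellman optimality equation for Q* by value iteration
# on a tiny, made-up MDP. Update applied repeatedly:
#   Q*(s, a) = sum over s' of P(s'|s, a) * (R(s, a, s') + gamma * max_a' Q*(s', a'))
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2

# P[s][a] = list of (probability, next_state, reward) transitions.
P = [
    [  # state 0
        [(0.8, 0, 0.0), (0.2, 1, 1.0)],  # action 0: usually stay, sometimes reach state 1
        [(1.0, 1, 1.0)],                  # action 1: always reach state 1
    ],
    [  # state 1
        [(1.0, 1, 0.0)],                  # action 0: stay in state 1
        [(1.0, 0, 0.5)],                  # action 1: back to state 0, small reward
    ],
]

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(200):  # iterate the Bellman update until Q converges to Q*
    Q = [
        [
            sum(p * (r + GAMMA * max(Q[s2])) for p, s2, r in P[s][a])
            for a in range(N_ACTIONS)
        ]
        for s in range(N_STATES)
    ]

# The optimal action in each state is the argmax over Q*(s, a).
print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)])
# → [1, 1]
```

Because the update is a contraction (thanks to γ < 1), the iteration converges to the unique fixed point Q*, which is exactly what makes Q* “the optimal solution (by definition)” described above.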
This concludes our maths lesson because I’m not the mathematician required to explain any of this.
We will surely hear more about this mysterious ‘project Q*’ in the near future (and sooner than OpenAI intended, no doubt). This is all there is to know about how OpenAI is pushing “the veil of ignorance back and the frontier of discovery forward” with Q-Learning and its machine learning applications in AI. Perhaps Sam Altman will return with more power to reveal this secretive project soon. When OpenAI finally releases this new mathematical model, you’ll hear about it here first!