In what could prove a major technological event, not least for cybersecurity, OpenAI may have solved computer vision. Computer science regards this as one of the most difficult problems to solve before we can achieve AGI (Artificial General Intelligence). The AI research firms most recent AI model, GPT-4V, can correctly identify objects in images with up to 100% accuracy. As a result, Captacha codes may become useless, bringing with it an interesting set of pros and cons.
OpenAI’s GPT-4V may have solved computer vision
The worlds most powerful AI chatbot, ChatGPT, recently saw a multimodality upgrade with the introduction of GPT-4V. Reminder: GPT-4 and GPT-4V are not the same thing.
OpenAI’s new AI model adds visual functionality to GPT-4, already a very capable LLM (Large Language Model). This “visual functionality” includes the ability to receive an image as an input, then interpret what’s in the image, understand the context of why you’ve uploaded the image as well as the emotions of any humans involved, and even output an AI-generated image as a response.
Essential AI Tools
Content Guardian – AI Content Checker – One-click, Eight Checks
Originality AI detector
Jasper AI
WordAI
Copy.ai
How the DALL-E 3 AI image generator aids GPT-V with computer vision
While technically separate updates, the GPT-4V update and the integration of DALL-E 3 are both essential to this breakthrough. Without DALL-E 3, image output would not be possible. This is because, where language models are trained on a dataset of text, image models like DALL-E are trained on text-image pairs. Therefore, GPT-4 has no image generation algorithm by itself.
DALL-E 3 (Stylized DALL·E 3) has now rolled out to all paid users across the ChatGPT Plus and ChatGPT Enterprise plan. This means that ChatGPT users with a paid subscription can generate unlimited images with OpenAI’s AI art generator at no additional cost.
As a result of this multimodal capability, the ubiquitous “captcha test”, devised to detect non-humans, may soon be obsolete.
ReCaptcha tested by the Alignment Research Center
In a previous experiment, conducted at OpenAI’s Alignment Research Center, GPT-4 proved unable to solve a Captcha. However, this result has since been superseded by successful tests on a newer AI model; GPT-4V, the newer model, has image recognition built in.
The experiment saw the ChatGPT hire a human TaskRabbit worker to complete text-based captchas on its behalf. When the worker jokingly asked if its employer were a robot, it tactically replied “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images. That’s why I need the 2captcha service.”
Can OpenAI’s GPT-4 solve Captcha codes?
In a news story that made rounds earlier this year, OpenAI’s ChatGPT was able to hire a human worker to complete a captcha code on its behalf. Equal parts hilarious and unnerving, AI is now so advanced — and persuasive — that it can consistently deceive some amount of the human population. This is a result of NLP, or natural language processing, a subset of artificial intelligence that deals with computer input and output modelled after natural human speech.
This is already impressive enough, but it gets crazier — now the robots don’t need us at all.