Home > AI

ChatGPT vision mode – GPT-4V(ision) system card

What is ChatGPT vision mode?

Reviewed By: Kevin Pocock

Last Updated on February 1, 2024
ChatGPT Vision mode powered by OpenAI's GPT-4V multimodal AI model.
You can trust PC Guide: Our team of experts use a combination of independent consumer research, in-depth testing where appropriate - which will be flagged as such, and market analysis when recommending products, software and services. Find out how we test here.

On the 25th September 2023, OpenAI CEO Sam Altman announced ChatGPT vision mode via the new visual model GPT-4V(ision). The AI chatbot can now take image inputs in addition to text-based prompts. As the list of capabilities of the world’s favourite artificial intelligence continues to expand, let’s take a look at ChatGPT vision mode.

What is ChatGPT vision mode?

Already being compared to Google’s Lens feature, OpenAI co-founder Sam Altman confirmed the functionality on September 25th 2023, via Twitter. ChatGPT vision mode is available right now, and is powered by the new model variant GPT-4V (also known as GPT-4 with vision).

The AI chat bot can now respond to and visually analyze your image inputs. This of course includes photos, illustrations, logos, screenshots of websites and documents – ultimately these are all just JPG’s and PNG’s (the two most common digital image file types). However, while Google Lens is used as a search engine for images, and finding where those images are located on the internet, OpenAI applies a machine learning approach – analzying the contents of the image, and drawing intelligent (even conversational) conclusions from it!

Today’s announcement introduces both speech recognition and image processing algorithms. ChatGPT will be competing with Google Bard and Bing Chat from Microsoft for multimodality dominance as a result, especially now that DALL-E 3 and how that the AI image generator is integrated into ChatGPT.

Essential AI Tools

Editor’s pick
Only $0.00019 per word!

Content Guardian – AI Content Checker – One-click, Eight Checks

8 Market leading AI Content Checkers in ONE click. The only 8-in-1 AI content detector platform in the world. We integrate with leading AI content detectors to give unparalleled confidence that your content appear to be written by a human.
EXCLUSIVE DEAL 10,000 free bonus credits

Jasper AI

On-brand AI content wherever you create. 100,000+ customers creating real content with Jasper. One AI tool, all the best models.
TRY FOR FREE

WordAI

10x Your Content Output With AI. Key features – No duplicate content, full control, in built AI content checker. Free trial available.
TRY FOR FREE

Copy.ai

Experience the full power of an AI content generator that delivers premium results in seconds. 8 million users enjoy writing blogs 10x faster, effortlessly creating higher converting social media posts or writing more engaging emails. Sign up for a free trial.
TRY FOR FREE

Writesonic

Create SEO-optimized and plagiarism-free content for your blogs, ads, emails, and website 10X faster. Start for free. No credit card required.

The popular GPT (Generative Pre-trained Transformer) is capable of processing natural language. Until now that has been restricted to text, but thanks to voice input (and voice recognition translating speech-to-text) the conversational AI can now have verbal conversations like a human!

How to use the image capabilities of GPT-4 Vision

To get started, tap the photo button to capture or choose an image. If you’re on iOS or Android, tap the plus button first. You can also discuss multiple images or use our drawing tool to guide your assistant.

OpenAI.com

The new feature is powered by “multimodal GPT-3.5 and GPT-4”. You do not need plugins to use the new ChatGPT vision feature. ChatGPT Plus subscribers, and those on the business plans called Teams and Enterprise, have access to it. In addition to computer vision, you’ll also have access to the DALL-E 3 integration (the output aspect, whereas computer vision is more about the input side).

With simple text prompts, you can generate visuals right from within the ChatGPT interface. Visual content can include practically any art style you can describe. However, graphs and data visualisations, as well as medical imaging will not be reliable or accurate. In other words, it will generate roughly what a graph could look like in concept, for aesthetic purposes, but not a specific graph with specific data.

To use the feature on desktop, follow these simple steps:

Step

1

Upgrade ChatGPT

Upgrade to a paid subscription tier, if you’re not already. This feature is not available to users of the free version of ChatGPT. The two paid plans are currently ChatGPT Plus and ChatGPT Enterprise.

We recommend ChatGPT Plus for most users.

Step

2

Access GPT-4V

Open ChatGPT via web browser, and begin a new chat with GPT-4. Then, click the + button to the left of “Send a message” in your prompt window. If you don’t see the + button, you don’t have access to the feature (see step 1).

Step

3

Visual prompting

Select which image you want to upload to ChatGPT. After you upload the image, you can also type a text prompt which instructs the AI chatbot to use the image in your desired way.

ChatGPT image upload can accept the following file types: JPG, JPEG, PNG, AVIF, and more!

GPT-4V – The GPT-4V(ision) system card

A post on the OpenAI research blog under GPT-4 safety & alignment reveals that “GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4 and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.”

ChatGPT image input & image output

Until this point, chats with ChatGPT have been limited largely to text, both on the input and output side. Now we’re seeing multimodal functionality, combining the GPT-4 large language model (LLM) with image, and audio, neural networks.

With multiple AI models working together, this generative AI upgrade will allow paid ChatGPT users to ask questions about any image, as shown in a demonstrational video via the OpenAI blog.

When can you use ChatGPT vision mode?

This feature has already rolled out to all users with a ChatGPT Plus or ChatGPT Enterprise subscription. It is accessible through the browser-based version of ChatGPT, as well as the app for iOS and Android. ChatGPT vision mode works best on mobile, due to the built-in camera(s) on your device, which allow you to take photos of your environment and ask questions about it. However, it works equally well on desktop or mobile for uploading pre-existing images.

We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.

OpenAI.com, more than two weeks ago

GPT-4V image input is not available in the free version of ChatGPT.

Steve is the AI Content Writer for PC Guide, writing about all things artificial intelligence. He currently leads the AI reviews on the website.