On 25 September 2023, OpenAI CEO Sam Altman announced ChatGPT vision mode, powered by the new visual model GPT-4V(ision). The AI chatbot can now take image inputs in addition to text-based prompts. As the list of capabilities of the world’s favourite artificial intelligence continues to expand, let’s take a look at ChatGPT vision mode.
What is ChatGPT vision mode?
Already being compared to Google’s Lens feature, ChatGPT vision mode was confirmed by OpenAI co-founder Sam Altman on September 25th 2023, via Twitter. It is available right now, and is powered by the new model variant GPT-4V (also known as GPT-4 with vision). The AI chatbot can now respond to and visually analyze your image inputs. This includes photos, illustrations, logos, and screenshots of websites and documents – ultimately these are all just JPGs and PNGs (the two most common digital image file types). However, while Google Lens is used as a search engine for images, finding where those images are located on the internet, OpenAI applies a machine learning approach: analyzing the contents of the image and drawing intelligent (even conversational) conclusions from it!
The announcement introduces both speech recognition and image processing capabilities. As a result, ChatGPT will be competing with Google Bard and Microsoft’s Bing Chat for multimodality dominance, especially now that DALL-E 3 has been revealed and the AI image generator is being integrated into ChatGPT.
The popular GPT (Generative Pre-trained Transformer) is capable of processing natural language. Until now that has been restricted to text, but thanks to voice input (with speech recognition translating speech to text), the conversational AI can now hold verbal conversations like a human!
Ways to use ChatGPT vision mode
“To get started, tap the photo button to capture or choose an image. If you’re on iOS or Android, tap the plus button first. You can also discuss multiple images or use our drawing tool to guide your assistant.” – OpenAI.com
The new feature is powered by “multimodal GPT-3.5 and GPT-4”, and no plugins are required to use it.
To use the feature on desktop, follow these simple steps:
Upgrade to a paid subscription tier, if you’re not already. This feature is not available to users of the free version of ChatGPT. The two paid plans are currently ChatGPT Plus and ChatGPT Enterprise.
We recommend ChatGPT Plus for most users.
Open ChatGPT in your web browser and begin a new chat with GPT-4. Then, click the + button to the left of “Send a message” in your prompt window. If you don’t see the + button, you don’t have access to the feature (see step 1).
Select which image you want to upload to ChatGPT. After you upload the image, you can also type a text prompt which instructs the AI chatbot to use the image in your desired way.
ChatGPT image upload can accept the following file types: JPG, JPEG, PNG, AVIF, and more!
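For developers, the same text-plus-image pattern is also exposed through OpenAI’s API rather than the ChatGPT interface. The sketch below is a minimal illustration of how such a request payload could be assembled, assuming the chat-completions message format OpenAI documented at GPT-4V’s launch; the model name (`gpt-4-vision-preview`) and payload shape may change, so check OpenAI’s current API documentation before relying on them.

```python
import base64
import json

def build_vision_request(image_bytes: bytes, question: str) -> dict:
    """Build a chat-completions payload pairing a text prompt with an image.

    A sketch only: the model name and message structure are assumptions
    based on OpenAI's API docs at the time of the GPT-4V announcement.
    """
    # Images are sent inline as a base64-encoded data URL.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                # The content list mixes a text part and an image part,
                # mirroring typing a prompt alongside an uploaded image.
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    }

# Example: build (but do not send) a request for a hypothetical photo.
payload = build_vision_request(b"fake-jpeg-bytes", "What is in this photo?")
print(json.dumps(payload, indent=2))
```

Sending the payload would additionally require an API key and an HTTP call to OpenAI’s endpoint; the function above only demonstrates the structure of a multimodal prompt.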
GPT-4V – The GPT-4V(ision) system card
A post on the OpenAI research blog under GPT-4 safety & alignment reveals that “GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our work on safety for GPT-4V builds on the work done for GPT-4 and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.”
ChatGPT image input & image output
Until this point, chats with ChatGPT have been limited largely to text, both on the input and output side. Now we’re seeing multimodal functionality, combining the GPT-4 large language model (LLM) with image, and audio, neural networks.
With multiple AI models working together, this generative AI upgrade will allow paid ChatGPT users to ask questions about any image, as shown in a demonstrational video via the OpenAI blog.
When can you use ChatGPT vision mode?
This feature has already rolled out to all users with a ChatGPT Plus or ChatGPT Enterprise subscription. It is accessible through the browser-based version of ChatGPT, as well as the app for iOS and Android. ChatGPT vision mode works best on mobile, due to the built-in camera(s) on your device, which allow you to take photos of your environment and ask questions about it. However, it works equally well on desktop or mobile for uploading pre-existing images.
“We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks. Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.” – OpenAI.com, more than two weeks ago
GPT-4V image input is not available in the free version of ChatGPT.