Is GPT-4 multimodal?

It’s official: OpenAI’s newest GPT model is here. If you’ve been anticipating what new features this model could bring, you may be wondering: is GPT-4 multimodal?
You’ll be happy to hear that OpenAI has confirmed that GPT-4 is multimodal. In other words, it can accept both images and text from its users and generate a text response to whatever you ask.
This was also confirmed last week by Microsoft Germany’s CTO, Andreas Braun. Revealing details about GPT-4, Braun said that OpenAI’s long-term investor would soon “have multimodal models that will offer completely different possibilities”.
So, what can GPT-4 actually do? Early reports suggest the new model can generate captions and descriptions tailored to an image you give it, making it a handy assistant for any social media task.
But that’s not all. Some users have even found that GPT-4 can recommend recipe ideas based on a photo of whatever ingredients you have left lying around.
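To make the image-plus-text idea concrete, here is a minimal sketch of what a request like this could look like through OpenAI’s Python SDK. The model name and image URL are placeholder assumptions, and image input requires access to a vision-capable GPT-4 model:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# One message can mix text and image parts, which is what "multimodal" means in practice.
response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable GPT-4-class model available to your account
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Suggest a recipe using the ingredients in this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/fridge.jpg"},  # hypothetical image URL
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The response comes back as plain text, so the image only travels in one direction: in.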
According to OpenAI, GPT-4 is “more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5”. And you can definitely see evidence of this in the model’s scarily high exam scores.
Remarkably, GPT-4 scores in roughly the top 10 percent of test takers on the Uniform Bar Exam. That’s pretty impressive, to say the least.
At the moment, the model is only available to ChatGPT Plus subscribers and to developers who have made it through OpenAI’s API waitlist.
What is a multimodal model?
So, what even is a multimodal model anyway? A model is multimodal if it can operate across multiple mediums, such as text, images, video, or audio.
For example, Microsoft’s latest model, Kosmos-1, can reportedly perform visual text recognition, find specific content within images, and even solve visual puzzles. The fact that it can take in information in one form (an image) and output a response in another (text) is what makes it multimodal.
OpenAI has already developed a multimodal model of its own: DALL-E. This revolutionary AI tool generates images from text written by humans.
DALL-E is a sophisticated artist and has been shown to produce some extremely eye-catching images from just a few prompts.
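For anyone curious what text-to-image generation looks like in practice, here is a minimal sketch using OpenAI’s Images API. The model name and prompt are illustrative, not prescribed by the article:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Ask DALL-E to render an image from a short text prompt.
result = client.images.generate(
    model="dall-e-3",  # assumption: whichever DALL-E model your account has access to
    prompt="A watercolor painting of a lighthouse at dawn",
    n=1,
    size="1024x1024",
)

print(result.data[0].url)  # URL of the generated image
```

Here the modalities run the opposite way to GPT-4’s vision feature: text goes in, an image comes out.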
Final Thoughts
So, is GPT-4 multimodal? Yes, it is! OpenAI has been working hard to develop next-level AI technology, and it seems all that hard work has paid off. If you are interested in finding out more about GPT-4, we definitely recommend heading over to the OpenAI website to see what this model can do.
If you found this article useful, why not read GPT-4 release date: when is the new model out?
- NOW READ What is GPT-4? What can it do?