Is GPT-4 multimodal?

From analysing images to generating text, what else can this model do?



It’s official! OpenAI’s newest GPT model is here. If you’re wondering what new features this model brings, you may be asking: is GPT-4 multimodal?

You’ll be happy to hear that OpenAI has confirmed that GPT-4 is multimodal. In other words, it can process both images and text from its users, generating a response to any question you may have.

This was also confirmed by the CTO of Microsoft Germany last week. When revealing details about GPT-4, Andreas Braun said that Microsoft, OpenAI’s long-term investor, would soon “have multimodal models that will offer completely different possibilities”.

So, what can GPT-4 actually do? Well, there are claims online that the new model can generate captions and descriptions tailored to the image you input into the model – the perfect assistant for any social media task.

But, that’s not all. Some users have even found that GPT-4 can recommend recipe ideas based on an image of any ingredients you have left lying around.
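To make the recipe idea concrete, here is a minimal sketch of how a request combining text and an image might be structured for OpenAI’s Chat Completions API. The model name and image URL below are placeholders, and the actual API call is shown only in comments since it needs an API key.

```python
import json

def build_recipe_request(image_url: str) -> dict:
    """Build a chat payload that pairs a text question with an image."""
    return {
        "model": "gpt-4-vision-preview",  # placeholder: pick whichever vision-capable model is current
        "messages": [
            {
                "role": "user",
                "content": [
                    # Text part: the question we want answered about the image
                    {"type": "text", "text": "Suggest a recipe using these ingredients."},
                    # Image part: a URL pointing at a photo of the ingredients
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_recipe_request("https://example.com/fridge-photo.jpg")
print(json.dumps(payload, indent=2))

# Sending it would look roughly like this (requires the `openai` package
# and an OPENAI_API_KEY in your environment):
#
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**payload)
# print(response.choices[0].message.content)
```

The key multimodal detail is that the message `content` is a list mixing text and image parts, rather than a single string.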

According to OpenAI, GPT-4 is “more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5”. And you can definitely see evidence of this in the model’s scarily high exam scores.

Shockingly, GPT-4 scores around the 90th percentile of test takers on the Uniform Bar Exam. That’s pretty impressive, to say the least.

At the moment, the model is only available to ChatGPT Plus subscribers and to developers who have made it through OpenAI’s API waitlist.

What is a multimodal model?

So, what even is a multimodal model anyway? A model is multimodal if it can work across multiple mediums, such as text, images, audio, or video.

For example, Microsoft’s latest model, Kosmos-1, can reportedly perform visual text recognition, find specific content in images, and even solve visual puzzles. The fact that it can take in information in one form (images) and output a response in another (text) is what makes it multimodal.

OpenAI has already developed its own multimodal model, DALL-E. This revolutionary AI tool can generate images from text descriptions written by humans.

DALL-E is a sophisticated artist and has been shown to produce some extremely eye-catching images from just a few prompts.

Final Thoughts

So is GPT-4 multimodal? Yes, it is! OpenAI has been working hard to develop next-level AI technology, and it seems that all the hard work has paid off. If you are interested in finding out more about GPT-4, we definitely recommend heading over to the OpenAI website to see what this model can do.

If you found this article useful, why not read GPT-4 release date: when is the new model next?