On September 20th 2023, OpenAI CEO Sam Altman announced DALL·E 3, the latest version of the AI art generator. The latest version brings significant improvements to text-in-image generation, along with ChatGPT integration. This comes in addition to higher aesthetic polish, nuance of understanding, and attention to user input. It represents a serious rival to Midjourney, the reigning text-to-image AI system, as well as Stable Diffusion from Stability AI. What is DALL·E 3, and why is it significant?
What is DALL-E 3?
DALL-E 3 (stylized DALL·E 3) is the third instalment in the DALL·E line of AI models for “creating images from text”.
“We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.” (OpenAI.com)
It is a diffusion-based generative AI model, used for image generation. In other words, DALL·E is a text-to-image AI model that can generate a digital image from a description written in natural language.
DALL·E 3 is the latest release in the popular line-up of AI image generators. This time, however, OpenAI researchers have merged the power of GPT-4 with DALL·E. The ChatGPT integration allows you to use DALL·E 3 for image generation inside of the ChatGPT interface!
*ChatGPT here means every instance of the AI chatbot, but DALL·E 3 is available exclusively to ChatGPT Plus and ChatGPT Enterprise customers.
The colloquial name “ChatGPT 4” refers to the pairing of ChatGPT with GPT-4. DALL·E, by comparison, is a portmanteau of WALL-E, the title character of the Disney Pixar film, and Spanish surrealist painter Salvador Dalí. The name references both futurist ideals and artistic endeavours: a vision of our distant techno-future, and one of the most creative figures in human-made art.
How does OpenAI’s AI art generator work?
DALL·E 1 was an autoregressive transformer that generated images as sequences of discrete tokens, whereas DALL·E 2 and now DALL·E 3 are both diffusion models. All three iterations make use of transformer architecture – a ubiquitous artificial intelligence technology used across OpenAI’s GPT (Generative Pre-trained Transformer) model lineup and beyond. OpenAI’s Whisper, MuseNet, and Jukebox (deep neural networks for speech recognition and AI-generated music) each make use of transformer architecture.
“DALL·E was trained by learning the relationship between images and the text used to describe them.” (OpenAI.com)
What is a Diffusion Model?
A diffusion model is a type of deep learning model that has been trained to generate images by removing noise. It learns how to do this by taking many images (the training data), each labelled with a text description, and adding noise to them step by step until each image is 100% Gaussian noise. Because the amount of noise added at every step is known, the model can be trained to predict that noise and undo it. Given enough examples and practice, it can run the process in reverse: starting from a ‘canvas’ of pure random noise, it removes a little noise at a time until an image emerges, including images that did not exist in the training data. Higher-quality images in the training data lead to higher-quality image output!
The diffusion model is paired with a large language model (LLM), trained on a dataset of text, to understand your text prompts. When you write your prompt, these AI systems work together: the LLM uses natural language processing (NLP) to codify an instruction that the diffusion model can understand.
“It uses a process called diffusion, which starts with a pattern of random dots and gradually alters that pattern towards a final output.” (OpenAI.com)
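To make the noising half of that process concrete, here is a minimal Python sketch of how a training image can be blended with Gaussian noise. It is an illustration only, not OpenAI's implementation: the function names, the four-pixel toy "image", and the linear noise schedule (borrowed from common practice in published diffusion research) are all assumptions made for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def signal_fraction(betas):
    """Running product of (1 - beta): how much of the original image survives after each step."""
    kept, fractions = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        fractions.append(kept)
    return fractions


def add_noise(pixels, t, alpha_bars, rng):
    """Jump straight to step t of the noising process.

    Returns the noisy pixels and the Gaussian noise that was mixed in.
    That noise is what a denoising network would be trained to predict.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(pixels, noise)]
    return noisy, noise


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    alpha_bars = signal_fraction(linear_beta_schedule(1000))
    image = [0.8, -0.2, 0.5, 1.0]  # hypothetical image: four pixel values scaled to [-1, 1]
    for t in (0, 499, 999):
        noisy, _ = add_noise(image, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

At early steps the pixels are almost unchanged, while at the final step almost nothing of the original remains. Generation runs the other way: a trained network repeatedly estimates the noise in a random starting pattern and subtracts a little of it, guided by the text prompt.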
How is DALL·E different to ChatGPT?
DALL·E is different to ChatGPT because the former generates images, and the latter generates text. Both are generative AI, but ChatGPT is an AI chatbot, while DALL·E is not a chatbot because it only generates images, not conversational responses. ChatGPT also has paid subscription plans called ChatGPT Plus and ChatGPT Enterprise, whereas DALL·E has historically operated on a pay-as-you-go (PAYG) credits system. For the time being, however, DALL·E 3 will only be accessible with a paid subscription to ChatGPT.
“DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall.” (OpenAI.com)
Will DALL·E 3 be free?
DALL·E 3 will not be free. Instead, it will operate on a pay-as-you-go credits system, just like DALL·E 2. Initially, use of DALL·E 3 will be limited to paying ChatGPT Plus and ChatGPT Enterprise subscribers, and it is not available with the free version of ChatGPT. This will likely make ChatGPT Plus worth it for you, especially if you use both ChatGPT and Midjourney on a regular basis. DALL·E credits work as follows:
- A credit can be used for one DALL·E request: generating images through a text prompt, an edit request, or a variation request.
- You get 15 free credits each month. Free credits don’t roll over, so they’ll expire a month after they were granted.
- You can purchase additional credits through your account page.
Can you use images generated with DALL·E 3?
Yes, AI images generated with DALL·E 3 are free for personal and commercial use. Images generated with DALL·E 2 are likewise free for both personal and commercial use.
“As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don’t need our permission to reprint, sell or merchandise them.” (OpenAI.com)
Will DALL·E 3 be open-source?
No, DALL·E 3 is not open-source. DALL·E 1 and DALL·E 2 have never been open-source either.
Despite the parent company’s name, “OpenAI”, not everything it creates is necessarily open, including ChatGPT.
Is DALL·E 3 the best AI art generator?
The position of best AI art generator is a hotly contested one. As the conclusion is based on subjective and aesthetic qualities of the end result, the answer is also somewhat subjective. That said, Midjourney is widely regarded as the best AI art generator.
This conclusion is a markedly surprising one, because Midjourney is a small startup in comparison to Google, Microsoft, Meta, and even OpenAI – which now has a multi-billion USD valuation. Most of the tech giants have their own text-to-image AI art generators, and yet none of them are as good.
So, what is DALL·E 3 doing differently? To start, it produces better results with less refining of the text prompt – known as prompt engineering – than DALL·E 2. Midjourney will have to release new text-in-image features to remain worth it over alternative AI art generators.
“Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide.” (OpenAI.com)
The latest iteration can edit images and make higher-quality logos than its predecessor, but still cannot make GIFs. Even so, internal policy researcher Sandhini Agarwal has “high confidence” in its ability to produce accurate images of human details while still abiding by safeguards and safety features.
DALL·E 3 examples
Here are the latest and greatest examples of AI images generated by DALL·E 3, as released by OpenAI. The prompts used to generate them are included within the images, to exemplify the usability of the model.