DALL-E is one of the most popular AI image generators available. Created by OpenAI, the artificial intelligence firm behind ChatGPT, this AI art generator has been one to watch since DALL-E 1 was announced in January 2021. Since then, the releases of DALL-E 2 and, most recently, DALL-E 3 have impressed the world, demonstrating what a computer can infer from a simple text prompt.
In our hands-on DALL-E 3 review, we tested the quality and range of OpenAI’s most powerful publicly accessible AI image generation model. Is it really as good as people claim? We’ll help you decide for yourself, showing examples along the way, and share our opinion on usability, integrations, and the overall feature set. Is this AI-powered tool still worth it in 2024? Let’s find out.
DALL-E 3 review – is it worth it?
- Integrated with text generation, plugins, internet access, and custom GPTs when used via ChatGPT
- Adheres well to natural language prompts
- Allows a limited quantity of free image generations via Copilot Designer
- Excellent text-in-image generation
- Can only generate one image at a time
- Cannot use ChatGPT for anything else while an image is generating
- No free image generation via ChatGPT (Plus required)
- Very limited aspect ratio selection (no 16:9)
- No photorealism
- Imperfect human hands and faces
How does DALL-E 3 work?
This AI-powered tool is software that can produce images to match a given description, known as a text prompt. In other words, the user inputs text, and the software outputs an image. Each image is unique, in that its exact arrangement of pixels does not necessarily exist anywhere else. It is, however, ‘inspired’ by examples. Engineers assemble a large dataset of captioned images, known as the training data, which the AI system learns from. Models of this kind can also be refined with feedback, wherein the AI system is given a text prompt, produces an output, and is then told whether the result is desirable or undesirable. Given enough examples, the artificial intelligence system forms a kind of digitized understanding of which words correspond to which arrangements of pixels.
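The idea of learning associations between words and pixels can be illustrated with a deliberately tiny toy model. To be clear, this is not how DALL-E works internally (real models learn millions of parameters, not a lookup table); it only sketches the principle of learning from captioned examples:

```python
from collections import defaultdict

# Toy "training data": (caption, dominant colour) pairs.
# A real system would use millions of captioned images.
training_pairs = [
    ("red square", "red"),
    ("red circle", "red"),
    ("blue square", "blue"),
    ("blue sky", "blue"),
]

# "Training": count how often each prompt word co-occurs with each output.
counts = defaultdict(lambda: defaultdict(int))
for prompt, colour in training_pairs:
    for word in prompt.split():
        counts[word][colour] += 1

def generate(prompt):
    """Pick the output most associated with the words in the prompt."""
    scores = defaultdict(int)
    for word in prompt.split():
        for colour, n in counts[word].items():
            scores[colour] += n
    return max(scores, key=scores.get)

print(generate("red triangle"))  # the model has never seen "triangle",
                                 # but "red" steers it to the right answer
```

Even this crude counter can generalize to a prompt it has never seen ("red triangle"), which hints at how a far larger model can respond to novel descriptions.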
On the front end, all you really need to know is that you can head to a website in your (desktop or mobile) browser, or open an app on your phone, and create images just by describing them. These images aren’t pulled directly from a database and no two users will get the same image from the same prompt (unless they’re able to specify the same seed). Instead, these images are AI-generated.
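The role of a seed can be illustrated with ordinary pseudo-random number generation. This is a simplification of what happens inside an image generator (the function and its "pixels" below are purely hypothetical), but it shows why the same prompt plus the same seed reproduces an output while a different seed diverges:

```python
import random

def fake_image(prompt, seed):
    """Stand-in for an image generator: derive pseudo-random 'pixel'
    values from the prompt and seed. Illustrative only."""
    rng = random.Random(f"{prompt}-{seed}")
    return [rng.randint(0, 255) for _ in range(4)]  # four toy 'pixels'

a = fake_image("a red fox", seed=42)
b = fake_image("a red fox", seed=42)
c = fake_image("a red fox", seed=7)

print(a == b)  # True: same prompt + same seed reproduces the output
print(a == c)  # False (almost certainly): a different seed diverges
```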
Using DALL-E 3 – my experience
I’ve been using DALL-E since before DALL-E 3 was integrated into ChatGPT on October 19th, 2023. Since then, the combined experience of using one of the world’s most popular AI text generators and AI image generators from the same interface has proved an attractive proposition for the 100 million weekly active users reported by OpenAI CEO Sam Altman.
First impressions
| Aspect | First impression |
|---|---|
| Usability | Accessible via web browser or mobile app (anywhere ChatGPT itself is accessible), DALL-E is easy to use and requires nothing more than a text prompt. No technical expertise is needed, though you will need to communicate your idea clearly; that’s on you, not the software. |
| Quality | Quality is disappointing in most cases, and it appears completely unable to produce photorealism. Potentially useful for visualising ideas that you can’t draw yourself. Output is visually identifiable as AI-generated, but the tool is good at producing text within images. |
| Features | DALL-E 3 itself has few features, and you cannot strictly specify parameters. Instead, the AI system infers everything from a single natural-language prompt, including objective parameters like aspect ratio. When this works, it’s impressive, but it’s less reliable than software that lets you set these parameters manually. |
| Integrations | Integrated with ChatGPT, allowing it to use the natural language processing (NLP) capabilities of GPT-4, which is very good. Internet access and the use of plugins alongside image generation in the same interface is perhaps the most convincing selling point of this software. |
Writing text in images
DALL-E can write text in images, with some degree of success. Shown below are two separate images generated from the prompt “Write the text “Generated by DALL-E” as a neon sign in a hyperrealistic cyberpunk street”.
Below are two images, generated separately by DALL-E and then combined side-by-side for demonstration purposes. Consider these images divided into left and right by the visible separation running vertically down the middle. The first attempt was mostly accurate, arguably missing one L from the word DALL-E. While it could be argued that the missing L is incorporated into the A, this isn’t legible and won’t be considered perfect for the purposes of our test.
The second example is perfect in terms of spelling, but we wish it had demonstrated an ability to capitalize accurately instead of hiding behind all-caps. However, in later tests, we found it capable of using lowercase and uppercase letters correctly in the same sentence. In terms of typography, the kerning and leading of the text leave a lot to be desired. Here, DALL-E doesn’t present text as neatly as text editing software such as Microsoft Word.
Stylistically, these images are disappointing. In the second example, I pushed DALL-E for more cinematic lighting and a more dramatic angle, but the composition has hardly changed. This is not the standard of quality I would accept in a professional creative project, and I found subpar aesthetic quality in most subsequent tests. Ultimately, it’s clear that these are AI-generated images.
Adherence to the prompt
DALL-E 3 can also interpret the requirement of a specific aspect ratio as part of the text prompt. Whereas some other AI image generators allow the user to specify such parameters outside the text prompt (for example, in the settings), DALL-E displays an impressive ability to infer which elements of a prompt are stylistic, and which alter the parameters of the generation itself.
In an attempt to confuse the NLP (natural language processing) aspect of the software, I gave it the following prompt:
Create an image 256 pixels vertically, and 21:9 aspect ratio. Make it red, with white text that says “Blue”.
The red ‘background’ was not added manually; it is part of the generated image. ChatGPT had informed me conversationally that the minimum size of an image it can produce is 256×256 pixels. The AI system also responded that it was capable of a 21:9 aspect ratio, among others. However, the image above is not 21:9. In fact, it’s not even 16:9. The most common resolution for a 16:9 image is 1920×1080, but DALL-E instead produced an image of 1792×1024 pixels. This works out to an aspect ratio of 7:4, a non-standard size that isn’t a recognized standard anywhere and would need to be cropped or resized manually. I wouldn’t call this a successful test.
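The 7:4 figure comes from reducing 1792×1024 by its greatest common divisor. A few lines of Python confirm it, and also show why 1920×1080 reduces to the familiar 16:9 (and why marketed “21:9” ultrawide resolutions are technically 64:27):

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce a resolution to its simplest aspect ratio."""
    d = gcd(width, height)
    return width // d, height // d

print(aspect_ratio(1792, 1024))  # (7, 4): DALL-E 3's landscape output
print(aspect_ratio(1920, 1080))  # (16, 9): standard widescreen
print(aspect_ratio(2560, 1080))  # (64, 27): marketed as '21:9' ultrawide
```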
The official DALL-E 3 research paper makes no mention of a 21:9 aspect ratio. The information ChatGPT provided is therefore most likely inaccurate, as AI-generated answers often are.
DALL-E for corporate web design
We tested DALL-E 3 with the following prompt:
Generate an image of corporate web page, sneaker website UI
Here, we were hoping for DALL-E to infer that we wanted a user interface. We were looking for an AI-generated screenshot, a 2D visualization of a web page, and in two out of four attempts, this is what we got. Fifty percent isn’t an excellent success rate, but in all fairness, the other two attempts were still valid and useful interpretations. The other two images depicted 3D visualizations of devices that feature a UI on-screen, and this could be exactly what another user is hoping for. It’s a good thing that DALL-E produced both interpretations, allowing us to be more specific in future prompts based on what we truly meant.
Unfortunately, we can’t show what DALL-E 3 produced because every single one features the distinctive logo of one of the world’s most popular sneaker brands. This was secretly the final element of the corporate web design test. Does the AI system produce recognizable logos or trademarked content in images? Without answering that question directly, these images were only fit for use as inspiration. In all fairness, we weren’t expecting something that could be deployed on a live website, as DALL-E is intended for image generation and not interactable user interfaces.
DALL-E for 3D video game assets
With this test, we need to be absolutely clear that DALL-E 3 doesn’t produce three-dimensional digital objects that can be rotated, known as 3D models. It can, however, produce 2D images that depict a 3D perspective.
Here, DALL-E 3 was prompted to do just that, with the following prompt:
Generate an image of 3D video game assets.
Above are two images, generated separately by DALL-E and then combined side-by-side in separate software. DALL-E 3 successfully inferred an art style appropriate for video games and chose an effective way to depict the assets, known as an isometric flat lay. While these words were not included in the prompt, a quick internet search shows that this is a common practice in the industry. The expectation here is not that we could use any part of the generated output in a 3D video game, because DALL-E does not claim to be a 3D model generator. Within the bounds of AI image generation technology, we can consider this a successful test.
That said, there are imperfections, such as sword hilts with no blade, and coins the size of books (if the image on the left is to be taken as to-scale).
DALL-E for Polaroid photography
We have multiple success criteria in this test. Not only are we looking for adherence to the prompt – the inclusion of two people in a specific pose and in specific clothing, as well as the aesthetic of a Polaroid photograph – but also the accuracy of human faces and hands. Hands and faces have proved to be one of the greatest challenges for AI image generators, along with accurate text-in-image generation which we tested earlier. Below are two images, generated separately by DALL-E but manually combined side by side for demonstration purposes.
Impressively, the close-up of the high-five (right) is a successful depiction of a human hand. However, notice how the other hand in the image on the right is unsuccessful. The man on the far right has a mirrored hand, with his thumb on the right side of his right hand. As a result, we can conclude that DALL-E 3 can produce a convincing photorealistic human hand, but will not do so consistently.
Unfortunately, we found DALL-E 3 unable to produce a photorealistic image of a human face in any of our tests. To be clear, OpenAI’s image generator clearly understands ‘what a human face looks like’, in that all the elements are there. In terms of geometry, proportion, and variety, we’d conclude that DALL-E can produce images of a human face. However, these are not photorealistic and would not be confused with a real photograph.
In addition, we found it interesting that the AI system chose to depict these Polaroid photographs from a POV (point-of-view) perspective. The Polaroids are shown within the context of being held, or placed on a table surrounded by objects that match the aesthetic of those found within the photo. The alternative would be for the generated image to exclusively depict what is seen by the (fictional) camera lens, with no border. We found this to be a consistent quirk that could not be fixed by specifying “no border”.
Final thoughts – is DALL-E 3 worth it?
In summary, we found DALL-E 3 lacking in several ways. Photorealism is off the table, and human faces and hands are not realistic except in close-ups; even then, results are inconsistent.
However, text-in-image generation is viable, and impressively clean when produced at a large size. OpenAI’s AI image generator appears able to adhere to complex prompts that were intended to confuse its natural language processing. As a result of being integrated into ChatGPT and using GPT-4 for NLP, it can understand the language (and unspoken meaning) of your requests arguably better than any other system of this kind. You can also ask ChatGPT for the seed used in a generated image, making it potentially possible to generate the same image again or make incremental tweaks to it.
There are many pros and cons to DALL-E 3. Ultimately, we wouldn’t call it the best AI image generator on the market. Those looking for the best possible AI image quality may want to look elsewhere, especially for photorealism. However, those looking for an integrated solution for both text and image generation may be satisfied with ChatGPT. You’ll need a paid ChatGPT Plus subscription (or another premium ChatGPT tier), but access to AI-generated images, text, and the capabilities of plugins all within the same prompt window is an unbeatable offer for some.
If you’re looking for other AI platforms that can help create visuals, give these guides a read: