Stable Diffusion is a deep learning model that can generate high-quality images from natural language descriptions. But how does Stable Diffusion actually work?
This post walks through the technology behind Stable Diffusion and how it turns text descriptions into realistic images.
How does Stable Diffusion create images?
Stable Diffusion is a generative model that uses deep learning to create images from text. Under the hood it combines three components: a text encoder (CLIP) that turns the prompt into an embedding, a U-Net that performs the denoising, and a variational autoencoder (VAE) that decodes the result into a full-resolution image. Together, these let the model produce an image matching the input text description.
Stable Diffusion uses “diffusion” to generate high-quality images from text. During training, the forward diffusion process gradually adds random noise to an image over many steps until nothing but noise remains. Generation runs this in reverse: starting from pure noise, the model iteratively predicts and removes noise, step by step, until a clean, detailed image emerges.
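The forward (noising) half of this process can be sketched in a few lines of numpy. This is a toy illustration, not the real model: a 1-D signal stands in for an image, and the linear noise schedule and step count are assumed values chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": a 1-D signal stands in for pixel values.
x0 = np.linspace(-1.0, 1.0, 16)

# Assumed linear noise schedule over T steps (real schedules vary).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)  # how much signal survives by step t

def add_noise(x0, t, rng):
    """Forward diffusion: jump straight to noise level t in closed form."""
    noise = rng.standard_normal(x0.shape)
    a = alphas_cumprod[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

# Early step: the signal is mostly intact; final step: almost pure noise.
slightly_noisy = add_noise(x0, 10, rng)
very_noisy = add_noise(x0, T - 1, rng)
print(alphas_cumprod[10], alphas_cumprod[T - 1])
```

The `alphas_cumprod` curve shrinks toward zero as `t` grows, which is exactly the "image dissolving into noise" behavior the model later learns to reverse.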
Stable Diffusion is trained as a denoiser: the network learns to predict the noise that was added to an image, and training minimizes the difference between its prediction and the actual noise. The text prompt steers this denoising through cross-attention, which is how the finished image comes to closely match the input description. Notably, the whole process runs in a compressed latent space rather than on raw pixels, which is why the approach is called latent diffusion.
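In code, the simplified diffusion training objective is just a mean squared error between predicted and true noise. The sketch below uses numpy and stands in a trivial "predictor" for the real U-Net, purely to show what the loss measures.

```python
import numpy as np

rng = np.random.default_rng(1)

def denoising_loss(predicted_noise, true_noise):
    """Simplified diffusion training objective: mean squared error between
    the noise the network predicts and the noise actually added."""
    return np.mean((predicted_noise - true_noise) ** 2)

true_noise = rng.standard_normal(64)

# A perfect predictor drives the loss to zero...
perfect = denoising_loss(true_noise, true_noise)

# ...while a blind guess (all zeros) leaves a loss near the noise variance.
blind = denoising_loss(np.zeros(64), true_noise)
print(perfect, blind)
```

Training nudges the network's weights so its predictions move from the "blind guess" regime toward the near-zero-loss regime across millions of image-text pairs.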
Does Stable Diffusion use images?
In short, yes: Stable Diffusion does use images. It was trained on a huge dataset of images paired with text descriptions (drawn from the LAION dataset). During training, the model learns by comparing its denoising predictions against these ground-truth images, which is how it learns to create realistic images from text descriptions.
Once Stable Diffusion has been trained, you can generate images from text descriptions: you input a prompt, and the model produces an image that matches it. The result can be tuned with sampling parameters such as the guidance scale (how strictly the image follows the prompt), the number of denoising steps, and the random seed.
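One of the most influential sampling parameters in Stable Diffusion is the guidance scale, which comes from a technique called classifier-free guidance: at each denoising step, the model makes one prediction with the text prompt and one without, then pushes toward the text-conditioned direction. A minimal numpy sketch of that combination (toy 2-element vectors stand in for the real noise predictions):

```python
import numpy as np

def guided_noise(eps_uncond, eps_text, guidance_scale):
    """Classifier-free guidance: push the denoising direction toward the
    text-conditioned prediction, scaled by guidance_scale."""
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)

eps_uncond = np.array([0.0, 0.0])  # prediction with an empty prompt
eps_text = np.array([1.0, -1.0])   # prediction conditioned on the prompt

# Scale 1.0 simply follows the text prediction; higher scales exaggerate it.
print(guided_noise(eps_uncond, eps_text, 1.0))   # [ 1. -1.]
print(guided_noise(eps_uncond, eps_text, 7.5))   # [ 7.5 -7.5]
```

Low scales give looser, more varied images; high scales follow the prompt more literally, at some cost to image quality, which is why common defaults sit in the middle of that range.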
Advantages of Stable Diffusion
Stable Diffusion has several advantages over other text-to-image models. It generates high-quality images with fine details and textures that match the input text, thanks to the step-by-step denoising process. And because it diffuses in a compressed latent space rather than at full pixel resolution, it is efficient enough to run on a single consumer GPU, and its open-source release means anyone can run or fine-tune it.