Stable Video Diffusion – Stability AI releases AI video generator

The new #1 AI video model?

Stable Video Diffusion is the new text-to-video generative AI model from Stability AI.



This year has seen AI image generators go from abstract to photorealistic. In fact, AI-generated content has gone from barely usable to one of the most common forms of content on social media. Individuals and big brands alike have quickly adopted the technology across entertainment and advertising. The same can’t be said of generative video – yet. With its latest generative AI video model, Stability AI aims to change this. Having built a stable (pun intended) foundation in AI art with Stable Diffusion, the diffusion models research firm now sets its sights on text-to-video and image-to-video models with Stable Video Diffusion.

How to use Stable Video Diffusion – Text-to-video models SVD & SVD-XT

On November 21st, Stability AI announced Stable Video Diffusion, its “first foundation model for generative video based on the image model Stable Diffusion.”

Already showing results that compete with rival AI video generators Runway and Pika Labs, “this state-of-the-art generative AI video model represents a significant step” for generative artificial intelligence. The AI model research firm proudly states that its diverse open-source portfolio, spanning “across modalities including image, language, audio, 3D, and code… is a testament to Stability AI’s dedication to amplifying human intelligence.”

Stability AI releases SVD and SVD-XT, the Stable Video Diffusion AI models.
Stable Video Diffusion generates video from a text prompt.

Leading closed models from text-to-video platforms Runway and Pika Labs have offered video generation for several months, but now the two new AI models from Stability AI are “capable of generating 14 and 25 frames” per rendered file, “at customizable frame rates between 3 and 30 frames per second. At the time of release in their foundational form, through external evaluation, we have found these models surpass the leading closed models in user preference studies.”

The main difference between the SVD and SVD-XT models is clip length: SVD generates 14 frames per clip, while SVD-XT generates 25. SVD-XT is capable of longer video generations but is more computationally expensive as a result.
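Taken together, the frame counts (14 for SVD, 25 for SVD-XT) and the customizable 3–30 fps range determine how long each rendered clip can be. A quick sketch of the arithmetic (the figures come from Stability AI's announcement; the function itself is purely illustrative):

```python
# Clip duration = frame count / frames per second.
# Frame counts (SVD: 14, SVD-XT: 25) and the 3-30 fps range
# are taken from Stability AI's announcement.

def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Length of a rendered clip in seconds."""
    return num_frames / fps

for model, frames in (("SVD", 14), ("SVD-XT", 25)):
    shortest = clip_duration_seconds(frames, 30)  # fastest frame rate
    longest = clip_duration_seconds(frames, 3)    # slowest frame rate
    print(f"{model}: {shortest:.2f}s to {longest:.2f}s per clip")
```

In other words, even the longer SVD-XT model tops out at roughly eight seconds of video at the slowest frame rate.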


Video – The final frontier

Each mode of digital media (text, audio, image, and video) comes with a unique set of challenges to achieve the level of fidelity required for real-world commercial applications. Video has predictably become the final frontier of the four: it poses the greatest number of challenges and, as a result, will be the last form to be perfected.

Researchers developing this model explored three different training techniques for its video LDM (Latent Diffusion Model) architecture: “text-to-image pretraining, video pretraining, and high-quality video fine-tuning.”

Further technical details can be found in the official research paper.

Where to use Stable Video Diffusion

Stable Video Diffusion is currently in research preview, meaning you can’t use it just yet. You can, however, sign up for the waitlist for the “new upcoming web experience”.

Will the Stability AI video generator be open source?

Yes! The new AI video generator will be open-source. In fact, the code is already available to copy from GitHub, and those who wish to run the model locally can find the model weights on Hugging Face.