Efficient and consistent zero-shot video generation with diffusion models
7 June 2024
Ethan Frakes, Umar Khalid, Chen Chen
Abstract
Recent diffusion-based generative models employ methods such as one-shot fine-tuning of an image diffusion model for video generation. However, this leads to long video generation times and suboptimal efficiency. To avoid this long generation time, zero-shot text-to-video models eliminate fine-tuning entirely and can generate novel videos from a text prompt alone. While zero-shot generation greatly reduces generation time, many models rely on inefficient cross-frame attention processors, hindering the diffusion model’s use for real-time video generation. We address this issue by introducing more efficient attention processors to a video diffusion model. Specifically, we use attention processors (i.e., xFormers, FlashAttention, and HyperAttention) that are highly optimized for efficiency and hardware parallelization. We then apply these processors to a video generator and test with both older diffusion models such as Stable Diffusion 1.5 and newer, higher-quality models such as Stable Diffusion XL. Our results show that using efficient attention processors alone can reduce generation time by around 25% with no change in video quality. Combined with higher-quality models, the use of efficient attention processors in zero-shot generation delivers a substantial gain in both efficiency and quality, greatly expanding the applicability of video diffusion models to real-time video generation.
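The following is a minimal sketch (not the authors' implementation) of how an efficient attention backend can be enabled on a zero-shot text-to-video pipeline in Hugging Face diffusers. The model checkpoint and prompt are illustrative assumptions; the paper integrates efficient kernels into the cross-frame attention itself, and a HyperAttention processor would require custom code not shown here.

```python
# Sketch: enabling memory-efficient attention on a zero-shot video pipeline.
# Assumptions: `diffusers`, `torch`, and `xformers` are installed; a CUDA GPU is available.
import torch
from diffusers import TextToVideoZeroPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # an SDXL checkpoint could be substituted
pipe = TextToVideoZeroPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

# Swap in xFormers memory-efficient attention kernels for the UNet's attention layers.
pipe.enable_xformers_memory_efficient_attention()

# Generate a short clip from a text prompt alone (zero-shot, no fine-tuning).
result = pipe(prompt="a panda surfing a wave", video_length=8)
frames = result.images  # list of generated frames
```

On PyTorch 2.x, the default scaled-dot-product attention (which dispatches to FlashAttention kernels where supported) is an alternative backend, so the speedup reported in the abstract can in principle be obtained without extra dependencies; the 25% figure is specific to the authors' experiments.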
Conference Presentation
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Ethan Frakes, Umar Khalid, and Chen Chen "Efficient and consistent zero-shot video generation with diffusion models", Proc. SPIE 13034, Real-Time Image Processing and Deep Learning 2024, 1303407 (7 June 2024); https://doi.org/10.1117/12.3013575
KEYWORDS: Video, Diffusion, Video processing, Video acceleration, Depth maps, Motion models, Visual process modeling