t2v-turbo.github.io - T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Description: We propose T2V-Turbo, which achieves both fast and high-quality video generation by breaking the quality bottleneck of a video consistency model.

text-to-video (29) consistency model (3) learning from human/ai feedback (2)

Example domain paragraphs

Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve both fast and high-quality video generation. We introduce T2V-Turbo, which integrates feedback from a mixtur

Overview of the training pipeline of our T2V-Turbo. We integrate reward feedback from both an image-text RM and a video-text RM into the VCD procedure by backpropagating gradients through the single-step generation process of our T2V-Turbo.
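
One way to read the pipeline described above is as a single training objective that adds two reward terms to the distillation loss. The form below is a sketch under stated assumptions, not the page's own formula: the symbols $\mathcal{L}_{\mathrm{VCD}}$, $R_{\mathrm{img}}$, $R_{\mathrm{vid}}$, $\hat{x}_0$, $c$, and the weights $\beta_{\mathrm{img}}, \beta_{\mathrm{vid}}$ are all introduced here for illustration.

```latex
% Assumed notation (not from the page):
%   \hat{x}_0      : video produced by the model's single-step generation
%   c              : text prompt
%   R_img, R_vid   : image-text and video-text reward models
%   \beta_*        : hypothetical non-negative weighting coefficients
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{VCD}}
  - \beta_{\mathrm{img}} \, \mathbb{E}\big[ R_{\mathrm{img}}(\hat{x}_0, c) \big]
  - \beta_{\mathrm{vid}} \, \mathbb{E}\big[ R_{\mathrm{vid}}(\hat{x}_0, c) \big]
```

Because $\hat{x}_0$ comes from a single generation step, gradients of both reward terms can flow back to the model parameters without unrolling an iterative sampler.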

We compare our T2V-Turbo (VC2) and T2V-Turbo (MS) with baseline methods across the 16 VBench dimensions. A higher score indicates better performance for a particular dimension. We bold the best results for each dimension and underline the second-best result. Quality Score is calculated with the 7 dimensions from the top table. Semantic Score is calculated with the 9 dimensions from the bottom table. Total Score is a weighted sum of Quality Score and Semantic Score. Both our T2V-Turbo (VC2) and T2V-Turbo (MS) s
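
The weighted sum of Quality Score and Semantic Score mentioned above can be sketched as a small helper. The function name and the default weights (4 for Quality, 1 for Semantic) are assumptions made for illustration and are not stated on this page; the benchmark's own evaluation code is the authority on the exact values.

```python
def total_score(quality: float, semantic: float,
                w_quality: float = 4.0, w_semantic: float = 1.0) -> float:
    """Normalized weighted sum of a Quality Score and a Semantic Score.

    The default weights are assumed for illustration only.
    """
    weight_sum = w_quality + w_semantic
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    # Divide by the weight sum so the result stays on the same scale
    # as the two input scores.
    return (w_quality * quality + w_semantic * semantic) / weight_sum


if __name__ == "__main__":
    # Hypothetical scores, not results from the paper.
    print(total_score(80.0, 70.0))
```

With these assumed weights, the Quality Score dominates the total, so a model that improves only on semantic dimensions moves the Total Score comparatively little.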
