Computer Vision, AI

[One-page summary] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation by Wu et al.

Elune001 2024. 1. 16. 00:27

● Summary: A text-to-video (T2V) generation model built by one-shot tuning of a pretrained text-to-image (T2I) diffusion model

 

● Approach highlight

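  • Spatio-temporal attention for efficiency: each frame attends only to selected earlier frames (the first frame and the immediately preceding frame) rather than to every frame

  • T2V generation by fine-tuning a T2I model: only the attention blocks are updated during the fine-tuning stage, and the remaining pretrained weights stay frozen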
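
The frame-selection idea in the spatio-temporal attention bullet can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python lists, skips the learned query/key/value projections (tokens are used directly), and the function names `attend` and `sparse_frame_attention` are hypothetical. It only shows which frames supply the keys and values for each query frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on lists of token vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return out


def sparse_frame_attention(frames):
    """frames: list of frames, each a list of token vectors.

    Queries come from frame i; keys and values come only from the first
    frame and frame i-1. Assumption of this sketch: frame 0 uses itself
    as its "previous" frame, and projections are the identity.
    """
    outputs = []
    for i, frame in enumerate(frames):
        prev = frames[max(i - 1, 0)]
        context = frames[0] + prev  # tokens of first + previous frame
        outputs.append(attend(frame, context, context))
    return outputs


if __name__ == "__main__":
    video = [
        [[1.0, 0.0], [0.0, 1.0]],  # frame 0, two tokens
        [[0.5, 0.5], [1.0, 1.0]],  # frame 1
        [[2.0, 0.0], [0.0, 2.0]],  # frame 2
    ]
    result = sparse_frame_attention(video)
    print(len(result), "frames,", len(result[0]), "tokens per frame")
```

With N tokens per frame, every query in this sketch sees 2N keys regardless of how many frames the video has, whereas attending over all T frames would mean T·N keys per query. That fixed context size is where the efficiency gain comes from.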

● Main Results


● Discussion

  • Limited ability to represent interactions among multiple objects, a limitation inherited from the underlying T2I diffusion model