[One-page summary] Make-A-Video: Text-to-Video Generation without Text-Video Data (ICLR 2023) by Singer et al.

Computer Vision , AI

Paper_review[short]

Elune001 2024. 1. 16. 01:02

● Summary: Text-to-video generation learned from paired text-image data plus unlabeled video, without any paired text-video data

● Approach highlight

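Text-to-image model: built on the DALL-E 2 architecture
Spatiotemporal layers: a U-Net-based spatiotemporal diffusion decoder that denoises random noise into a short sequence of frames
Frame interpolation network: generates intermediate frames to raise the frame rate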
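
To make the "spatiotemporal layers" item more concrete, here is a minimal pure-Python sketch of a factorized (pseudo-3D) convolution: a 2D spatial filter applied to each frame, followed by a 1D filter along the time axis. This summary does not spell out the layer design, so the factorization and the identity initialization of the temporal filter are taken as assumptions based on the paper's description. The function names (`conv2d_same`, `conv1d_time`, `pseudo3d_conv`) and the single-channel toy video are hypothetical and only for illustration.

```python
def conv2d_same(frame, kernel):
    """Apply a 2D kernel to one H x W frame (zero padding, no kernel flip)."""
    h, w = len(frame), len(frame[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * frame[yy][xx]
            out[y][x] = acc
    return out


def conv1d_time(video, kernel):
    """Apply a 1D kernel along the frame axis of a T x H x W video (zero padding)."""
    t, h, w = len(video), len(video[0]), len(video[0][0])
    pt = len(kernel) // 2
    out = [[[0.0] * w for _ in range(h)] for _ in range(t)]
    for f in range(t):
        for k, weight in enumerate(kernel):
            src = f + k - pt
            if 0 <= src < t and weight != 0.0:
                for y in range(h):
                    for x in range(w):
                        out[f][y][x] += weight * video[src][y][x]
    return out


def pseudo3d_conv(video, spatial_kernel, temporal_kernel):
    """Spatial 2D filter per frame, then a 1D temporal filter across frames."""
    spatial = [conv2d_same(frame, spatial_kernel) for frame in video]
    return conv1d_time(spatial, temporal_kernel)


# Toy video: 3 frames of 2 x 2 pixels, one channel.
video = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
# Stand-in for a pretrained image filter (assumption: it just doubles each pixel).
spatial_kernel = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
# Identity temporal kernel: each output frame depends only on its own input frame.
identity_temporal = [0.0, 1.0, 0.0]

per_frame = [conv2d_same(frame, spatial_kernel) for frame in video]
same_as_image_model = pseudo3d_conv(video, spatial_kernel, identity_temporal) == per_frame
```

With the identity temporal kernel, the factorized layer returns exactly what the per-frame image filter would, which is the intuition behind starting from a text-to-image model and only then letting the temporal weights learn motion from video.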

● Main Results:

● Discussion

How does the spatiotemporal decoder generate frames along the time axis?
How can the model learn text-action relationships that can only be inferred from video (e.g., a person waving their hand left to right versus right to left)?
