- Face Alignment
- VQ-VAE
- img2pose: Face Alignment and Detection via 6DoF
- state_dict()
- Facial Landmark Localization
- Vector Quantized Diffusion Model for Text-to-Image Synthesis
- Mask diffusion
- mmcv
- Continual Learning
- learning to prompt for continual learning
- CIL
- VQ-diffusion
- Mask-and-replace diffusion strategy
- ENERGY-BASED MODELS FOR CONTINUAL LEARNING
- PnP algorithm
- Class Incremental Learning
- L2P
- Class Incremental
- learning to prompt
- timm
- CVPR2022
- Bayes' theorem
- Discrete diffusion
- Img2pose
- prompt learning
- Energy-based model
- Face Pose Estimation
- DualPrompt
- requires_grad
- Markov transition matrix
List: Paper_review[short] (26)
Computer Vision, AI

● Summary: A simple image-editing method with a diffusion model, using only the CLIP [CLS] token embedding
● Approach highlights
- Image editing without labels, using only a detection model
- Crop the original image and augment the crop for the CLIP embedding
- Use only the [CLS] token to keep the model from simply copy-and-pasting
- Classifier-free sampling for image identity (scale factor)
● Main Results ●..
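The classifier-free sampling step above can be sketched as the standard guidance blend; the function name and the toy noise vectors are illustrative, not from the paper.

```python
import numpy as np

def classifier_free_sample(eps_cond, eps_uncond, scale):
    # Blend conditional and unconditional noise predictions; a larger
    # scale pushes the sample toward the condition (here, the CLIP
    # [CLS] embedding), trading edit strength against image identity.
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_c = np.array([0.5, -0.2])   # toy conditional prediction
eps_u = np.array([0.1, 0.0])    # toy unconditional prediction
guided = classifier_free_sample(eps_c, eps_u, 2.0)
```

With `scale = 1.0` the blend reduces to the conditional prediction; larger scales extrapolate past it.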

● Summary: A single label-free framework for image composition (color harmonization, geometric correction, shadow generation)
● Approach highlights
- Self-supervised learning: segment an object from the original image and mask that region
- Content adaptor for object identity: image-to-text embedding using CLIP (to reuse a diffusion model designed for text embeddings)
- Diffusion with the maske..

● Summary: Zero-shot image translation using cross-attention map guidance
● Approach highlights
- Noise regularization for image inversion: keeps the inverted noise Gaussian
- Cross-attention map guidance: lets you edit only the parts you want while maintaining the overall context of the original image
● Main Results
● Discussion: Is it really a zero-shot setup? (it relies on CLIP/BLIP)
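One way to read the cross-attention guidance is as a penalty that keeps the edited pass's attention maps close to the source maps; this is a minimal sketch, and the function name and squared-error form are my assumptions, not the paper's exact objective.

```python
import numpy as np

def attention_guidance_loss(attn_edit, attn_ref):
    # Penalize deviation of the edited pass's cross-attention maps
    # from the reference (inverted source) maps, so spatial layout
    # and overall context are preserved while the prompt edits content.
    return float(((attn_edit - attn_ref) ** 2).sum())

ref = np.ones((2, 2))           # toy reference attention map
print(attention_guidance_loss(ref, ref))  # identical maps incur no penalty
```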

● Summary: Hierarchically structured behavior and long-horizon coordination for RL
● Approach highlights
- Hierarchically structured behavior: imitation for low-level skills (e.g., running); reinforcement learning for drills (e.g., kicking, dribbling)
- Distillation into a single-player policy
- Multi-player reinforcement learning
● Main Results
● Discussion
- Limitation of a simple reward (only the goal score)
- Too heavy a model

● Summary: Text-to-video generation trained on text-image data
● Approach highlights
- Text-to-image model: DALL-E 2 architecture
- Spatiotemporal layers: a U-Net-based spatiotemporal diffusion decoder generates frames from noise
- Frame interpolation network
● Main Results
● Discussion
- How are temporal frames generated from the spatiotemporal decoder?
- How to learn the relationship between text and action that can o..
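The frame interpolation network's interface (two key frames in, an in-between frame out) can be illustrated with a naive linear blend; this only shows the interface, not the learned model, and the function name is hypothetical.

```python
import numpy as np

def interpolate_frame(frame_a, frame_b, alpha):
    # Naive linear blend standing in for the learned frame-interpolation
    # network: illustrates the interface only, not the actual model.
    return (1.0 - alpha) * frame_a + alpha * frame_b

f0 = np.zeros((2, 2))           # toy key frame at t
f1 = np.ones((2, 2))            # toy key frame at t + 1
mid = interpolate_frame(f0, f1, 0.5)
```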

● Summary: A transformer's performance comes from its architecture, not the attention module
● Approach highlights
- MetaFormer: the overall structure of the transformer matters more for performance than the choice of token mixer
- PoolFormer: shows the MetaFormer structure's greater impact on performance by replacing the token mixer with a pooling layer to val..
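A PoolFormer-style token mixer can be sketched in a few lines: sliding average pooling along the token axis stands in for self-attention. The `pool(x) - x` return follows the paper's formulation, since the surrounding residual branch adds `x` back; the edge padding and function name are my choices.

```python
import numpy as np

def pooling_token_mixer(x, k=3):
    # x: (num_tokens, channels); k: pooling window along the token axis.
    # Average pooling mixes information across neighboring tokens,
    # replacing attention with a parameter-free operator.
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    pooled = np.stack([xp[i:i + k].mean(axis=0) for i in range(len(x))])
    return pooled - x

tokens = np.full((5, 4), 2.0)   # constant tokens: pooling changes nothing
out = pooling_token_mixer(tokens)
```

Because pooling a constant sequence returns the same constant, `out` is exactly zero here, matching the residual formulation.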

● Summary: Tackling the shortage of sign language translation label data with progressive pretraining
● Approach highlights
- Improve sign language translation performance with a pretrained language model
- S3D-backbone visual encoder
- V-L mapper for end-to-end training: a simple fully connected 2-layer MLP
● Main Results
● Discussion: Can a simple 2-layer MLP (V-L mapper) efficiently represent..
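The V-L mapper above is just a fully connected 2-layer MLP projecting visual-encoder features into the language model's embedding space; this sketch assumes illustrative dimensions and names not taken from the paper.

```python
import numpy as np

def vl_mapper(visual_feats, W1, b1, W2, b2):
    # Hypothetical V-L mapper: linear -> ReLU -> linear, mapping
    # visual-encoder outputs into the language model's token space.
    hidden = np.maximum(visual_feats @ W1 + b1, 0.0)
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
v = rng.standard_normal((7, 1024))               # 7 clip-level visual features
W1, b1 = rng.standard_normal((1024, 512)), np.zeros(512)
W2, b2 = rng.standard_normal((512, 768)), np.zeros(768)
text_space = vl_mapper(v, W1, b1, W2, b2)        # (7, 768) pseudo-token embeddings
```

Because both layers are differentiable, gradients from the language model flow back through the mapper into the visual encoder, enabling the end-to-end training the summary mentions.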

● Summary: Asks whether AI navigation agents build implicit (or "mental") maps the way animals do
● Approach highlights
- Blind vs. clairvoyant agents
- Bug algorithm: effective navigation with only egomotion sensing
- Memory enables the blind agent's effective performance
● Main Results
● Discussion: Can it work in more complex environments? (Can it understand moving objects?)

● Summary: Visual-acoustic-tactile multisensory robot learning
● Approach highlights
- Modality-temporal feature fusion with self-attention: 1. cross-modality attention, 2. cross-time attention, 3. cross-modality and cross-time attention
● Main Results
● Discussion: The experimental environment is too restrictive (only works in easy, limited settings)
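The three fusion variants differ only in which axis the self-attention runs over; this minimal sketch uses unprojected attention (no learned Q/K/V) and toy shapes of my choosing to make the axis choices concrete.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(tokens):
    # Unprojected scaled dot-product self-attention, for illustration only.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

# feats: (time, modality, dim) tokens from vision / audio / touch encoders.
T, M, D = 4, 3, 8
feats = np.random.default_rng(1).standard_normal((T, M, D))

# 1. cross-modality: attend across modalities at each time step
per_time = np.stack([self_attention(f) for f in feats])
# 2. cross-time: attend across time within each modality
per_mod = np.stack([self_attention(feats[:, m]) for m in range(M)], axis=1)
# 3. cross-modality and cross-time: flatten both axes into one token set
joint = self_attention(feats.reshape(T * M, D)).reshape(T, M, D)
```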

● Summary: Plan from human play, control from teleoperated demonstrations
● Approach highlights
- High-level planner and low-level control policy: learn high-level plans from human play data (people using their hands) and learn low-level control from a small number of teleoperated demonstrations
● Main Results
● Discussion
- Efficiency on sophisticated finger-driven tasks (can the high-level planner cope?)
- What is..