- Face Alignment
- VQ-VAE
- img2pose: Face Alignment and Detection via 6DoF
- state_dict()
- Facial Landmark Localization
- Vector Quantized Diffusion Model for Text-to-Image Synthesis
- Mask diffusion
- mmcv
- Continual Learning
- learning to prompt for continual learning
- CIL
- VQ-diffusion
- Mask-and-replace diffusion strategy
- ENERGY-BASED MODELS FOR CONTINUAL LEARNING
- PnP algorithm
- Class Incremental Learning
- L2P
- Class Incremental
- learning to prompt
- timm
- CVPR2022
- Bayes' theorem
- Discrete diffusion
- Img2pose
- prompt learning
- Energy-based model
- Face Pose Estimation
- DualPrompt
- requires_grad
- Markov transition matrix
List: Paper_review[short] (26)
Computer Vision, AI

● Summary: A simple image-editing method with a diffusion model, using only the CLIP [CLS] token embedding
● Approach highlights
- Image editing without labels, using only a detection model
- Crop the original image and augment the crop for the CLIP embedding
- Use only the [CLS] token to keep the model from simply copy-and-pasting
- Classifier-free sampling for image identity (scale factor)
● Main Results ●..
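The classifier-free sampling step above can be sketched as the standard guidance blend; the function name and the toy noise vectors are illustrative, not from the paper.

```python
import numpy as np

def classifier_free_sample(eps_cond, eps_uncond, scale):
    # Blend conditional and unconditional noise predictions; a larger
    # scale pushes the sample toward the condition (here, the CLIP
    # [CLS] embedding), trading edit strength against image identity.
    return eps_uncond + scale * (eps_cond - eps_uncond)

eps_c = np.array([0.5, -0.2])   # toy conditional prediction
eps_u = np.array([0.1, 0.0])    # toy unconditional prediction
guided = classifier_free_sample(eps_c, eps_u, 2.0)
```

With `scale = 1.0` the blend reduces to the conditional prediction; larger scales extrapolate past it.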

● Summary: A single label-free framework for image composition (color harmonization, geometric correction, shadow generation)
● Approach highlights
- Self-supervised learning: segment an object from the original image and mask that region
- Content adaptor for object identity: image-to-text embedding using CLIP (to reuse a diffusion model designed for text embeddings)
- Diffusion with the maske..

● Summary: Zero-shot image translation using cross-attention map guidance
● Approach highlights
- Noise regularization for image inversion: keeps the inverted noise Gaussian
- Cross-attention map guidance: lets you edit only the parts you want while maintaining the overall context of the original image
● Main Results
● Discussion: Is it really a zero-shot setup? (it relies on CLIP/BLIP)
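One way to read the cross-attention guidance is as a penalty that keeps the edited pass's attention maps close to the source maps; this is a minimal sketch, and the function name and squared-error form are my assumptions, not the paper's exact objective.

```python
import numpy as np

def attention_guidance_loss(attn_edit, attn_ref):
    # Penalize deviation of the edited pass's cross-attention maps
    # from the reference (inverted source) maps, so spatial layout
    # and overall context are preserved while the prompt edits content.
    return float(((attn_edit - attn_ref) ** 2).sum())

ref = np.ones((2, 2))           # toy reference attention map
print(attention_guidance_loss(ref, ref))  # identical maps incur no penalty
```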

● Summary: Hierarchically structured behavior and long-horizon coordination for RL
● Approach highlights
- Hierarchically structured behavior: imitation for low-level skills (e.g., running); reinforcement learning for drills (e.g., kicking, dribbling)
- Distillation into a single-player policy
- Multi-player reinforcement learning
● Main Results
● Discussion
- Limitation of a simple reward (only the goal score)
- Too heavy a model

● Summary: Text-to-video generation trained on text-image data
● Approach highlights
- Text-to-image model: DALL-E 2 architecture
- Spatiotemporal layers: a U-Net-based spatiotemporal diffusion decoder generates frames from noise
- Frame interpolation network
● Main Results
● Discussion
- How are temporal frames generated from the spatiotemporal decoder?
- How to learn the relationship between text and action that can o..
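The frame interpolation network's interface (two key frames in, an in-between frame out) can be illustrated with a naive linear blend; this only shows the interface, not the learned model, and the function name is hypothetical.

```python
import numpy as np

def interpolate_frame(frame_a, frame_b, alpha):
    # Naive linear blend standing in for the learned frame-interpolation
    # network: illustrates the interface only, not the actual model.
    return (1.0 - alpha) * frame_a + alpha * frame_b

f0 = np.zeros((2, 2))           # toy key frame at t
f1 = np.ones((2, 2))            # toy key frame at t + 1
mid = interpolate_frame(f0, f1, 0.5)
```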

● Summary: A transformer's performance comes from its architecture, not the attention module
● Approach highlights
- MetaFormer: the overall structure of the transformer matters more for performance than the choice of token mixer
- PoolFormer: shows the MetaFormer structure's greater impact on performance by replacing the token mixer with a pooling layer to val..
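A PoolFormer-style token mixer can be sketched in a few lines: sliding average pooling along the token axis stands in for self-attention. The `pool(x) - x` return follows the paper's formulation, since the surrounding residual branch adds `x` back; the edge padding and function name are my choices.

```python
import numpy as np

def pooling_token_mixer(x, k=3):
    # x: (num_tokens, channels); k: pooling window along the token axis.
    # Average pooling mixes information across neighboring tokens,
    # replacing attention with a parameter-free operator.
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    pooled = np.stack([xp[i:i + k].mean(axis=0) for i in range(len(x))])
    return pooled - x

tokens = np.full((5, 4), 2.0)   # constant tokens: pooling changes nothing
out = pooling_token_mixer(tokens)
```

Because pooling a constant sequence returns the same constant, `out` is exactly zero here, matching the residual formulation.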

● Summary: Tackling the shortage of sign language translation label data with progressive pretraining
● Approach highlights
- Improve sign language translation performance with a pretrained language model
- S3D-backbone visual encoder
- V-L mapper for end-to-end training: a simple fully connected 2-layer MLP
● Main Results
● Discussion: Can a simple 2-layer MLP (V-L mapper) efficiently represent..
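The V-L mapper above is just a fully connected 2-layer MLP projecting visual-encoder features into the language model's embedding space; this sketch assumes illustrative dimensions and names not taken from the paper.

```python
import numpy as np

def vl_mapper(visual_feats, W1, b1, W2, b2):
    # Hypothetical V-L mapper: linear -> ReLU -> linear, mapping
    # visual-encoder outputs into the language model's token space.
    hidden = np.maximum(visual_feats @ W1 + b1, 0.0)
    return hidden @ W2 + b2

rng = np.random.default_rng(0)
v = rng.standard_normal((7, 1024))               # 7 clip-level visual features
W1, b1 = rng.standard_normal((1024, 512)), np.zeros(512)
W2, b2 = rng.standard_normal((512, 768)), np.zeros(768)
text_space = vl_mapper(v, W1, b1, W2, b2)        # (7, 768) pseudo-token embeddings
```

Because both layers are differentiable, gradients from the language model flow back through the mapper into the visual encoder, enabling the end-to-end training the summary mentions.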

● Summary: Asks whether AI navigation agents build implicit (or "mental") maps the way animals do
● Approach highlights
- Blind vs. clairvoyant agents
- Bug algorithm: effective navigation with only egomotion sensing
- Memory enables the blind agent's effective performance
● Main Results
● Discussion: Can it work in more complex environments? (Can it understand moving objects?)

● Summary: Visual-acoustic-tactile multisensory robot learning
● Approach highlights
- Modality-temporal feature fusion with self-attention: 1. cross-modality attention, 2. cross-time attention, 3. cross-modality and cross-time attention
● Main Results
● Discussion: The experimental environment is too restrictive (only works in easy, limited settings)
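The three fusion variants differ only in which axis the self-attention runs over; this minimal sketch uses unprojected attention (no learned Q/K/V) and toy shapes of my choosing to make the axis choices concrete.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(tokens):
    # Unprojected scaled dot-product self-attention, for illustration only.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

# feats: (time, modality, dim) tokens from vision / audio / touch encoders.
T, M, D = 4, 3, 8
feats = np.random.default_rng(1).standard_normal((T, M, D))

# 1. cross-modality: attend across modalities at each time step
per_time = np.stack([self_attention(f) for f in feats])
# 2. cross-time: attend across time within each modality
per_mod = np.stack([self_attention(feats[:, m]) for m in range(M)], axis=1)
# 3. cross-modality and cross-time: flatten both axes into one token set
joint = self_attention(feats.reshape(T * M, D)).reshape(T, M, D)
```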

● Summary: Plan from human play, control from teleoperated demonstrations
● Approach highlights
- High-level planner and low-level control policy: learn high-level plans from human play data (people using their hands) and learn low-level control from a small number of teleoperated demonstrations
● Main Results
● Discussion
- Efficiency on sophisticated finger-driven tasks (can the high-level planner cope?)
- What is..