All posts (39)
Computer Vision, AI
● Summary: The performance of a transformer comes from its general architecture, not from the attention module ● Approach highlight MetaFormer: the general structure of the transformer matters more for performance than the type of token mixer. PoolFormer: proves that the MetaFormer structure has a greater impact on the transformer's performance by replacing the token mixer with a pooling layer to val..
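PoolFormer's pooling token mixer is simple enough to write out. Per the PoolFormer paper, it is average pooling with stride 1 and "same" padding that ignores padded cells, minus the input itself. The sketch below is a pure-Python version for a single channel stored as nested lists; that representation and the name `pool_token_mixer` are my simplifications, since the real model pools every channel of a batched tensor.

```python
def pool_token_mixer(x, pool_size=3):
    """PoolFormer-style token mixer on one channel of an H x W token grid.

    Computes AvgPool(x) - x with stride 1 and 'same' padding, averaging only
    over cells inside the grid (the count_include_pad=False behaviour).
    """
    h, w = len(x), len(x[0])
    r = pool_size // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Window clipped at the borders, so padded cells never count.
            window = [x[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            row.append(sum(window) / len(window) - x[i][j])
        out.append(row)
    return out


grid = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]
print(pool_token_mixer(grid))
# → [[2.25, 1.5, 2.25], [1.5, -8.0, 1.5], [2.25, 1.5, 2.25]]
```

The `- x` is there because the MetaFormer block already wraps the token mixer in a residual connection, so the mixer only needs to contribute the difference from the local average.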
● Summary: Addresses the lack of labeled sign language translation data with progressive pretraining ● Approach highlight Improve sign language translation performance with a pretrained language model; S3D-backbone-based visual encoder; V-L mapper for end-to-end training: a simple fully connected 2-layer MLP ● Main Results ● Discussion Can a simple 2-layer MLP (V-L mapper) efficiently represent..
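The V-L mapper is described only as a simple fully connected 2-layer MLP. A minimal sketch, assuming a ReLU between the two layers and arbitrary toy dimensions (neither is stated in the summary), could look like this; `VLMapper`, `d_vis`, `d_hidden` and `d_lm` are hypothetical names.

```python
import random


def linear(x, weight, bias):
    # weight has shape (out_dim, in_dim); x has length in_dim.
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class VLMapper:
    """Hypothetical V-L mapper: Linear -> ReLU -> Linear, projecting visual
    features (d_vis) into the language model's embedding space (d_lm)."""

    def __init__(self, d_vis, d_hidden, d_lm, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real model would learn these end to end.
            return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

        self.w1, self.b1 = init(d_hidden, d_vis), [0.0] * d_hidden
        self.w2, self.b2 = init(d_lm, d_hidden), [0.0] * d_lm

    def __call__(self, visual_features):
        # One output embedding per visual feature vector (e.g. per clip segment).
        return [linear(relu(linear(f, self.w1, self.b1)), self.w2, self.b2)
                for f in visual_features]


mapper = VLMapper(d_vis=8, d_hidden=16, d_lm=4)
embeddings = mapper([[0.1] * 8 for _ in range(5)])
print(len(embeddings), len(embeddings[0]))  # → 5 4
```

Because every operation is differentiable, gradients from the language model's translation loss can flow through the mapper back into the visual encoder, which is what makes end-to-end training possible.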
● Summary: Asks whether AI navigation agents build implicit (or ‘mental’) maps like animals do ● Approach highlight Blind vs. Clairvoyant Bug algorithm: effective navigation with only egomotion sensing; memory is what enables the blind agent's effective performance ● Main Results ● Discussion Can it work in more complex environments? (Can it understand moving objects?)
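Navigating with only egomotion sensing means the agent has to keep track of where it is relative to the goal by integrating its own movements. The sketch below shows only that path-integration (dead-reckoning) part, not the Bug algorithm itself; the `(turn, forward)` action interface and the `PathIntegrator` class are hypothetical, chosen just to illustrate the idea.

```python
import math


class PathIntegrator:
    """Tracks pose relative to the start using only egomotion (turn, forward)."""

    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.heading = 0.0  # radians, 0 = the start orientation

    def step(self, turn_rad, forward):
        # Apply the turn first, then move forward along the new heading.
        self.heading = (self.heading + turn_rad) % (2 * math.pi)
        self.x += forward * math.cos(self.heading)
        self.y += forward * math.sin(self.heading)

    def goal_in_egocentric_frame(self, goal_x, goal_y):
        """Goal (given once, in start coordinates) expressed in the agent's
        current frame: first component = ahead, second = to the left."""
        dx, dy = goal_x - self.x, goal_y - self.y
        c, s = math.cos(-self.heading), math.sin(-self.heading)
        return dx * c - dy * s, dx * s + dy * c


agent = PathIntegrator()
agent.step(0.0, 1.0)          # move 1 unit forward
agent.step(math.pi / 2, 1.0)  # turn left 90 degrees, move 1 unit
ahead, left = agent.goal_in_egocentric_frame(1.0, 3.0)
print(round(ahead, 6), round(left, 6))  # → 2.0 0.0
```

Any error in the egomotion estimate accumulates with every step, which is one reason memory of the path taken matters for a blind agent.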
● Summary: Visual, acoustic, and tactile multisensory robot learning ● Approach highlight Modality-temporal feature fusion with self-attention: 1. cross-modality attention 2. cross-time attention 3. cross-modality and cross-time attention ● Main Results ● Discussion The experimental environment is too restrictive (the method only works in easy, limited settings)
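One way to picture the three fusion variants is as attention masks over a grid of (modality, time) tokens. This is my interpretation, not the paper's code: the sketch omits learned query/key/value projections and multiple heads, and the toy token values are made up. Every mode lets a token attend to itself, so the softmax is never taken over an empty set.

```python
import math


def masked_self_attention(tokens, allowed):
    """tokens: {(modality, t): vector}. allowed(q, k) decides whether query
    token q may attend to key token k. Vectors double as queries, keys and
    values (learned projections omitted for brevity)."""
    names = list(tokens)
    out = {}
    for q in names:
        ks = [k for k in names if allowed(q, k)]
        d = len(tokens[q])
        scores = [sum(a * b for a, b in zip(tokens[q], tokens[k])) / math.sqrt(d)
                  for k in ks]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out[q] = [sum(w * tokens[k][i] for w, k in zip(weights, ks)) / z
                  for i in range(d)]
    return out


def cross_modality(q, k):
    return q[1] == k[1]  # same time step, any modality


def cross_time(q, k):
    return q[0] == k[0]  # same modality, any time step


def cross_modality_and_time(q, k):
    return True  # every token attends to every token


tokens = {
    ("vision", 0): [1.0, 0.0], ("audio", 0): [0.0, 1.0], ("tactile", 0): [0.5, 0.5],
    ("vision", 1): [1.0, 1.0], ("audio", 1): [0.0, 0.5], ("tactile", 1): [0.2, 0.8],
}
for mode in (cross_modality, cross_time, cross_modality_and_time):
    fused = masked_self_attention(tokens, mode)
    print(mode.__name__, [round(v, 3) for v in fused[("vision", 0)]])
```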
● Summary: Plan from human play, control from teleoperated demonstrations ● Approach highlight High-level planner and low-level control policy: learn high-level plans from play data of humans using their own hands, and learn low-level control from a small number of teleoperated demonstrations ● Main Results ● Discussion Efficiency in sophisticated finger-driven tasks (can the high-level planner still work?) What is..
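The split between a high-level planner and a low-level controller can be sketched as a loop in which the planner hands out subgoals and the controller takes bounded steps toward each one. Both functions below are hypothetical stand-ins (a straight-line planner and a capped-step controller), not the learned models from the paper; only the control flow is the point.

```python
import math


def high_level_planner(start, goal, n_subgoals):
    """Hypothetical stand-in for the planner learned from human play data:
    proposes evenly spaced 2D subgoals from start to goal."""
    return [(start[0] + (goal[0] - start[0]) * k / n_subgoals,
             start[1] + (goal[1] - start[1]) * k / n_subgoals)
            for k in range(1, n_subgoals + 1)]


def low_level_policy(pos, subgoal, max_step=0.5):
    """Hypothetical stand-in for the controller learned from teleoperated
    demos: one bounded step toward the current subgoal."""
    dx, dy = subgoal[0] - pos[0], subgoal[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return dx, dy
    return dx / dist * max_step, dy / dist * max_step


def rollout(start, goal, n_subgoals=4, tol=1e-9, max_steps_per_subgoal=100):
    pos = start
    for subgoal in high_level_planner(start, goal, n_subgoals):
        for _ in range(max_steps_per_subgoal):
            if math.hypot(subgoal[0] - pos[0], subgoal[1] - pos[1]) <= tol:
                break  # subgoal reached; hand control back to the planner
            ax, ay = low_level_policy(pos, subgoal)
            pos = (pos[0] + ax, pos[1] + ay)
    return pos


print(rollout((0.0, 0.0), (3.0, 4.0)))
```

Keeping the two levels separate is what lets each be trained from a different data source: cheap, plentiful play data for the planner and a few expensive teleoperated demonstrations for the controller.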
● Summary: Zero-shot 4D (3D + time) generation from a text prompt ● Approach highlight HexPlane: represents a 4D scene with six planes of feature vectors spanning all pairs of axes in {X,Y,Z,T}. Scene optimization: project the rendered result, denoise it conditioned on the text embedding, and use the resulting loss to optimize the scene ● Main Results ● Discussion Limitations of representing complex or ..
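A HexPlane query can be sketched as follows. Assumptions: coordinates are normalized to [0, 1], each plane is a small grid of feature vectors sampled with bilinear interpolation, and features from complementary planes (XY with ZT, XZ with YT, YZ with XT) are multiplied element-wise and concatenated, which is how I understand the HexPlane paper's fusion. The real model uses learnable tensors and a small decoder on top, both omitted here; all function names are hypothetical.

```python
PAIRS = [("XY", "ZT"), ("XZ", "YT"), ("YZ", "XT")]  # complementary plane pairs


def lerp(a, b, t):
    return [p + (q - p) * t for p, q in zip(a, b)]


def bilinear(plane, u, v):
    """plane: H x W grid of feature vectors; u indexes rows, v columns, both in [0, 1]."""
    h, w = len(plane), len(plane[0])
    fy, fx = u * (h - 1), v * (w - 1)
    y0, x0 = int(fy), int(fx)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    top = lerp(plane[y0][x0], plane[y0][x1], fx - x0)
    bottom = lerp(plane[y1][x0], plane[y1][x1], fx - x0)
    return lerp(top, bottom, fy - y0)


def hexplane_feature(planes, point):
    """planes: {"XY": grid, ..., "XT": grid}; point: {"X": x, "Y": y, "Z": z, "T": t}
    with every coordinate normalized to [0, 1]."""
    feature = []
    for a, b in PAIRS:
        fa = bilinear(planes[a], point[a[0]], point[a[1]])
        fb = bilinear(planes[b], point[b[0]], point[b[1]])
        feature.extend(p * q for p, q in zip(fa, fb))  # element-wise product per pair
    return feature


def constant_plane(value, size=4, channels=2):
    return [[[float(value)] * channels for _ in range(size)] for _ in range(size)]


planes = {name: constant_plane(i + 1)
          for i, name in enumerate(["XY", "ZT", "XZ", "YT", "YZ", "XT"])}
print(hexplane_feature(planes, {"X": 0.3, "Y": 0.7, "Z": 0.5, "T": 1.0}))
# → [2.0, 2.0, 12.0, 12.0, 30.0, 30.0]
```

Factorizing 4D space into six 2D planes keeps memory roughly quadratic in resolution instead of growing with the fourth power, which is what makes an explicit 4D representation affordable.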
● Summary: Text-to-video generation model built on a text-to-image diffusion model ● Approach highlight Spatio-temporal attention for efficiency: each frame attends only to selected earlier frames (the first frame and the previous frame); T2V generation by fine-tuning a T2I model: only the attention blocks are updated in the fine-tuning stage ● Main Results ● Discussion Lack of ability to represent interactions among multiple objects due to limitations o..
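The sparse spatio-temporal attention (frame i attends only to the first frame and frame i-1) can be sketched with plain lists. Learned query/key/value projections are omitted and each frame is a list of token vectors; these simplifications, and the function names, are mine.

```python
import math


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (no learned projections)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / z
                    for j in range(len(values[0]))])
    return out


def sparse_causal_attention(frames):
    """frames: list of frames, each a list of token vectors.
    Frame i queries the tokens of the first frame and of frame i-1 only."""
    outputs = []
    for i, frame in enumerate(frames):
        context_ids = sorted({0, max(i - 1, 0)})  # frame 0 just attends to itself
        context = [tok for c in context_ids for tok in frames[c]]
        outputs.append(attention(frame, context, context))
    return outputs


frames = [[[1.0, 0.0]], [[0.0, 1.0]], [[2.0, 2.0]]]  # 3 frames, 1 token each
for i, out in enumerate(sparse_causal_attention(frames)):
    print(i, [round(v, 3) for v in out[0]])
# → 0 [1.0, 0.0]
# → 1 [1.0, 0.0]
# → 2 [0.5, 0.5]
```

Each frame sees at most two frames' worth of keys, so the cost per frame stays constant as the clip grows, instead of growing with the number of frames as full spatio-temporal attention would.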
● Summary: Diffusion model for NeRF ● Approach highlight Viewpoint-conditioned image translation model using a conditional latent diffusion model, $\hat{x}_{R,T}=f(x,R,T)$; Score Jacobian Chaining (SJC) for 3D representation: randomly sample viewpoints, perform volumetric rendering, perturb the resulting images with Gaussian noise $\epsilon$, and denoise them by applying the U-Net $\epsilon_{\theta}$ conditioned on..
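The SJC loop described above (sample a viewpoint, render, perturb with Gaussian noise, denoise with $\epsilon_\theta$) can be sketched with stubs for the renderer and the U-Net. Assumptions: a DDPM-style forward process with cumulative coefficient `alpha_bar`, made-up viewpoint ranges, and hypothetical names throughout. Whatever the U-Net is additionally conditioned on (the summary is cut off) is folded into the `unet_fn` stub, and SJC's step of turning the denoiser output into a score for updating the 3D representation is not shown.

```python
import math
import random


def sample_viewpoint(rng):
    # Hypothetical sampling ranges in degrees; not taken from the paper.
    return {"azimuth": rng.uniform(0.0, 360.0), "elevation": rng.uniform(-30.0, 60.0)}


def perturb(image, alpha_bar, rng):
    """Forward diffusion on a flattened image: x_t = sqrt(a)*x + sqrt(1-a)*eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in image]
    x_t = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
           for x, e in zip(image, eps)]
    return x_t, eps


def estimate_clean(x_t, eps_hat, alpha_bar):
    """Undo the forward step with predicted noise:
    x0_hat = (x_t - sqrt(1-a)*eps_hat) / sqrt(a)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(x_t, eps_hat)]


def sample_render_perturb_denoise(render_fn, unet_fn, alpha_bar, rng):
    view = sample_viewpoint(rng)             # 1. random viewpoint
    image = render_fn(view)                  # 2. volumetric rendering (stub)
    x_t, _ = perturb(image, alpha_bar, rng)  # 3. add Gaussian noise
    eps_hat = unet_fn(x_t, view)             # 4. eps_theta prediction (stub)
    return image, estimate_clean(x_t, eps_hat, alpha_bar)


rng = random.Random(0)
image, denoised = sample_render_perturb_denoise(
    render_fn=lambda view: [0.5] * 16,           # toy renderer: flat grey image
    unet_fn=lambda x_t, view: [0.0] * len(x_t),  # toy "U-Net" predicting zero noise
    alpha_bar=0.8, rng=rng)
print(len(image), len(denoised))  # → 16 16
```

With a perfect noise prediction the clean render is recovered exactly; the gap between the denoised estimate and the render is the signal that pushes the 3D representation toward views the diffusion model finds plausible.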
● Summary: Monocular depth estimation with a diffusion model trained on noisy and incomplete depth maps ● Approach highlight Fill missing depth: for the diffusion process, fill missing indoor depth (windows, mirrors) by nearest-neighbor interpolation and fill missing outdoor depth (sky) with the maximum depth; Step-Unrolled Denoising Diffusion ● Main Results ● Discussion To fill the outdoor missing depth m..
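The missing-depth filling rule can be sketched directly. Assumptions: missing pixels are marked with `None`, "nearest" means the closest valid pixel by squared Euclidean distance, and the outdoor fill value is either a caller-supplied `max_depth` or, by default, the largest valid depth in the map (the summary says only "the maximum depth"). `fill_missing_depth` is a hypothetical name.

```python
def fill_missing_depth(depth, outdoor, max_depth=None):
    """depth: 2D list with None marking missing pixels.

    outdoor=True: every hole (e.g. sky) gets max_depth, or the largest valid
    depth in the map when max_depth is None.
    outdoor=False: every hole (e.g. window, mirror) copies the nearest valid
    pixel (squared Euclidean distance; ties go to the first in row-major order).
    """
    valid = [(i, j, d) for i, row in enumerate(depth)
             for j, d in enumerate(row) if d is not None]
    if not valid:
        raise ValueError("depth map has no valid pixels")
    fill_value = max_depth if max_depth is not None else max(d for _, _, d in valid)
    filled = []
    for i, row in enumerate(depth):
        new_row = []
        for j, d in enumerate(row):
            if d is not None:
                new_row.append(d)
            elif outdoor:
                new_row.append(fill_value)
            else:
                _, _, nearest = min(valid, key=lambda v: (v[0] - i) ** 2 + (v[1] - j) ** 2)
                new_row.append(nearest)
        filled.append(new_row)
    return filled


depth = [[1.0, 1.5, None, None, 4.0]]
print(fill_missing_depth(depth, outdoor=False))  # → [[1.0, 1.5, 1.5, 4.0, 4.0]]
print(fill_missing_depth(depth, outdoor=True))   # → [[1.0, 1.5, 4.0, 4.0, 4.0]]
```

The nearest-neighbor search here is brute force, which is fine for a toy map but quadratic in the number of pixels; a real pipeline would use a distance transform or a spatial index.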
● Summary: Uses a diffusion model to generate multi-view images for NeRF ● Approach highlight Training phase: train a diffusion model to generate images corresponding to multiple views of an object. Fine-tuning phase: the trained diffusion model creates multiple views of the image, which are then used to train the NeRF ● Main Results ● Discussion In my opinion, ..