
[One-page summary] Zero-1-to-3: Zero-shot One Image to 3D Object by Liu et al.



Elune001 2024. 1. 16. 00:24

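● Summary: A viewpoint-conditioned diffusion model that synthesizes novel views from a single image and supervises a NeRF for 3D reconstruction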

 

● Approach highlight

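  • Viewpoint-conditioned image-to-image translation using a conditional latent diffusion model: $\hat{x}_{R,T}=f(x,R,T)$, where $x$ is the input image and $R$, $T$ are the relative camera rotation and translation of the target view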

 

  • Score Jacobian Chaining (SJC) for 3D reconstruction:
    1.  Randomly sample viewpoints
    2.  Perform volumetric rendering from each viewpoint
    3.  Perturb the rendered images with Gaussian noise $\epsilon$
    4.  Denoise them with the U-Net $\epsilon_{\theta}$, conditioned on the input image, the posed CLIP embedding, and the timestep; the resulting score estimate is backpropagated to the 3D representation
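
The four steps above can be sketched as a toy optimization loop. This is only an illustration under strong assumptions, not the authors' implementation: the "volumetric renderer" is a simple average of a voxel grid along one of three axes, and `oracle_eps` is a hypothetical stand-in for the conditioned U-Net $\epsilon_{\theta}$ that already knows the correct image for each viewpoint. All function names here are made up for the example.

```python
import numpy as np

def sample_viewpoint(rng):
    # Step 1: randomly sample a viewpoint (toy: one of three axis-aligned views).
    return int(rng.integers(0, 3))

def render(volume, view):
    # Step 2: toy stand-in for volumetric rendering: average along the view axis.
    return volume.mean(axis=view)

def render_vjp(grad_image, shape, view):
    # Chain rule through the toy renderer: spread each pixel gradient along its ray.
    return np.broadcast_to(np.expand_dims(grad_image, view) / shape[view], shape)

def oracle_eps(noisy, target, sigma):
    # Hypothetical stand-in for eps_theta(noisy | input image, pose, timestep).
    # A real model would predict the noise; this oracle reads it off a known target.
    return (noisy - target) / sigma

def sjc_step(volume, targets, rng, sigma=0.5, lr=1.0):
    view = sample_viewpoint(rng)
    image = render(volume, view)
    # Step 3: perturb the rendered image with Gaussian noise.
    noisy = image + sigma * rng.standard_normal(image.shape)
    # Step 4: denoise, then turn the result into a score estimate on the image.
    denoised = noisy - sigma * oracle_eps(noisy, targets[view], sigma)
    score = (denoised - image) / sigma**2
    # Push the image-space score back onto the 3D parameters.
    return volume + lr * render_vjp(score, volume.shape, view)

def multiview_error(volume, targets):
    # Sum of per-view mean squared errors between renders and target images.
    return sum(float(np.mean((render(volume, v) - targets[v]) ** 2)) for v in range(3))

def run_toy_sjc(steps=300, seed=0):
    rng = np.random.default_rng(seed)
    truth = rng.random((8, 8, 8))                   # hidden "true" object
    targets = [render(truth, v) for v in range(3)]  # what the oracle "imagines"
    volume = np.zeros_like(truth)                   # 3D representation to optimize
    before = multiview_error(volume, targets)
    for _ in range(steps):
        volume = sjc_step(volume, targets, rng)
    return before, multiview_error(volume, targets)
```

Calling `run_toy_sjc()` returns the multi-view error before and after optimization. In the real method the denoiser has no access to ground-truth views, so the quality of the 3D result depends on how consistent its predictions are across viewpoints.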

● Main Results

● Discussion

  • In Fig. 6, the model doesn't seem to work well with multiple objects. I think the reason is that the view-synthesis diffusion model was trained on single objects (a domain-shift problem for the diffusion model).