Description: Deformable Neural Radiance Fields creates free-viewpoint portraits (nerfies) from casually captured videos.
Large image diffusion models enable high-quality novel view synthesis with excellent zero-shot capability. However, models based on image-to-image translation do not guarantee view consistency, which limits performance on downstream tasks such as 3D reconstruction and image-to-3D generation. To improve consistency, we propose Consistent123, which synthesizes novel views simultaneously by incorporating additional cross-view attention layers and a shared self-attention mechanism. The proposed attentio
(a) At the training stage, multiple noisy views, concatenated (denoted as ⊕) with the input view, are fed into the denoising U-Net simultaneously, conditioned on the CLIP embedding of the input view and the corresponding poses. At sampling time, the views start as samples from a normal distribution and are denoised iteratively by the U-Net. (b) In the shared self-attention layer, all views query the same key and value from the input view, which provides detailed spatial layout information for novel view synthesis. The input view a
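
The shared self-attention in (b) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-head scaled dot-product attention, omits the learned query/key/value projections, and uses plain Python lists in place of U-Net feature maps. The function name `shared_self_attention` and its parameters are hypothetical, chosen only for this illustration. What it shows is that every view supplies its own queries, while the keys and values always come from the input view.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # a is n x k, b is k x m; returns an n x m list of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def shared_self_attention(view_queries, input_keys, input_values):
    """Hypothetical sketch: each view attends to the input view's keys/values.

    view_queries: one (tokens x dim) query matrix per novel view.
    input_keys, input_values: (tokens x dim) matrices from the input view,
    shared by all views (assumption: no learned projections are applied).
    """
    scale = 1.0 / math.sqrt(len(input_keys[0]))
    keys_t = transpose(input_keys)
    outputs = []
    for queries in view_queries:
        scores = matmul(queries, keys_t)
        weights = [softmax([s * scale for s in row]) for row in scores]
        # Each output token is a weighted average of the input view's values.
        outputs.append(matmul(weights, input_values))
    return outputs


# Toy data: two input-view tokens, and two novel views with 1 and 2 tokens.
input_keys = [[1.0, 0.0], [0.0, 1.0]]
input_values = [[1.0, 2.0], [3.0, 4.0]]
view_queries = [[[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]]
outputs = shared_self_attention(view_queries, input_keys, input_values)
```

Because the attention weights in each row sum to one, every output token is a convex combination of the input view's value vectors. In this toy setup, that is how the input view's spatial layout reaches every synthesized view.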