Description: DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding.
Text-to-3D synthesis has recently emerged as a new approach to sampling 3D models by adopting pretrained text-to-image models as guiding visual priors. An intriguing but underexplored problem with existing text-to-3D methods is that the 3D models obtained from the sampling-by-optimization procedure tend to suffer from mode collapse, and hence exhibit poor diversity. In this paper, we provide an analysis and identify potential causes of such limited diversity, which motivates us to devise a new method that translates the diversity of augmented text prompts into the diversity of the resulting 3D models.
We translate the diversity of augmented text prompts to the resulting 3D models via a two-stage method. Stage 1: HiPer token inversion (left): for each reference image, we learn a HiPer token $h_i$ such that the prompt $[y; h_i]$ reconstructs the reference image. Stage 2: Textual score distillation (right): we run multi-particle variational inference to optimize the 3D models from the text prompt $y$. At each iteration of the optimization, we randomly sample a particle $\theta_i$ with its rendered image, and update $\theta_i$ using guidance from the diffusion model conditioned on its augmented prompt $[y; h_i]$.
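To make Stage 2 concrete, below is a minimal PyTorch sketch of a single textual score distillation update under stated assumptions: `unet` (a frozen pretrained text-to-image diffusion model predicting noise), `render` (a differentiable renderer for a particle $\theta_i$), and `sample_camera` are placeholder callables, and modeling $[y; h_i]$ as an embedding concatenation is an illustrative simplification rather than the authors' actual implementation.

```python
import torch

def tsd_step(particles, hiper_tokens, optimizers, y_emb,
             unet, render, sample_camera, alphas_cumprod,
             num_timesteps=1000):
    """One textual score distillation update (illustrative sketch)."""
    # Randomly pick one particle and its paired HiPer token.
    i = torch.randint(len(particles), (1,)).item()
    theta, h, opt = particles[i], hiper_tokens[i], optimizers[i]

    # Differentiably render the particle from a random viewpoint.
    img = render(theta, sample_camera())  # (B, C, H, W)

    # Noise the rendering at a random diffusion timestep t.
    t = torch.randint(0, num_timesteps, (1,), device=img.device)
    noise = torch.randn_like(img)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * img + (1.0 - a_t).sqrt() * noise

    # Condition the frozen diffusion model on the augmented prompt
    # [y; h_i], modeled here as concatenating the HiPer token to the
    # shared prompt embedding (an assumption for this sketch).
    text_emb = torch.cat([y_emb, h], dim=1)
    with torch.no_grad():
        eps_pred = unet(noisy, t, text_emb)

    # SDS-style update: the residual (eps_pred - noise) acts as the
    # gradient on the rendered image; the surrogate loss backpropagates
    # it through the renderer into theta's parameters.
    grad = eps_pred - noise
    loss = (grad * img).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because each particle is paired with its own HiPer token, the particles are pulled toward different modes of the pretrained model's distribution, which is how the diversity of the augmented prompts carries over to the 3D results.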