Description: Towards Flexible, Scalable, and Adaptive Multi-Modal Conditioned Face Synthesis
Our method's versatile synthesis capabilities: high-fidelity facial images are generated from a flexible combination of modalities, including mask, text, sketch, lighting, expression, pose, and low-resolution images. Remarkably, these diverse face synthesis tasks are achieved within a single sampling process of a unified diffusion U-Net, demonstrating the method's efficiency and its seamless integration of multi-modal information.
Abstract
Recent progress in multi-modal conditioned face synthesis h
To address these challenges, we introduce a novel uni-modal training approach with modal surrogates, coupled with entropy-aware modal-adaptive modulation, to support a flexible, scalable, and adaptive multi-modal conditioned face synthesis network. Our uni-modal training with modal surrogates leverages only uni-modal data: the modal surrogates decorate each condition with modal-specific characteristics and serve as linkers for inter-modal collaboration, so the network fully learns each modality control in face synthesis p
Uni-modal training with modal surrogates and the entropy-aware modal-adaptive modulation mechanism. During training, we randomly sample uni-modal data, whose condition is fused with its own modal surrogate to learn modal-specific intrinsic characteristics and with the other modal surrogates to learn inter-modal collaboration. The fused features are sent to the diffusion U-Net to guide the denoising of the corrupted input image. The output noise is further modulated according to the condition features and U-Net features to adapt
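
The condition-fusion step in the caption above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the text only says the condition is "fused" with the surrogates, so the fusion operator used here (adding the sampled modality's own surrogate to its condition tokens and appending the other surrogates as extra tokens) is an assumption, as are the names `fuse_condition`, `sample_unimodal_batch`, `DIM`, and the three-modality subset. Surrogates would be learnable parameters in a real model; here they are fixed random vectors.

```python
import numpy as np

# Hypothetical subset of the modalities named in the text.
MODALITIES = ["mask", "text", "sketch"]
DIM = 8  # hypothetical feature width

rng = np.random.default_rng(0)

# One surrogate vector per modality (learnable in practice; random here).
surrogates = {m: rng.normal(size=DIM) for m in MODALITIES}


def fuse_condition(modality, cond_tokens, surrogates):
    """Fuse one modality's condition tokens with the modal surrogates.

    Assumed fusion rule (not confirmed by the source):
      * the modality's own surrogate is added to every condition token;
      * the surrogates of all other modalities are appended as extra tokens,
        standing in for the modalities that are absent from this sample.
    cond_tokens has shape (n_tokens, DIM).
    """
    own = cond_tokens + surrogates[modality]  # broadcast over tokens
    others = np.stack([surrogates[m] for m in surrogates if m != modality])
    return np.concatenate([own, others], axis=0)


def sample_unimodal_batch(rng, n_tokens=4):
    """Pick one modality at random and return placeholder condition tokens."""
    modality = MODALITIES[rng.integers(len(MODALITIES))]
    cond_tokens = rng.normal(size=(n_tokens, DIM))
    return modality, cond_tokens


modality, cond = sample_unimodal_batch(rng)
fused = fuse_condition(modality, cond, surrogates)
print(modality, fused.shape)
```

In a full training loop the fused tokens would be passed to the diffusion U-Net as conditioning; that network and the noise modulation step are omitted because the text does not specify them.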