gligen.github.io - GLIGEN: Open-Set Grounded Text-to-Image Generation.

Description: GLIGEN: Open-Set Grounded Text-to-Image Generation.

Keywords: diffusion, grounding, image generation

Example domain paragraphs

Figure 1. GLIGEN enables versatile grounding capabilities for a frozen text-to-image generation model.

Large-scale text-to-image diffusion models have made amazing advances. However, the status quo is to use text input alone, which can impede controllability. In this work, we propose GLIGEN, Grounded-Language-to-Image Generation, a novel approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models by enabling them to also be conditioned on grounding inputs. To preserve the vast concept knowledge of the pre-trained model, we freeze all of its weights an

Figure 2. Gated Self-Attention is used to fuse new grounding tokens.
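
The fusion step named in Figure 2 can be illustrated with a small sketch. This is not the authors' code: it assumes a single attention head with identity query/key/value projections, and it assumes the gate is the tanh of a learnable scalar (called `gamma` here, a hypothetical name) that starts at zero, so that the frozen model's behavior is unchanged before the new layer is trained. All function and variable names are illustrative.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    # Single-head self-attention with identity Q/K/V projections
    # (a simplification; a real layer would use learned projections).
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(dim)])
    return out

def gated_self_attention(visual, grounding, gamma):
    # Attend over the concatenation [visual; grounding], keep only the
    # outputs at visual-token positions, and add them back through a gate.
    attended = self_attention(visual + grounding)
    selected = attended[:len(visual)]
    gate = math.tanh(gamma)  # assumed gate form; zero when gamma == 0
    return [[v_i + gate * s_i for v_i, s_i in zip(v, s)]
            for v, s in zip(visual, selected)]

visual = [[1.0, 0.0], [0.0, 1.0]]   # toy visual tokens
grounding = [[0.5, 0.5]]            # toy grounding token (e.g. from a box)

unchanged = gated_self_attention(visual, grounding, gamma=0.0)
fused = gated_self_attention(visual, grounding, gamma=1.0)
```

With `gamma=0.0` the gate is closed and the visual tokens pass through untouched, which is one way to keep a frozen model's output intact at the start of training. With a nonzero `gamma`, information from the grounding token is mixed into each visual token.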
