gligen.github.io - GLIGEN: Open-Set Grounded Text-to-Image Generation.

Description: GLIGEN: Open-Set Grounded Text-to-Image Generation.

diffusion (260) grounding (71) image generation (34)

Example domain paragraphs

Figure 1. GLIGEN enables versatile grounding capabilities for a frozen text-to-image generation model.

Large-scale text-to-image diffusion models have made amazing advances. However, the status quo is to use text input alone, which can impede controllability. In this work, we propose GLIGEN, Grounded-Language-to-Image Generation, a novel approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models by enabling them to also be conditioned on grounding inputs. To preserve the vast concept knowledge of the pre-trained model, we freeze all of its weights an

Figure 2. Gated Self-Attention is used to fuse new grounding tokens.
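
The idea in Figure 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single-head attention, the tensor shapes, and the tanh gate with a scalar `gamma` starting at zero are all assumptions made for illustration. The sketch shows visual tokens attending jointly over themselves and the grounding tokens, with the result added back through a gate, so that a closed gate leaves the frozen model's features unchanged.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Plain single-head scaled dot-product self-attention (illustrative only).
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def gated_self_attention(visual, grounding, w_q, w_k, w_v, gamma):
    # Hypothetical helper: attend over [visual; grounding] together,
    # keep only the rows belonging to visual tokens, and add them back
    # through a tanh gate. gamma = 0 closes the gate entirely.
    n_visual = visual.shape[0]
    joint = np.concatenate([visual, grounding], axis=0)
    attended = self_attention(joint, w_q, w_k, w_v)
    return visual + np.tanh(gamma) * attended[:n_visual]


rng = np.random.default_rng(0)
dim = 8
visual = rng.normal(size=(16, dim))     # assumed: 16 visual tokens
grounding = rng.normal(size=(3, dim))   # assumed: 3 grounding tokens
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

out_closed = gated_self_attention(visual, grounding, w_q, w_k, w_v, gamma=0.0)
out_open = gated_self_attention(visual, grounding, w_q, w_k, w_v, gamma=1.0)
```

With the gate closed (`gamma=0.0`) the output equals the input visual tokens, which is one way a new layer can be inserted without disturbing frozen pre-trained weights. Opening the gate lets grounding information flow into the visual features.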

Links to gligen.github.io (15)

chunyuan.li Chunyuan Li
hliu.cc Haotian Liu
yuheng-li.github.io Yuheng Li
photoswap.github.io PHOTOSWAP: Personalized Subject Swapping in Images
vitron-llm.github.io Vitron
layoutgpt.github.io LayoutGPT
swap-anything.github.io SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
re-ground.github.io ReGround: Improving Textual and Spatial Grounding at No Cost
dpt-t2i.github.io DPT-T2I
mmworld-bench.github.io MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
countercurate.github.io CounterCurate
layoutllm-t2i.github.io LayoutLLM-T2I
instruction-tuning-with-gpt-4.github.io Instruction Tuning with GPT-4
screen-point-and-read.github.io Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
codeiforme.com • Curated knowledge about art and AI •