Description: Encoder for Fast Personalization of Text-to-Image Models
text-to-image, textual inversion, personalized generation
Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user-provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle with lengthy training times, high storage requirements, or loss of identity. To overcome these limitations, we propose an encoder-based domain-tuning approach. Our key insight is that by underfitting on a large set of concepts from a given domain, we can improve generalization and create a model that is more amenable to quickly adding novel concepts from the same domain.
We propose a two-component method for fast personalization of text-to-image diffusion models. First, a domain-specific encoder that learns to quickly map images into word embeddings that represent them. Second, a set of weight-offsets that draw the diffusion model towards the same domain, allowing for easier personalization to novel concepts from this domain. We pre-train these components on a large dataset from the given domain. At inference time, we use them to guide a short optimization process that tunes the model to a specific concept from a single example image, as sketched below.
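To make the two components concrete, here is a minimal PyTorch sketch, assuming a simple vision backbone and a per-layer dictionary of weight offsets. The names `ConceptEncoder` and `apply_weight_offsets` are illustrative stand-ins, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ConceptEncoder(nn.Module):
    """Maps an image of a concept to a word embedding for the text encoder.
    A toy backbone stands in for the pretrained feature extractor."""
    def __init__(self, embed_dim=768):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, embed_dim)

    def forward(self, image):                      # image: (B, 3, H, W)
        return self.head(self.backbone(image))     # (B, embed_dim) word embedding

def apply_weight_offsets(model, offsets, scale=1.0):
    """Add learned per-layer offsets to the diffusion model's weights,
    nudging it toward the pre-trained domain before personalization."""
    for name, param in model.named_parameters():
        if name in offsets:
            param.data.add_(scale * offsets[name])
```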
The result is a tuning approach that requires as few as 5 training steps to personalize the diffusion model, reducing optimization times from dozens of minutes to a few seconds. This puts personalization times in line with the time it takes to generate a batch of images, and eliminates the need to save a separate model for every new concept.
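Building on the sketch above, the following illustrates how the pre-trained components could guide a ~5-step personalization loop, assuming a standard denoising objective. `ToyDenoiser` and `personalize` are hypothetical stand-ins for the diffusion UNet and the tuning procedure, not the paper's API.

```python
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Stand-in for the diffusion UNet, conditioned on the concept embedding."""
    def __init__(self, embed_dim=768):
        super().__init__()
        self.cond = nn.Linear(embed_dim, 3)
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, noisy_image, embedding):
        bias = self.cond(embedding)[:, :, None, None]  # inject the condition
        return self.net(noisy_image + bias)

def personalize(encoder, denoiser, offsets, image, steps=5, lr=1e-4):
    """Few-step tuning on a single concept image, starting from the
    pre-trained domain prior carried by the weight offsets."""
    apply_weight_offsets(denoiser, offsets)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(denoiser.parameters()), lr=lr)
    for _ in range(steps):
        noise = torch.randn_like(image)
        embedding = encoder(image)                  # image -> word embedding
        pred = denoiser(image + noise, embedding)   # simplified forward process
        loss = F.mse_loss(pred, noise)              # standard denoising loss
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder(image).detach()                  # embedding used in prompts
```

Because the encoder and offsets are pre-trained on the whole domain, the loop only needs a handful of steps to lock onto the specific concept, which is what keeps personalization in the seconds range.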