Learning Adapters towards Controllable Text-to-Image Diffusion Models

Run time and cost

Predictions run on Nvidia A100 GPU hardware and typically complete within 13 seconds. Prediction time varies significantly with the inputs.

Official implementation of T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models.

We propose T2I-Adapter, a simple and small network (~70M parameters, ~300 MB of storage) that provides extra guidance to pre-trained text-to-image models while keeping the original large text-to-image model frozen.
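As a rough sanity check on those two numbers, the short calculation below estimates checkpoint size from parameter count. It assumes weights are stored as 32-bit floats (4 bytes each), which the text above does not state; `storage_megabytes` is a hypothetical helper written for this illustration, not part of the T2I-Adapter code.

```python
def storage_megabytes(num_params: int, bytes_per_param: int = 4) -> float:
    """Estimate raw weight storage in megabytes (1 MB = 10**6 bytes).

    bytes_per_param=4 assumes float32 weights; this is an assumption,
    not something stated by the T2I-Adapter authors.
    """
    return num_params * bytes_per_param / 1e6


# ~70M parameters, as quoted in the description above.
adapter_params = 70_000_000

fp32_mb = storage_megabytes(adapter_params)                      # float32
fp16_mb = storage_megabytes(adapter_params, bytes_per_param=2)   # float16

print(f"float32: {fp32_mb:.0f} MB, float16: {fp16_mb:.0f} MB")
```

Under the float32 assumption the estimate lands close to the quoted storage figure, which suggests the two numbers are consistent with each other; any remaining difference could come from rounding or checkpoint metadata.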

T2I-Adapter aligns the internal knowledge of T2I models with external control signals.
A separate adapter can be trained for each type of condition, which gives rich control and editing effects.
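The idea of training a small adapter while the large model stays frozen can be illustrated with a deliberately tiny sketch. This is a toy under stated assumptions, not the official implementation: `FrozenBase`, `Adapter`, `guided_features`, and `train_step` are hypothetical names, the "model" is three scalar weights, and the condition names `sketch` and `depth` are only examples. It also assumes the adapter's guidance is simply added to the base model's features.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenBase:
    """Toy stand-in for a pre-trained model; frozen=True forbids updates."""
    weights: tuple = (0.5, -1.0, 2.0)

    def features(self, x: float) -> list:
        return [w * x for w in self.weights]


@dataclass
class Adapter:
    """Small trainable module mapping a control signal to feature offsets."""
    scale: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def guidance(self, control: float) -> list:
        return [s * control for s in self.scale]


def guided_features(base, adapter, x, control):
    # Assumption for this sketch: guidance is added to the frozen features.
    return [f + g for f, g in zip(base.features(x), adapter.guidance(control))]


def train_step(base, adapter, x, control, target, lr=0.1):
    """One gradient step on squared error; only adapter.scale changes."""
    out = guided_features(base, adapter, x, control)
    for i, (o, t) in enumerate(zip(out, target)):
        grad = 2.0 * (o - t) * control  # d/d(scale_i) of (o - t)**2
        adapter.scale[i] -= lr * grad


base = FrozenBase()
# One adapter per condition type; the condition names are illustrative.
adapters = {"sketch": Adapter(), "depth": Adapter()}
targets = {"sketch": [1.0, 0.0, 2.0], "depth": [0.0, -2.0, 3.0]}

for name, adapter in adapters.items():
    for _ in range(200):
        train_step(base, adapter, x=1.0, control=1.0, target=targets[name])
```

Because each adapter starts at zero, the guided output initially equals the base model's output, and training changes only the adapter. The same frozen base is shared by every adapter, which is why adding a new condition only requires training another small module.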