adirik / t2i-adapter-sdxl-canny

Modify images using canny edges

  • Public
  • 20.2K runs
  • GitHub
  • Paper
  • License



Run time and cost

This model runs on Nvidia A40 GPU hardware. Predictions typically complete within 16 seconds. The predict time for this model varies significantly based on the inputs.


Model Description

T2I-Adapter for Stable Diffusion-XL by Tencent ARC Lab and Peking University VILLA. The Cog wrapper is adapted from the official repository. T2I-Adapter performs image editing using text prompts combined with depth map, human body pose, line art, canny edge, and sketch conditions.

Abstract: We propose T2I-Adapter, a simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

T2I-Adapter aligns internal knowledge in T2I models with external control signals. We can train various adapters according to different conditions, and achieve rich control and editing effects.

See the paper, the official repository, and the Hugging Face model page and demo for more information.
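As a concrete illustration of this adapter-plus-frozen-backbone design, here is a minimal sketch using the Hugging Face diffusers library. The checkpoint names, canny thresholds, and prompt are illustrative assumptions, not taken from this page, and this is not necessarily how the Cog wrapper itself is implemented.

```python
# Sketch: canny-conditioned SDXL generation with a T2I-Adapter (diffusers).
# Checkpoint names and thresholds below are assumptions; adjust for your setup.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

# Load the small adapter network that injects canny-edge guidance
# into the frozen SDXL backbone.
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-canny-sdxl-1.0",  # assumed checkpoint name
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

# Detect canny edges on the input image; the edge map is the control signal.
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)  # thresholds are illustrative
edges = Image.fromarray(edges).convert("RGB")

result = pipe(
    prompt="a photo of a cozy cabin in the woods",
    image=edges,
    num_inference_steps=30,
).images[0]
result.save("output.png")
```

Note that only the ~70M-parameter adapter carries the control signal; the SDXL weights themselves are left untouched, which is what keeps the adapter cheap to train and swap.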


To start, upload an image you would like to modify and prompt the model as you would for Stable Diffusion. The model uses your input image as a template, internally performing canny edge detection to guide image generation. A sketch of calling the model programmatically follows below.
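For programmatic use, here is a hedged sketch with the Replicate Python client. The input field names ("image", "prompt") are assumptions based on typical schemas for image-conditioned models; check this model's API tab for the exact parameters.

```python
# Sketch: running this model via the Replicate Python client.
import replicate

output = replicate.run(
    "adirik/t2i-adapter-sdxl-canny",  # model name from this page
    input={
        "image": open("input.png", "rb"),  # template image for edge detection
        "prompt": "a watercolor painting of a city skyline",
    },
)
print(output)  # URL(s) of the generated image(s)
```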

Other T2I-Adapter Models

There are many different ways to use a T2I-Adapter to modify the output of Stable Diffusion XL and Stable Diffusion. Here are a few different options, all of which use an input image in addition to a prompt to generate an output. The methods process the input in different ways; try them out to see which works best for a given application.

T2I-Adapter for generating images from sketches

T2I-Adapter for generating humans based on input image

T2I-Adapter for preserving general qualities about an input image

T2I-Adapter SD (the Stable Diffusion, non-XL, variant)


Citation

```bibtex
@article{mou2023t2i,
  title={T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models},
  author={Mou, Chong and Wang, Xintao and Xie, Liangbin and Wu, Yanze and Zhang, Jian and Qi, Zhongang and Shan, Ying and Qie, Xiaohu},
  journal={arXiv preprint arXiv:2302.08453},
  year={2023}
}
```