adirik/t2i-adapter-sdxl-depth-midas | Run with an API on Replicate

Examples

Run time and cost

This model costs approximately $0.043 to run on Replicate, or 23 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 45 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Model Description

T2I-Adapter based on Stable Diffusion-XL by Tencent ARC Lab and Peking University VILLA. Cog-wrapper is adapted from the [official repository.]( (https://github.com/TencentARC/T2I-Adapter). T2I-Adapter performs image editing using text prompts combined with depth map, human body pose, line art, canny edge and sketch conditions.

Abstract: We propose T2I-Adapter, a simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

T2I-Adapter aligns internal knowledge in T2I models with external control signals. We can train various adapters according to different conditions, and achieve rich control and editing effects.

See the paper, official repository and Hugging Face model page and demo for more information.

Usage

To start, upload an image you would like to modify and prompt the model to generate an image as you would for Stable Diffusion. The model generating the image will use your input image as a template and internally create a depth map to guide image generation.

Other T2I-Adapter Models

There are many different ways to use a T2I-Adapter to modify the output of Stable Diffusion XL and Stable Diffusion. Here are a few different options, all of which use an input image in addition to a prompt to generate an output. The methods process the input in different ways; try them out to see which works best for a given application.

T2I-Adapter for generating images from sketches
https://replicate.com/alaradirik/t2i-adapter-sdxl-sketch

T2I-Adapter for generating humans based on input image
https://replicate.com/alaradirik/t2i-adapter-sdxl-openpose

T2I-Adapter for preserving general qualities about an input image
https://replicate.com/alaradirik/t2i-adapter-sdxl-canny https://replicate.com/alaradirik/t2i-adapter-sdxl-lineart

T2I-Adapter SD
https://replicate.com/cjwbw/t2i-adapter

Citation

@article{mou2023t2i,
  title={T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models},
  author={Mou, Chong and Wang, Xintao and Xie, Liangbin and Wu, Yanze and Zhang, Jian and Qi, Zhongang and Shan, Ying and Qie, Xiaohu},
  journal={arXiv preprint arXiv:2302.08453},
  year={2023}
}