adirik / t2i-adapter-sdxl-openpose

Modify images using human pose

Input

file (required)
    Input image

string
    Input prompt
    Default: "A couple, 4k photo, highly detailed"

string
    Things you do not want to see in the output (negative prompt)
    Default: "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"

integer (minimum: 0, maximum: 100)
    Number of diffusion steps
    Default: 30

number (minimum: 0, maximum: 5)
    Conditioning scale
    Default: 1

number (minimum: 0, maximum: 1)
    Factor to scale image by
    Default: 1

number (minimum: 0, maximum: 10)
    Guidance scale to match the prompt
    Default: 7.5

integer (minimum: 1, maximum: 4)
    Number of outputs to generate
    Default: 1

string
    Which scheduler to use
    Default: "K_EULER_ANCESTRAL"

integer
    Random seed for reproducibility; leave blank to randomize the output
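
With the schema above, a run from Python looks roughly like this. It is a minimal sketch, assuming the standard Replicate Python client (pip install replicate, with REPLICATE_API_TOKEN set) and input field names inferred from the descriptions above (prompt, negative_prompt, num_inference_steps, guidance_scale, num_samples, scheduler, seed); check the model's API tab for the authoritative schema.

    # Minimal sketch; input field names are inferred, not confirmed by this page.
    import replicate

    output = replicate.run(
        "adirik/t2i-adapter-sdxl-openpose",
        input={
            "image": open("pose_reference.jpg", "rb"),  # hypothetical local file
            "prompt": "A couple, 4k photo, highly detailed",
            "negative_prompt": "anime, cartoon, graphic, text, painting",
            "num_inference_steps": 30,         # integer, 0-100
            "guidance_scale": 7.5,             # number, 0-10
            "num_samples": 1,                  # integer, 1-4 (name assumed)
            "scheduler": "K_EULER_ANCESTRAL",
            "seed": 42,                        # omit to randomize
        },
    )
    print(output)  # URL(s) of the generated image(s)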

Output

[Example output image omitted]

This output was created using a different version of the model, adirik/t2i-adapter-sdxl-openpose:6e87d127.

Run time and cost

This model costs approximately $0.039 to run on Replicate, or 25 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 40 seconds. The predict time for this model varies significantly based on the inputs.
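
As a rough cross-check, using only the numbers quoted above: 1 / $0.039 ≈ 25.6 runs per dollar, consistent with the quoted 25 runs per $1, and $0.039 spread over a typical ~40-second prediction works out to roughly $0.001 per GPU-second.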

Readme

Model Description

T2I-Adapter for Stable Diffusion XL, by Tencent ARC Lab and Peking University VILLA. The Cog wrapper is adapted from the official repository (https://github.com/TencentARC/T2I-Adapter). T2I-Adapter performs image editing using text prompts combined with depth map, human body pose, line art, canny edge, and sketch conditions.

Abstract: We propose T2I-Adapter, a simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

T2I-Adapter aligns internal knowledge in T2I models with external control signals. We can train various adapters according to different conditions, and achieve rich control and editing effects.

See the paper, the official repository, and the Hugging Face model page and demo for more information.

Usage

To start, upload an image you would like to modify and prompt the model as you would with Stable Diffusion. The model uses your input image as a template: it internally detects the human body pose and uses it to guide image generation, as in the sketch below.
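
If you want to reproduce this behavior locally rather than through Replicate, the same adapter is usable from Hugging Face diffusers. The sketch below is an illustration under stated assumptions, not this repository's exact code: it assumes the TencentARC/t2i-adapter-openpose-sdxl-1.0 weights, the controlnet_aux OpenposeDetector for the pose-extraction step this model performs internally, and a CUDA GPU.

    # Local-inference sketch with diffusers (assumptions noted above).
    import torch
    from PIL import Image
    from controlnet_aux import OpenposeDetector
    from diffusers import (
        EulerAncestralDiscreteScheduler,
        StableDiffusionXLAdapterPipeline,
        T2IAdapter,
    )

    # Extract the body pose from the input image (this model does this internally).
    openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
    pose_image = openpose(Image.open("input.jpg"))

    # Attach the OpenPose adapter to an SDXL pipeline.
    adapter = T2IAdapter.from_pretrained(
        "TencentARC/t2i-adapter-openpose-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        adapter=adapter,
        torch_dtype=torch.float16,
    ).to("cuda")
    # Matches the K_EULER_ANCESTRAL default in the schema above.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    image = pipe(
        prompt="A couple, 4k photo, highly detailed",
        negative_prompt="anime, cartoon, graphic, text, painting",
        image=pose_image,                 # the pose map guides generation
        num_inference_steps=30,
        guidance_scale=7.5,
        adapter_conditioning_scale=1.0,
    ).images[0]
    image.save("output.png")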

Other T2I-Adapter Models

There are several ways to use a T2I-Adapter to modify the output of Stable Diffusion XL and Stable Diffusion. Each of the options below uses an input image in addition to a prompt, but processes the input differently; try them out to see which works best for a given application.

T2I-Adapter for generating images from sketches
https://replicate.com/alaradirik/t2i-adapter-sdxl-sketch

T2I-Adapter for preserving general qualities of an input image
https://replicate.com/alaradirik/t2i-adapter-sdxl-canny
https://replicate.com/alaradirik/t2i-adapter-sdxl-lineart
https://replicate.com/alaradirik/t2i-adapter-sdxl-depth-midas

T2I-Adapter SD
https://replicate.com/cjwbw/t2i-adapter

Citation

@article{mou2023t2i,
  title={T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models},
  author={Mou, Chong and Wang, Xintao and Xie, Liangbin and Wu, Yanze and Zhang, Jian and Qi, Zhongang and Shan, Ying and Qie, Xiaohu},
  journal={arXiv preprint arXiv:2302.08453},
  year={2023}
}