mbukerepo / photomaker

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

  • Public
  • 4.8K runs
  • L40S

Input

file (required)

The input image, for example a photo of your face.

file

Additional input image (optional)

file

Additional input image (optional)

file

Additional input image (optional)

string

Prompt. Example: 'a photo of a man/woman img'. The phrase 'img' is the trigger word.

Default: "A photo of a person img"

string

Style template. Adds a style-specific prompt and negative prompt to the user's prompt.

Default: "Photographic (Default)"

string

Negative Prompt. The negative prompt should NOT contain the trigger word.

Default: "nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"
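The trigger-word rules above (the prompt must contain 'img', the negative prompt must not) can be checked before submitting a prediction. The following is a minimal sketch, not part of the model's own code; the function name `check_prompts` and the whole-word matching rule are assumptions made for illustration.

```python
TRIGGER_WORD = "img"  # trigger word named in the prompt description


def _words(text: str) -> list[str]:
    # Split on whitespace and strip common punctuation so "img," still counts.
    return [w.strip(".,;:!?'\"()") for w in text.split()]


def check_prompts(prompt: str, negative_prompt: str = "") -> None:
    """Raise ValueError if the trigger word is missing or misplaced.

    Hypothetical helper: matching on whole words is an assumption,
    not a documented rule of the model.
    """
    if TRIGGER_WORD not in _words(prompt):
        raise ValueError(f"prompt must contain the trigger word '{TRIGGER_WORD}'")
    if TRIGGER_WORD in _words(negative_prompt):
        raise ValueError(f"negative prompt must not contain '{TRIGGER_WORD}'")


# The documented default prompt passes the check.
check_prompts("A photo of a person img", "lowres, bad anatomy, blurry")
```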

integer
(minimum: 1, maximum: 100)

Number of sampling steps

Default: 20

number
(minimum: 15, maximum: 50)

Style strength (%)

Default: 20
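The page does not say how the style strength percentage is used. As a hedged sketch, assuming this deployment follows the convention of the upstream PhotoMaker demo, the percentage selects the denoising step at which the ID embedding starts to be merged in, capped at step 30. The function name and the cap value are assumptions carried over from that demo, not something this page confirms.

```python
def start_merge_step(style_strength_percent: float, num_steps: int, cap: int = 30) -> int:
    """Step at which ID merging would begin (assumed upstream-demo behavior)."""
    step = int(float(style_strength_percent) / 100 * num_steps)
    # Assumed cap: merging never starts later than step 30.
    return min(step, cap)


# With the documented defaults (20% strength, 20 steps).
default_step = start_merge_step(20, 20)
```

Under this assumption, a higher percentage delays merging, leaving more early steps to the style prompt alone.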

integer
(minimum: 1, maximum: 4)

Number of output images

Default: 1

number
(minimum: 1, maximum: 10)

Guidance scale. A guidance scale of 1 disables classifier-free guidance.

Default: 5
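To see why a scale of 1 means no guidance, recall the usual classifier-free guidance combination: the guided noise prediction is the unconditional prediction plus the scale times the difference between the conditional and unconditional predictions. The sketch below uses plain Python lists in place of tensors; `apply_cfg` is an illustrative name, not a function from this model's code.

```python
def apply_cfg(eps_uncond: list[float], eps_cond: list[float], guidance_scale: float) -> list[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


uncond = [0.0, 1.0]
cond = [1.0, 3.0]

# At scale 1 the result is exactly the conditional prediction.
no_guidance = apply_cfg(uncond, cond, 1.0)

# At the default scale of 5 the prediction is pushed further toward the prompt.
default_guidance = apply_cfg(uncond, cond, 5.0)
```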

integer
(minimum: 0, maximum: 2147483647)

Seed. Leave blank to use a random seed.
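The numeric limits listed above can be checked client-side before a run is submitted. This is a minimal sketch: the parameter names in `RANGES` are hypothetical, because this page lists only types and descriptions; the minimum and maximum values are the documented ones.

```python
import random

# Hypothetical parameter names; the (min, max) bounds come from the listing above.
RANGES = {
    "num_steps": (1, 100),
    "style_strength_ratio": (15, 50),
    "num_outputs": (1, 4),
    "guidance_scale": (1, 10),
    "seed": (0, 2147483647),
}


def validate_inputs(**values):
    """Return the values unchanged, or raise ValueError if any is out of range."""
    for name, value in values.items():
        low, high = RANGES[name]  # KeyError for names not listed above
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return values


def resolve_seed(seed=None):
    """Mimic 'leave blank for random': pick a seed in the documented range."""
    if seed is None:
        return random.randint(0, 2147483647)
    return seed


# The documented defaults are all within range.
defaults = validate_inputs(num_steps=20, style_strength_ratio=20, num_outputs=1, guidance_scale=5)
```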

boolean

Disable safety checker for generated images. This model's safety checker can't be disabled when running on the website.

Default: false

Output


This output was created using a different version of the model, mbukerepo/photomaker:3be42163.

Run time and cost

This model costs approximately $0.035 per run on Replicate (about 28 runs per $1), though the cost varies with your inputs. It is also open source, and you can run it on your own computer with Docker.
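The two figures are consistent: dividing $1 by the approximate per-run cost and rounding down gives 28. A small sketch of the arithmetic, using only the approximate price quoted above (actual cost varies with inputs); the function names are illustrative.

```python
import math

COST_PER_RUN_USD = 0.035  # approximate figure quoted on this page


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit in one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(n_runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Rough cost in USD for n_runs predictions at the quoted price."""
    return n_runs * cost_per_run


whole_runs = runs_per_dollar()
batch_cost = estimated_cost(100)  # roughly 3.5 USD, up to floating-point rounding
```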

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 37 seconds.

Readme

PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding.

Usage

Users can input one or a few face photos, along with a text prompt, to receive a customized photo or painting within seconds (no training required!). Additionally, this model can be adapted to any base model based on SDXL or used in conjunction with other LoRA modules.

Realistic results

[2 images: realistic result examples]

Stylization results

[2 images: stylization result examples]

More results can be found on our project page.

Model Details

The model mainly contains two parts, corresponding to two keys in the loaded state dict:

  1. id_encoder includes a fine-tuned OpenCLIP-ViT-H-14 and a few fuse layers.

  2. lora_weights applies to all attention layers in the UNet, and the rank is set to 64.
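To give a sense of what rank 64 means for the LoRA part: a LoRA update to a linear layer of shape d_out × d_in is stored as two low-rank matrices, one of shape rank × d_in and one of shape d_out × rank. The sketch below only counts parameters; the 1280-wide layer is a hypothetical example size, not a figure taken from this model.

```python
LORA_RANK = 64  # rank stated in the model details above


def lora_param_count(d_in: int, d_out: int, rank: int = LORA_RANK) -> int:
    """Parameters in the two low-rank factors: rank*d_in + d_out*rank."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the full weight matrix the LoRA update approximates."""
    return d_in * d_out


# Hypothetical attention projection of width 1280 (illustrative size only).
lora_params = lora_param_count(1280, 1280)
full_params = full_param_count(1280, 1280)
fraction = lora_params / full_params
```

For this illustrative layer, the LoRA factors hold one tenth as many parameters as the full weight matrix.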