mbukerepo / photomaker

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 22 seconds, but the prediction time varies significantly with the inputs.

Readme

PhotoMaker

Customizing Realistic Human Photos via Stacked ID Embedding.

Usage

Users can input one or a few face photos, along with a text prompt, to receive a customized photo or painting within seconds (no training required!). Additionally, this model can be adapted to any base model based on SDXL or used in conjunction with other LoRA modules.

Realistic results

[Two example images: realistic results]

Stylization results

[Two example images: stylization results]

More results can be found on our project page.

Model Details

The model mainly contains two parts, corresponding to two keys in the loaded state dict:

  1. id_encoder includes a fine-tuned OpenCLIP-ViT-H-14 and a few fuse layers.

  2. lora_weights applies to all attention layers in the UNet, with the rank set to 64.
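
The two-key layout described above can be checked with a few lines of Python. This is a minimal sketch, not the project's own loading code: the helper names split_checkpoint and lora_extra_params are hypothetical, and a plain dict stands in for a real checkpoint (which would typically be obtained with something like torch.load(path, map_location="cpu")). The second helper only illustrates the standard LoRA parameter count for one linear layer at the stated rank of 64. The layer sizes used in the demo are made up.

```python
from typing import Any, Mapping, Tuple

# The two top-level keys named in the Model Details list.
EXPECTED_KEYS = ("id_encoder", "lora_weights")
LORA_RANK = 64  # rank stated in the Model Details list


def split_checkpoint(
    state_dict: Mapping[str, Any],
) -> Tuple[Mapping[str, Any], Mapping[str, Any]]:
    """Hypothetical helper: return (id_encoder, lora_weights) sub-dicts.

    Raises KeyError if either expected top-level key is missing.
    """
    missing = [key for key in EXPECTED_KEYS if key not in state_dict]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    return state_dict["id_encoder"], state_dict["lora_weights"]


def lora_extra_params(in_features: int, out_features: int,
                      rank: int = LORA_RANK) -> int:
    """Parameters added by one LoRA adapter on a linear layer.

    LoRA adds two low-rank matrices: A (rank x in_features) and
    B (out_features x rank), so the count is rank * (in + out).
    """
    return rank * (in_features + out_features)


if __name__ == "__main__":
    # Placeholder dict standing in for a real loaded checkpoint.
    fake_checkpoint = {
        "id_encoder": {"example.weight": 0},
        "lora_weights": {"example.lora.weight": 1},
    }
    id_encoder, lora_weights = split_checkpoint(fake_checkpoint)
    print(sorted(id_encoder), sorted(lora_weights))
    # Illustrative layer size only; not taken from the model.
    print(lora_extra_params(640, 640))
```

Failing loudly on a missing key is a deliberate choice here: a checkpoint that lacks either part would otherwise surface later as a harder-to-diagnose error when the weights are applied.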