andreasjansson / monkey-island-sd

Stable Diffusion fine-tuned on frames from Monkey Island 1 and 2


Run time and cost

This model runs on Nvidia A40 GPU hardware. Predictions typically complete within 5 seconds.

Monkey Island Stable Diffusion

Stable Diffusion 1.5 fine-tuned on frames from Monkey Island 1 and 2.

Dataset

Frames were extracted from https://www.youtube.com/watch?v=QgRIXntFhww and https://www.youtube.com/watch?v=RkpG4FoszBQ.
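
The README doesn't say exactly how frames were sampled from the videos. The sketch below is one plausible approach, assuming the two longplays have already been downloaded locally (the file name is hypothetical) and that keeping every Nth frame is good enough:

import os
import cv2  # OpenCV, an assumption; the original extraction tool isn't documented

def extract_frames(video_path, out_dir, every_n=30):
    # Save every Nth frame of a local video file as a PNG.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.png"), frame)
            saved += 1
        index += 1
    cap.release()

extract_frames("monkey-island-1-longplay.mp4", "dataset", every_n=30)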

Each frame was then captioned with https://replicate.com/salesforce/blip and the prefix "Caption: " was stripped out.
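
Roughly, the captioning step can be reproduced with the replicate Python client as sketched below. The task input name, the assumption that the caption comes back as a plain string starting with "Caption: ", and the way captions are written next to the frames are all assumptions; check the model page for the exact schema:

import os
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN

def caption_frame(path):
    # Run BLIP image captioning on a single frame (optionally pin a version hash).
    output = replicate.run(
        "salesforce/blip",
        input={"image": open(path, "rb"), "task": "image_captioning"},
    )
    # Strip the "Caption: " prefix that BLIP prepends.
    return str(output).removeprefix("Caption: ").strip()

for name in sorted(os.listdir("dataset")):
    if name.endswith(".png"):
        frame_path = os.path.join("dataset", name)
        with open(frame_path.replace(".png", ".txt"), "w") as f:
            f.write(caption_frame(frame_path))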

Fine-tuning

First, run cog run script/download-weights to download the SD 1.5 weights.

The model was fine-tuned with a customized version of the Dreambooth training script in Diffusers.

Training parameters:

python train_dreambooth.py \
    --pretrained_model_name_or_path=diffusers-cache/models--runwayml--stable-diffusion-v1-5/snapshots/51b78538d58bd5526f1cf8e7c03c36e2799e0178 \
    --instance_data_dir=dataset \
    --output_dir=dreambooth-output \
    --resolution=144 \
    --train_batch_size=16 \
    --gradient_accumulation_steps=1 \
    --learning_rate=2e-6 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --max_train_steps=1600 \
    --use_8bit_adam \
    --mixed_precision=fp16 \
    --class_data_dir=class-data-dir \
    --with_prior_preservation \
    --num_class_images=2000 \
    --train_text_encoder
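
The Diffusers Dreambooth script saves a complete Stable Diffusion pipeline to --output_dir, so the fine-tuned weights can be loaded directly for sampling. A minimal sketch (the prompt is only an illustration):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "dreambooth-output", torch_dtype=torch.float16
).to("cuda")

image = pipe("a pirate standing on a dock at night, pixel art").images[0]
image.save("sample.png")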

There’s still room for improvement: the model doesn’t always follow the prompt, especially when the prompt combines multiple concepts, and there is some degree of catastrophic forgetting.