Monkey Island Stable Diffusion

Stable Diffusion 1.5 fine-tuned on frames from Monkey Island 1 and 2.

Dataset

Frames were extracted from https://www.youtube.com/watch?v=QgRIXntFhww and https://www.youtube.com/watch?v=RkpG4FoszBQ.

Each frame was then captioned with https://replicate.com/salesforce/blip and the prefix "Caption: " was stripped out.

Fine-tuning

First, run cog run script/download-weights to download the SD 1.5 weights.

The model was fine-tuned on a customized version of the Dreambooth training script in Diffusers.

Training parameters:

python train_dreambooth.py \
    --pretrained_model_name_or_path=diffusers-cache/models--runwayml--stable-diffusion-v1-5/snapshots/51b78538d58bd5526f1cf8e7c03c36e2799e0178 \
    --instance_data_dir=dataset \
    --output_dir=dreambooth-output \
    --resolution=144 \
    --train_batch_size=16 \
    --gradient_accumulation_steps=1 \
    --learning_rate=2e-6 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --max_train_steps=1600 \
    --use_8bit_adam \
    --mixed_precision=fp16 \
    --class_data_dir=class-data-dir \
    --with_prior_preservation \
    --num_class_images=2000 \
    --train_text_encoder

There’s still room for improvement. It doesn’t always get the prompt right, especially if there are multiple concepts in the prompt. There’s some degree of catastrophic forgetting going on.

Model created over 1 year ago