pharmapsychotic / clip-interrogator

The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion to create cool art!

  • Public
  • 3.2M runs
  • T4

Input

image (file, required)

Input image

clip_model_name (string)

Choose ViT-L for Stable Diffusion 1, ViT-H for Stable Diffusion 2, or ViT-bigG for Stable Diffusion XL.

Default: "ViT-L-14/openai"

mode (string)

Prompt mode (best takes 10-20 seconds, fast takes 1-2 seconds).

Default: "best"

Output

a watercolor painting of a sea turtle, a digital painting, by Kubisi art, featured on dribbble, medibang, warm saturated palette, red and green tones, turquoise horizon, digital art h 9 6 0, detailed scenery —width 672, illustration:.4, spray art, artstatiom

This example was created by a different version, pharmapsychotic/clip-interrogator:41fdb702.

Run time and cost

This model costs approximately $0.00068 to run on Replicate, or about 1470 runs per $1, though this varies depending on your inputs. It is also open source, so you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 4 seconds.
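
If you prefer to skip Docker, the same tool is published as the clip-interrogator package on PyPI; a minimal local run might look like the sketch below (class and method names are taken from the upstream project, but check its README for the current interface).

```python
# pip install clip-interrogator
from PIL import Image
from clip_interrogator import Config, Interrogator

# Load the same CLIP model the hosted version defaults to.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

image = Image.open("sea_turtle.jpg").convert("RGB")  # placeholder image path
print(ci.interrogate(image))  # "best" mode; interrogate_fast() is the quick variant
```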

Readme

The CLIP Interrogator uses the OpenAI CLIP models to test a given image against a variety of artists, mediums, and styles to study how the different models see the content of the image. It then combines those results with a BLIP-generated caption to suggest a text prompt for creating more images similar to the one given.
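
Conceptually, the ranking step works like the sketch below: embed the image and a bank of candidate phrases with CLIP, then keep the phrases whose embeddings are most similar to the image's. This is an illustrative reduction using the open_clip library, not the project's actual code, and the tiny candidate list is made up; the real tool scores large lists of artists, mediums, and style modifiers.

```python
import torch
import open_clip
from PIL import Image

# Load a CLIP model from the same family the hosted model defaults to.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")

# A tiny, made-up candidate bank for illustration only.
candidates = ["a watercolor painting", "an oil painting", "a 3d render", "pixel art"]

image = preprocess(Image.open("sea_turtle.jpg")).unsqueeze(0)  # placeholder image
text = tokenizer(candidates)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    # Cosine similarity between the image and each candidate phrase.
    img_emb /= img_emb.norm(dim=-1, keepdim=True)
    txt_emb /= txt_emb.norm(dim=-1, keepdim=True)
    sims = (img_emb @ txt_emb.T).squeeze(0)

best = sims.argmax().item()
# The top-ranked phrase would be appended to the BLIP caption.
print(candidates[best], float(sims[best]))
```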