kvfrans / clipdraw

Synthesize drawings to match a text prompt

  • Public
  • 5.6K runs
  • T4
  • GitHub
  • Paper

Input

  • string: prompt for generating image. Default: "Watercolor painting of an underwater submarine."
  • integer: number of paths/curves. Default: 256
  • integer: number of iterations. Default: 1000
  • integer: display frequency of intermediate images. Default: 10

Output

file


Run time and cost

This model costs approximately $0.094 to run on Replicate, or 10 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 7 minutes. The predict time for this model varies significantly based on the inputs.
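
The snippet below is a minimal sketch of calling this model through the Replicate Python client. The input keys (`prompt`, `num_paths`, `num_iterations`, `display_frequency`) are assumptions inferred from the input descriptions above, and `<version>` is a placeholder; check the model's API schema on Replicate for the exact field names and the full version id.

```python
# Minimal sketch: run kvfrans/clipdraw via the Replicate Python client.
# Requires: pip install replicate, and the REPLICATE_API_TOKEN environment variable.
# Input keys are assumptions based on the descriptions above; "<version>" is a placeholder.
import replicate

output = replicate.run(
    "kvfrans/clipdraw:<version>",
    input={
        "prompt": "Watercolor painting of an underwater submarine.",
        "num_paths": 256,          # number of paths/curves
        "num_iterations": 1000,    # number of iterations
        "display_frequency": 10,   # display frequency of intermediate images
    },
)
print(output)  # URL(s) of the generated output file(s)
```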

Readme

CLIPDraw: Synthesize drawings to match a text prompt!

This work presents CLIPDraw, an algorithm that synthesizes novel drawings based on natural language input. CLIPDraw does not require any training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler, human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods and highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse artistic styles, and scaling from simple to complex visual representations as the stroke count is increased.
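
The loop below is a minimal sketch of the synthesis-through-optimization idea described above: CLIP scores the similarity between a rendered image and the text prompt, and gradient ascent pushes that score up. To stay self-contained it optimizes raw pixels directly; CLIPDraw itself optimizes RGBA Bezier stroke parameters through the diffvg differentiable vector rasterizer and applies random augmentations each iteration, which is what biases results toward simple, human-recognizable shapes.

```python
# Minimal sketch of a CLIP-guided synthesis-through-optimization loop.
# To stay self-contained it optimizes raw pixels; CLIPDraw itself optimizes
# Bezier stroke parameters through the diffvg differentiable rasterizer and
# adds random augmentations each step.
# Assumes: pip install torch git+https://github.com/openai/CLIP.git
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
for p in model.parameters():
    p.requires_grad_(False)

# Encode the text prompt once; it stays fixed during optimization.
prompt = "Watercolor painting of an underwater submarine."
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize([prompt]).to(device))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# CLIP's image normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

# Stand-in for the stroke parameters: a directly optimized RGB canvas.
canvas = torch.randn(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([canvas], lr=0.02)

for step in range(1000):
    image = torch.sigmoid(canvas)                      # keep pixel values in [0, 1]
    image_features = model.encode_image((image - mean) / std)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    # Maximize cosine similarity between the drawing and the prompt.
    loss = -(image_features * text_features).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 10 == 0:
        print(f"step {step}: CLIP similarity = {-loss.item():.4f}")
```

Swapping the pixel canvas for a set of randomly initialized Bezier curves rendered with diffvg, and adding per-step crop and perspective augmentations before the CLIP encoder, recovers the setup the paper describes.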