kvfrans / clipdraw

Synthesize drawings to match a text prompt

  • Public
  • 5.5K runs
  • GitHub
  • Paper

Run time and cost

This model costs approximately $0.094 to run on Replicate, or 10 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 7 minutes. The predict time for this model varies significantly based on the inputs.
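If you prefer to call the hosted model from code rather than the web UI, a minimal sketch using the Replicate Python client is shown below. The input field name `prompt` and the `<version>` placeholder are assumptions, not the model's documented schema; check the model's API tab for the exact input names and version hash.

```python
# Minimal sketch: run the hosted model via the Replicate Python client.
# Requires `pip install replicate` and a REPLICATE_API_TOKEN environment variable.
import replicate

output = replicate.run(
    # Replace <version> with the version hash from the model's API tab;
    # the "prompt" input name is an assumption based on typical text-to-image models.
    "kvfrans/clipdraw:<version>",
    input={"prompt": "Watercolor painting of an underwater submarine."},
)
print(output)
```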

Readme

CLIPDraw: Synthesize drawings to match a text prompt!

This work presents CLIPDraw, an algorithm that synthesizes novel drawings from natural language input. CLIPDraw does not require any training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler, human-recognizable shapes. Results compare CLIPDraw to other synthesis-through-optimization methods and highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse artistic styles, and scaling from simple to complex visual representations as the stroke count is increased.
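To make the procedure concrete, here is a minimal sketch of the core optimization loop under stated assumptions: it uses OpenAI's `clip` package and the `pydiffvg` differentiable rasterizer, represents strokes as cubic Bézier paths, and omits CLIP's input normalization and the paper's image augmentations for brevity. Call signatures follow the public diffvg examples and may differ slightly between versions; this is an illustrative sketch, not the repository's exact code.

```python
# Sketch of the CLIPDraw idea: optimize Bezier stroke parameters so the rendered
# drawing's CLIP embedding matches the text prompt's embedding.
import torch
import clip
import pydiffvg

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

prompt = "Watercolor painting of an underwater submarine."
with torch.no_grad():
    text_features = model.encode_text(clip.tokenize([prompt]).to(device))

canvas_size, num_paths, num_segments = 224, 256, 3
shapes, shape_groups = [], []
for i in range(num_paths):
    # Each stroke is an open cubic Bezier path with random initial control points.
    points = torch.rand(num_segments * 3 + 1, 2) * canvas_size
    shapes.append(pydiffvg.Path(
        num_control_points=torch.tensor([2] * num_segments),
        points=points, stroke_width=torch.tensor(2.0), is_closed=False))
    shape_groups.append(pydiffvg.ShapeGroup(
        shape_ids=torch.tensor([i]), fill_color=None,
        stroke_color=torch.rand(4)))

point_params = [p.points.requires_grad_() for p in shapes]
color_params = [g.stroke_color.requires_grad_() for g in shape_groups]
optimizer = torch.optim.Adam(point_params + color_params, lr=1.0)
render = pydiffvg.RenderFunction.apply

for step in range(500):
    optimizer.zero_grad()
    scene_args = pydiffvg.RenderFunction.serialize_scene(
        canvas_size, canvas_size, shapes, shape_groups)
    img = render(canvas_size, canvas_size, 2, 2, step, None, *scene_args)
    # Composite RGBA over a white background, then move to NCHW for CLIP.
    img = img[:, :, 3:4] * img[:, :, :3] + (1 - img[:, :, 3:4])
    img = img.permute(2, 0, 1).unsqueeze(0).to(device)
    image_features = model.encode_image(img)
    # Maximize cosine similarity between the drawing and the prompt.
    loss = -torch.cosine_similarity(image_features, text_features, dim=-1).mean()
    loss.backward()
    optimizer.step()
    # Keep stroke colors in a valid RGBA range.
    for g in shape_groups:
        g.stroke_color.data.clamp_(0.0, 1.0)
```

Because only stroke positions, widths, and colors are optimized, increasing `num_paths` moves the result from simple doodles toward more detailed drawings, which is the stroke-count behavior described above.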