laion-ai / erlich

Generate a logo using text.

  • Public
  • 348.9K runs
  • T4
  • GitHub
  • License

Input

string

Your text prompt.

Default: ""

string

(optional) Negative prompt. The model's prediction for this text is subtracted from its prediction for the target text.

Default: ""

file

(optional) Initial image to use for the model's prediction. If provided alongside a mask, the image will be inpainted instead.

file

(optional) A mask image for inpainting an init_image. White pixels are kept and black pixels are discarded. The mask is resized to width = image width / 8, height = image height / 8.
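The mask description above can be illustrated with a small sketch. It computes the size the mask is resized to and shows one common way a keep/discard mask combines an original with a generated result. The function names, the use of integer division, and the blending formula are assumptions for illustration, not the model's actual code.

```python
def latent_mask_size(image_width, image_height, factor=8):
    """Size the mask is resized to: image dimensions divided by 8."""
    # Integer division is an assumption; the page only says "width/8".
    return image_width // factor, image_height // factor


def blend_with_mask(original, generated, mask):
    """Keep original values where mask is 1 (white), use generated where 0 (black)."""
    return [m * o + (1 - m) * g for o, g, m in zip(original, generated, mask)]


print(latent_mask_size(256, 256))                            # (32, 32)
print(blend_with_mask([10, 20, 30], [1, 2, 3], [1, 0, 1]))   # [10, 2, 30]
```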

number
(minimum: -20, maximum: 100)

Classifier-free guidance scale. Higher values will result in more guidance toward caption, with diminishing returns. Try values between 1.0 and 40.0. In general, going above 5.0 will introduce some artifacting.

Default: 5
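As a rough illustration of what the guidance scale does, the standard classifier-free guidance formula extrapolates from a baseline prediction toward the prompt-conditioned one. The sketch below uses plain lists in place of tensors. Using the negative text's prediction as the baseline is an assumption based on the description above, and `guided_prediction` is a hypothetical name.

```python
def guided_prediction(cond, baseline, scale):
    """Classifier-free guidance: baseline + scale * (cond - baseline)."""
    # `baseline` is the unconditional prediction, or (assumed here) the
    # prediction for the negative text when one is given.
    return [b + scale * (c - b) for c, b in zip(cond, baseline)]


cond = [1.0, 2.0]
baseline = [0.0, 1.0]
print(guided_prediction(cond, baseline, 1.0))  # [1.0, 2.0], same as cond
print(guided_prediction(cond, baseline, 5.0))  # [5.0, 6.0]
```

A scale of 1 reproduces the conditional prediction. Larger values push further along the same direction, which is consistent with the note that high values introduce artifacts.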

integer
(minimum: 15, maximum: 250)

Number of diffusion steps to run. Due to PLMS sampling, using more than 100 steps is unnecessary and may simply produce the exact same output.

Default: 50

integer
(minimum: 1, maximum: 16)

Batch size. (higher = slower)

Default: 4

integer

Target width

Default: 256

integer

Target height

Default: 256

number
(minimum: 0, maximum: 1)

Fraction of sampling steps to skip when using an init image. Defaults to 0.0 if init_image is not specified and 0.5 if init_image is specified.

Default: 0
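The effect of the skip fraction on the amount of work can be sketched as follows. This is a minimal sketch: the helper name is hypothetical, `None` stands in for "not set" (the page does not say how an explicit 0 is told apart from an unset value), and truncating to a whole number of steps is an assumption.

```python
def steps_to_run(steps, skip_fraction=None, has_init_image=False):
    """Number of diffusion steps left after skipping the given fraction."""
    if skip_fraction is None:
        # Documented defaults: 0.5 with an init image, 0.0 without.
        skip_fraction = 0.5 if has_init_image else 0.0
    if not 0.0 <= skip_fraction <= 1.0:
        raise ValueError("skip_fraction must be between 0 and 1")
    skipped = int(steps * skip_fraction)  # truncation is an assumption
    return steps - skipped


print(steps_to_run(50))                       # 50
print(steps_to_run(50, has_init_image=True))  # 25
print(steps_to_run(100, skip_fraction=0.25))  # 75
```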

integer

Aesthetic rating (1-9). Selects which aesthetic embed to use.

Default: 9

number

Aesthetic weight (0-1). How much to guide towards the aesthetic embed vs the prompt embed.

Default: 0.5
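One plausible reading of the aesthetic weight is a linear interpolation between the prompt embed and the aesthetic embed, followed by normalization. The sketch below is an assumption for illustration, not the model's confirmed implementation, and `blend_embeddings` is a hypothetical name.

```python
import math


def blend_embeddings(prompt_embed, aesthetic_embed, aesthetic_weight=0.5):
    """Interpolate between two embeddings, then rescale to unit length."""
    if not 0.0 <= aesthetic_weight <= 1.0:
        raise ValueError("aesthetic_weight must be between 0 and 1")
    mixed = [
        (1 - aesthetic_weight) * p + aesthetic_weight * a
        for p, a in zip(prompt_embed, aesthetic_embed)
    ]
    norm = math.sqrt(sum(x * x for x in mixed))
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return [x / norm for x in mixed] if norm else mixed


print(blend_embeddings([1.0, 0.0], [0.0, 1.0], 0.0))  # [1.0, 0.0]
print(blend_embeddings([1.0, 0.0], [0.0, 1.0], 1.0))  # [0.0, 1.0]
```

With a weight of 0 only the prompt embed contributes, and with a weight of 1 only the aesthetic embed does. The default of 0.5 weights both equally.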

integer
(minimum: -1, maximum: 4294967295)

Seed for random number generator. If -1, a random seed will be chosen.

Default: -1
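The seed behaviour described above can be sketched as a small helper. The name `resolve_seed` is hypothetical and the choice of `random.randint` is an assumption. Only the -1 convention and the documented range come from this page.

```python
import random

MAX_SEED = 4294967295  # documented maximum (2**32 - 1)


def resolve_seed(seed=-1):
    """Return the seed to use, drawing a random one when seed is -1."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError("seed must be -1 or between 0 and 4294967295")
    return seed


print(resolve_seed(42))  # 42
print(resolve_seed())    # a random value in the documented range
```

Passing the same explicit seed with the same inputs is the usual way to make a run repeatable.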

boolean

Whether to return intermediate outputs. Enable to visualize the diffusion process and/or debug the model. May slow down inference.

Default: false
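The documented ranges can be checked before submitting a request. The sketch below is a minimal example: the input names used as dictionary keys are hypothetical, since this page lists only types and descriptions, and only the numeric ranges are taken from the text above.

```python
# Ranges as documented above; the key names are assumptions.
RANGES = {
    "guidance_scale": (-20, 100),
    "steps": (15, 250),
    "batch_size": (1, 16),
    "skip_fraction": (0, 1),
    "seed": (-1, 4294967295),
}


def out_of_range(inputs):
    """Return the names of inputs that fall outside the documented ranges."""
    return [
        name
        for name, (low, high) in RANGES.items()
        if name in inputs and not low <= inputs[name] <= high
    ]


print(out_of_range({"steps": 50, "batch_size": 4}))   # []
print(out_of_range({"steps": 10, "batch_size": 32}))  # ['steps', 'batch_size']
```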

Output


Run time and cost

This model costs approximately $0.062 to run on Replicate, or 16 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
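The relationship between the per-run cost and runs per dollar is simple arithmetic, sketched below. The figure of $0.062 is the approximate value quoted above, and actual cost varies with inputs, so the results are estimates only.

```python
import math

COST_PER_RUN = 0.062  # approximate USD per run, as quoted above


def runs_per_dollar(cost_per_run=COST_PER_RUN):
    """Whole runs that fit into one dollar at the given per-run cost."""
    return math.floor(1 / cost_per_run)


def estimated_cost(runs, cost_per_run=COST_PER_RUN):
    """Approximate total cost in USD for a number of runs."""
    return round(runs * cost_per_run, 2)


print(runs_per_dollar())    # 16
print(estimated_cost(100))  # 6.2
```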

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 5 minutes. The predict time for this model varies significantly based on the inputs.

Readme

erlich is the text2image latent diffusion model from CompVis (with additions from glid-3-xl), fine-tuned on the Large Logo Dataset, a dataset collected from LAION-5B. The dataset consists of roughly 100K images of logos with captions generated via BLIP using aggressive re-ranking.

For more info see the README.md for ldm-finetune.