andreasjansson / flash-eval

Suite of models to evaluate the image quality of text-to-image models with respect to their input prompts.

  • Public
  • 1.9K runs
  • A100 (80GB)
  • GitHub
  • Paper

Input

string (required)

Newline-separated list of prompt/image URL pairs. Each pair is formatted as <prompt>:<image1>[,<image2>[,<image3>[,...]]], i.e. a prompt, followed by a colon, followed by one or more comma-separated image URLs.

string

Comma-separated list of models to use for evaluation. Valid models are ImageReward, Aesthetic, CLIP, BLIP, and PickScore.

Default: "ImageReward,Aesthetic,CLIP,BLIP,PickScore"

string

Separator between the prompt and the list of image URLs.

Default: ":"

string

Separator between image URLs.

Default: ","
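Putting the input format together: a minimal sketch in plain Python (the helper name and example URLs are my own, not part of the model) that builds the newline-separated prompt/image string from a dict, using the default separators:

```python
def build_eval_input(pairs, prompt_separator=":", image_separator=","):
    """Format {prompt: [image URLs]} into the newline-separated
    <prompt><sep><url1><sep2><url2>... string the model expects."""
    lines = []
    for prompt, urls in pairs.items():
        lines.append(prompt + prompt_separator + image_separator.join(urls))
    return "\n".join(lines)

# Hypothetical example URLs for illustration only.
pairs = {
    "a cat": ["https://example.com/cat-1.png", "https://example.com/cat-2.png"],
    "a dog": ["https://example.com/dog.png"],
}
print(build_eval_input(pairs))
```

A prompt that itself contains the separator character would break parsing, which is presumably why the separators are configurable: pick characters that never occur in your prompts or URLs.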

Output

{
  "a cat": {
    "https://replicate.delivery/yhqm/M3MzBpeWPfg0qkNZcK1x4dMXr2boczHOqTHsxnEjtauJAvkTA/out-0.png": {
      "BLIP": 0.4326244592666626,
      "CLIP": 0.2218017578125,
      "Aesthetic": 6.099793434143066,
      "PickScore": 21.34879493713379,
      "ImageReward": 0.3194262385368347
    },
    "https://replicate.delivery/czjl/49VkOf3fu7sfgon0W2OMf5dlIxiwUpsBAicD1lveJP0e2cL5E/output.webp": {
      "BLIP": 0.35405826568603516,
      "CLIP": 0.245849609375,
      "Aesthetic": 5.96907901763916,
      "PickScore": 21.581308364868164,
      "ImageReward": 0.637330174446106
    }
  },
  "a dog": {
    "https://replicate.delivery/yhqm/yzKdMfHFMeq14kceu334KavdzLMcETTHiC12E2iMkfNgS8SOB/out-0.png": {
      "BLIP": 0.35565632581710815,
      "CLIP": 0.181884765625,
      "Aesthetic": 5.137001991271973,
      "PickScore": 19.575040817260742,
      "ImageReward": -0.915699303150177
    },
    "https://replicate.delivery/czjl/c2e693uAGuWIe0o6ZOqFF3R5uJt5hq7SPSsrkgfA6dz1JeSOB/output.webp": {
      "BLIP": 0.399705708026886,
      "CLIP": 0.2403564453125,
      "Aesthetic": 6.546636581420898,
      "PickScore": 21.414165496826172,
      "ImageReward": 1.6094799041748047
    }
  }
}
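The output maps each prompt to its image URLs, and each URL to per-metric scores, so ranking images per prompt is a small dictionary traversal. A sketch (function name and abbreviated URLs are hypothetical) that selects the best image by ImageReward:

```python
def best_image_per_prompt(output, metric="ImageReward"):
    """Return {prompt: (image_url, score)} with the top-scoring
    image for each prompt, ranked by the chosen metric."""
    best = {}
    for prompt, images in output.items():
        url, scores = max(images.items(), key=lambda item: item[1][metric])
        best[prompt] = (url, scores[metric])
    return best

# Abbreviated stand-in for the model's output structure.
output = {
    "a cat": {
        "img-1.png": {"ImageReward": 0.32, "CLIP": 0.22},
        "img-2.png": {"ImageReward": 0.64, "CLIP": 0.25},
    },
}
print(best_image_per_prompt(output))  # → {'a cat': ('img-2.png', 0.64)}
```

Swapping `metric` for any of the five model names ranks by that metric instead.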

This output was created using a different version of the model, andreasjansson/flash-eval:40774574.

Run time and cost

This model costs approximately $0.37 to run on Replicate, or 2 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 5 minutes. The predict time for this model varies significantly based on the inputs.

Readme

A suite of models to evaluate the image quality of text-to-image models with respect to their input prompts.

GitHub repo: https://github.com/thu-nics/FlashEval

Models

CLIP

  • Measures text-image alignment using CLIP. It evaluates how well the image matches the given text prompt.
  • Link

BLIP

  • Evaluates text-to-image alignment using BLIP, assessing how well the image matches the text description.
  • Link

Aesthetic

  • Assesses the aesthetic quality of an image, predicting how visually appealing it is to humans.
  • Link

ImageReward

  • A human preference model that predicts which images humans would prefer based on the given prompt.
  • Link

PickScore

  • A human preference model, trained on the Pick-a-Pic dataset of user preferences, that predicts which images humans would prefer for a given prompt.
  • Link
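Note that the five metrics sit on very different scales (in the example output, CLIP cosine similarities are near 0.2, Aesthetic around 6, PickScore around 21), so raw scores should not be averaged directly. One possible approach, not part of this model, is min-max normalization per metric before combining:

```python
def normalize_scores(output):
    """Min-max normalize each metric across all images so that
    scores from differently scaled metrics become comparable."""
    # Collect every value per metric across all prompts and images.
    per_metric = {}
    for images in output.values():
        for scores in images.values():
            for metric, value in scores.items():
                per_metric.setdefault(metric, []).append(value)
    spans = {m: (min(v), max(v)) for m, v in per_metric.items()}
    # Rescale each score into [0, 1]; a zero-width span maps to 0.
    normalized = {}
    for prompt, images in output.items():
        normalized[prompt] = {
            url: {
                m: (v - spans[m][0]) / ((spans[m][1] - spans[m][0]) or 1.0)
                for m, v in scores.items()
            }
            for url, scores in images.items()
        }
    return normalized
```

After normalization, a simple mean across metrics gives each image a single comparable score; weighting the metrics differently is equally easy from here.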