playgroundai / playground-v2.5-1024px-aesthetic

Playground v2.5 is the state-of-the-art open-source model in aesthetic quality

  • Public
  • 2.2M runs
  • A100 (80GB)

Input

string

Input prompt

Default: "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

string

Negative Input prompt

Default: "ugly, deformed, noisy, blurry, distorted"

file

Input image for img2img or inpaint mode

file

Input mask for inpaint mode. Black areas will be preserved, white areas will be inpainted.

integer
(minimum: 256, maximum: 1536)

Width of output image

Default: 1024

integer
(minimum: 256, maximum: 1536)

Height of output image

Default: 1024

integer
(minimum: 1, maximum: 4)

Number of images to output.

Default: 1

string

Scheduler. DPMSolver++ or DPM++2MKarras is recommended for most cases

Default: "DPMSolver++"

integer
(minimum: 1, maximum: 60)

Number of denoising steps

Default: 25

number
(minimum: 0.1, maximum: 20)

Scale for classifier-free guidance

Default: 3

number
(minimum: 0, maximum: 1)

Prompt strength when using img2img or inpaint mode. A value of 1.0 corresponds to full destruction of the information in the input image.

Default: 0.8

integer

Random seed. Leave blank to randomize the seed

boolean

Applies a watermark to enable determining if an image is generated in downstream applications. If you have other provisions for generating or deploying images safely, you can use this to disable watermarking.

Default: true

boolean

This model’s safety checker can’t be disabled when running on the website. Learn more about platform safety on Replicate.

Disable safety checker for generated images. This feature is only available through the API. See https://replicate.com/docs/how-does-replicate-work#safety

Default: false
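The types, ranges, and defaults listed above can be checked client-side before a request is sent. The sketch below is one way to do that. The key names (`prompt`, `negative_prompt`, `width`, and so on) are assumptions, since the parameter names themselves do not appear in the listing, and `build_input` is a hypothetical helper, not part of any client library.

```python
# Documented ranges from the parameter list above. The key names are assumed,
# because the listing shows types and descriptions but not the input names.
RANGES = {
    "width": (256, 1536),
    "height": (256, 1536),
    "num_outputs": (1, 4),
    "num_inference_steps": (1, 60),
    "guidance_scale": (0.1, 20),
    "prompt_strength": (0, 1),
}

# Documented defaults, under the same assumed key names.
DEFAULTS = {
    "prompt": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    "negative_prompt": "ugly, deformed, noisy, blurry, distorted",
    "width": 1024,
    "height": 1024,
    "num_outputs": 1,
    "scheduler": "DPMSolver++",
    "num_inference_steps": 25,
    "guidance_scale": 3,
    "prompt_strength": 0.8,
    "apply_watermark": True,
    "disable_safety_checker": False,
}

# Inputs with no documented default (seed, input image, inpaint mask).
OPTIONAL = {"seed", "image", "mask"}


def build_input(**overrides):
    """Merge overrides into the defaults and range-check the numeric fields."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL
    if unknown:
        raise KeyError(f"unknown input(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    for name, (low, high) in RANGES.items():
        value = params[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return params


payload = build_input(width=768, height=1344, guidance_scale=3.0)
print(payload["width"], payload["height"], payload["num_inference_steps"])
```

A dictionary built this way could then be passed as the `input` argument of an API call. The exact model reference and version to use are not given here.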


Run time and cost

This model costs approximately $0.066 to run on Replicate, or 15 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 48 seconds. The predict time for this model varies significantly based on the inputs.
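As a rough illustration of the figures above, the following sketch turns the quoted per-run cost and typical run time into a budget estimate. The implied per-second rate is a derived estimate, not a published price, and actual cost varies with the inputs.

```python
import math

COST_PER_RUN_USD = 0.066  # approximate per-run cost quoted above
TYPICAL_SECONDS = 48      # typical prediction time quoted above


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(n_runs, cost_per_run=COST_PER_RUN_USD):
    """Approximate total cost in USD for n_runs predictions."""
    return n_runs * cost_per_run


# Derived estimate only: per-run cost divided by the typical run time.
implied_rate_per_second = COST_PER_RUN_USD / TYPICAL_SECONDS

print(runs_per_dollar())
print(round(estimated_cost(1000), 2))
print(round(implied_rate_per_second, 6))
```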

Readme

Playground v2.5 – 1024px Aesthetic Model

This repository contains a model that generates highly aesthetic images at a resolution of 1024x1024, as well as in portrait and landscape aspect ratios. You can use the model with Hugging Face 🧨 Diffusers.


Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2.

Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.

For details on the development and training of our model, please refer to our blog post and technical report.

Model Description

  • Developed by: Playground
  • Model type: Diffusion-based text-to-image generative model
  • License: Playground v2.5 Community License
  • Summary: This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as Stable Diffusion XL.

Using the model with 🧨 Diffusers

Install diffusers >= 0.27.0 and the relevant dependencies. For now, you need to install from the main diffusers branch on GitHub until a new release is published on PyPI.
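A minimal sketch of loading the model with Diffusers follows. It assumes `torch` and `diffusers` are installed, that a CUDA GPU is available, and that the repository publishes fp16 weight files (the `variant="fp16"` argument). The helper names `load_pipeline` and `generate` are illustrative, and the step count simply reuses the default of 25 from the parameter list.

```python
MODEL_ID = "playgroundai/playground-v2.5-1024px-aesthetic"
DEFAULT_PROMPT = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"


def load_pipeline(device="cuda"):
    """Load the pipeline in half precision.

    Imports are deferred so this module can be imported on a machine
    that does not have torch or diffusers installed.
    """
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        variant="fp16",  # assumption: fp16 weight files exist in the repository
    )
    return pipe.to(device)


def generate(pipe, prompt=DEFAULT_PROMPT, steps=25, guidance_scale=3.0):
    """Run one text-to-image generation and return the first image."""
    result = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
    )
    return result.images[0]
```

On a GPU machine, `generate(load_pipeline()).save("astronaut.png")` would write a single image to disk.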

Notes:

  • The pipeline uses the EDMDPMSolverMultistepScheduler scheduler by default, for crisper fine details. It’s an EDM formulation of the DPM++ 2M Karras scheduler. guidance_scale=3.0 is a good default for this scheduler.
  • The pipeline also supports the EDMEulerScheduler scheduler. It’s an EDM formulation of the Euler scheduler. guidance_scale=5.0 is a good default for this scheduler.
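The pairing of scheduler and guidance scale described in the notes can be captured in a small helper. This is a sketch under assumptions: `use_scheduler` is a hypothetical function, and rebuilding the scheduler with `from_config` from the pipeline's existing scheduler config is one common Diffusers pattern rather than something the notes prescribe.

```python
# Guidance-scale defaults quoted in the notes, keyed by Diffusers class name.
RECOMMENDED_GUIDANCE = {
    "EDMDPMSolverMultistepScheduler": 3.0,
    "EDMEulerScheduler": 5.0,
}


def use_scheduler(pipe, name):
    """Swap the pipeline's scheduler in place and return the suggested guidance scale."""
    if name not in RECOMMENDED_GUIDANCE:
        raise ValueError(f"unsupported scheduler: {name}")

    import diffusers  # deferred so the lookup table is usable without diffusers

    scheduler_cls = getattr(diffusers, name)
    # Assumption: the current scheduler's config is compatible enough to reuse.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return RECOMMENDED_GUIDANCE[name]
```

The returned value can be passed straight through as `guidance_scale` when calling the pipeline, so the scheduler and its matching default stay in sync.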

Using the model with Automatic1111/ComfyUI

Support coming soon. We will update this model card with instructions when ready.

User Studies

This model card only provides a brief summary of our user study results. For extensive details on how we perform user studies, please check out our technical report.

We conducted studies to measure overall aesthetic quality, as well as the two specific areas we aimed to improve with Playground v2.5: multi-aspect-ratio generation and human preference alignment.

Comparison to State-of-the-Art


The aesthetic quality of Playground v2.5 dramatically outperforms the current state-of-the-art open-source models SDXL and PixArt-α, as well as Playground v2. Because the performance differential between Playground v2.5 and SDXL was so large, we also tested our aesthetic quality against world-class closed-source models like DALL-E 3 and Midjourney 5.2, and found that Playground v2.5 outperforms them as well.

Multi Aspect Ratios


Similarly, for multi aspect ratios, we outperform SDXL by a large margin.

Human Preference Alignment

Next, we benchmark Playground v2.5 specifically on people-related images to test human preference alignment. We compared Playground v2.5 against two commonly used baseline models: SDXL and RealStock v2, a community fine-tune of SDXL that was trained on a realistic people dataset.

Playground v2.5 outperforms both baselines by a large margin.

MJHQ-30K Benchmark


Model                               Overall FID
SDXL-1-0-refiner                    9.55
playground-v2-1024px-aesthetic      7.07
playground-v2.5-1024px-aesthetic    4.48

Lastly, we report metrics using our MJHQ-30K benchmark, which we open-sourced with the v2 release. We report both the overall FID and the per-category FID. All FID metrics are computed at a resolution of 1024x1024. Our results show that Playground v2.5 outperforms both Playground v2 and SDXL in overall FID and in all category FIDs, especially in the people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preferences and the FID score on the MJHQ-30K benchmark.
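For readers unfamiliar with the metric, the standard definition of the Fréchet Inception Distance is given below. It compares Gaussian fits to Inception features of reference images (mean \mu_r, covariance \Sigma_r) and generated images (mean \mu_g, covariance \Sigma_g). This is the general textbook form. The exact feature extractor and implementation used for MJHQ-30K are not described here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated distribution is closer to the reference distribution, which is why 4.48 is the best of the three overall scores reported above.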