zsxkib / wd-image-tagger

Image tagger fine-tuned on WaifuDiffusion (SwinV2, ConvNext, and ViT)

  • Public
  • 1K runs
  • T4
  • GitHub
  • License

Input

image
*file

Path to the input image file to be analyzed by the WaifuDiffusion tagger model

string

Name of the pre-trained model repository to use for image analysis

Default: "models/wd-swinv2-tagger-v3"

number
(minimum: 0, maximum: 1)

Probability threshold for including general tags in the output (between 0 and 1)

Default: 0.35

boolean

Whether to use the MCut algorithm to automatically determine the general tags threshold

Default: false

number
(minimum: 0, maximum: 1)

Probability threshold for including character tags in the output (between 0 and 1)

Default: 0.85

boolean

Whether to use the MCut algorithm to automatically determine the character tags threshold

Default: false

string

Category of tags to return in the output

Default: "all_tags"

Output

tag                 category   confidence
outdoors            general    0.5684077143669128
lying               general    0.3831140398979187
blurry              general    0.8763853311538696
no humans           general    0.9804705381393433
blurry background   general    0.5741121172904968
depth of field      general    0.5820473432540894
animal              general    0.8318572640419006
cat                 general    0.6375629305839539
grass               general    0.5245271325111389
realistic           general    0.5816801190376282
animal focus        general    0.9442822933197021
white fur           general    0.36753329634666443
whiskers            general    0.551397442817688
general             rating     0.983599066734314
sensitive           rating     0.015072911977767944
questionable        rating     0.000958561897277832
explicit            rating     0.000261843204498291

Run time and cost

This model costs approximately $0.00050 to run on Replicate, or 2000 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 3 seconds.

Readme

WD Image Tagger 🏷️🖼️


The WD Image Tagger is a powerful AI model that automatically analyzes and tags your images with descriptive labels. It’s trained on a large dataset of anime-style images and can recognize a wide range of content, including general attributes, characters, and age ratings.

This tool was developed using resources and models available on SmilingWolf’s wd-tagger Hugging Face Space, ensuring state-of-the-art performance and ease of use.

Whether you're managing a large image library, generating accurate prompts for an AI art model, or quickly filtering out potentially sensitive content, the WD Image Tagger can help streamline your workflow. The following pre-trained model repositories are available:

  • wd-swinv2-tagger-v3
  • wd-convnext-tagger-v3
  • wd-vit-tagger-v3
  • wd-v1-4-moat-tagger-v2
  • wd-v1-4-swinv2-tagger-v2
  • wd-v1-4-convnext-tagger-v2
  • wd-v1-4-convnextv2-tagger-v2
  • wd-v1-4-vit-tagger-v2

Features

  • 🌟 Pre-trained on a diverse dataset of anime images
  • 🏷️ Tags images with general attributes, characters, and content ratings
  • 🔍 Supports multiple state-of-the-art model architectures like SwinV2, ConvNext, and ViT
  • ⚙️ Adjustable tag probability thresholds for fine-grained control over results
  • 🧮 Optional MCut algorithm for automatic threshold optimization (a brief sketch follows this list)
  • 🗂️ Filter tags by category to focus on what’s most relevant to you
  • 🔌 Easy integration into existing applications via a simple API
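The MCut option replaces the fixed thresholds above with one derived from the score distribution itself. The exact implementation isn't shown on this page, but a minimal sketch of the usual Maximum Cut (MCut) thresholding idea, assuming standard NumPy, looks like this:

```python
import numpy as np

def mcut_threshold(probs: np.ndarray) -> float:
    """Maximum Cut Thresholding (MCut) sketch.

    Sort the predicted probabilities in descending order, find the largest
    gap between consecutive values, and place the threshold at the midpoint
    of that gap.
    """
    sorted_probs = np.sort(probs)[::-1]          # descending order
    gaps = sorted_probs[:-1] - sorted_probs[1:]  # differences between neighbours
    t = gaps.argmax()                            # index of the widest gap
    return (sorted_probs[t] + sorted_probs[t + 1]) / 2

# Example: scores for a handful of general tags
scores = np.array([0.98, 0.94, 0.88, 0.12, 0.05, 0.01])
print(mcut_threshold(scores))  # 0.5 -> keeps the first three tags
```

Because the threshold lands in the widest gap of the sorted scores, confident tags are kept and the long tail of low-confidence tags is dropped without hand-tuning a cutoff.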

Getting Started

To start tagging your images with the WD Image Tagger:

  1. Upload your image
  2. Select the pre-trained model you’d like to use
  3. Adjust the tag probability thresholds and category filters as needed
  4. Let the model analyze your image and output the relevant tags

The model will return a list of tags, each with a confidence score and category label (general, character, or rating).
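If you prefer to call the model programmatically, a minimal sketch using the Replicate Python client might look like the following. The version hash is a placeholder, only the documented image input is passed (all other parameters keep their defaults), and the list-of-dicts output shape is inferred from the example output above.

```python
# A minimal sketch using the Replicate Python client (pip install replicate).
import replicate

output = replicate.run(
    "zsxkib/wd-image-tagger:<version>",        # replace <version> with the current version hash
    input={"image": open("photo.jpg", "rb")},  # only the required input; other fields use defaults
)

# Each element carries "tag", "category", and "confidence" (see the example output above).
general = [t for t in output if t["category"] == "general" and t["confidence"] >= 0.5]
ratings = {t["tag"]: t["confidence"] for t in output if t["category"] == "rating"}

print("general tags:", [t["tag"] for t in general])
print("highest rating:", max(ratings, key=ratings.get))
```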

Pre-trained Models

The WD Image Tagger comes with several pre-trained model options, each with its own strengths:

  • SwinV2: A powerful and accurate model architecture well-suited for most use cases
  • ConvNext: An efficient model that offers a good balance of speed and accuracy
  • ViT (Vision Transformer): A transformer-based model that excels at capturing global context

Models are provided in both the latest Dataset v3 series and the earlier v2 series. The v3 models were trained on a larger and more diverse dataset, while the v2 models offer compatibility with older workflows.

Acknowledgments

The WD Image Tagger was trained using the SW-CV-ModelZoo toolkit, with TPUs generously provided by the TRC program. Special thanks to the researchers and engineers who made this powerful tool possible!

Learn More

For more technical details on the available models and their expected performance, check out the WD Image Tagger GitHub repository.


We hope the WD Image Tagger helps make your image analysis workflows faster and more effective. If you have any questions or feedback, don’t hesitate to reach out!