zsxkib / wd-image-tagger

Image tagger fine-tuned on WaifuDiffusion (SwinV2, ConvNext, and ViT)

  • Public
  • 1K runs
  • T4
  • GitHub
  • License

Input

image
*file

Path to the input image file to be analyzed by the WaifuDiffusion tagger model

string

Name of the pre-trained model repository to use for image analysis

Default: "models/wd-swinv2-tagger-v3"

number
(minimum: 0, maximum: 1)

Probability threshold for including general tags in the output (between 0 and 1)

Default: 0.35

boolean

Whether to use the MCut algorithm to automatically determine the general tags threshold

Default: false

number
(minimum: 0, maximum: 1)

Probability threshold for including character tags in the output (between 0 and 1)

Default: 0.85

boolean

Whether to use the MCut algorithm to automatically determine the character tags threshold

Default: false

string

Category of tags to return in the output

Default: "all_tags"

Output

tag                 category   confidence
outdoors            general    0.5684077143669128
lying               general    0.3831140398979187
blurry              general    0.8763853311538696
no humans           general    0.9804705381393433
blurry background   general    0.5741121172904968
depth of field      general    0.5820473432540894
animal              general    0.8318572640419006
cat                 general    0.6375629305839539
grass               general    0.5245271325111389
realistic           general    0.5816801190376282
animal focus        general    0.9442822933197021
white fur           general    0.36753329634666443
whiskers            general    0.551397442817688
general             rating     0.983599066734314
sensitive           rating     0.015072911977767944
questionable        rating     0.000958561897277832
explicit            rating     0.000261843204498291

Run time and cost

This model costs approximately $0.00050 to run on Replicate, or 2000 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 3 seconds.

Readme

WD Image Tagger 🏷️🖼️


The WD Image Tagger is a powerful AI model that automatically analyzes and tags your images with descriptive labels. It’s trained on a large dataset of anime-style images and can recognize a wide range of content, including general attributes, characters, and age ratings.

This tool was developed using resources and models available on SmilingWolf’s wd-tagger Hugging Face Space, ensuring state-of-the-art performance and ease of use.

Whether you're managing a large image library, generating accurate prompts for an AI art model, or quickly filtering out potentially sensitive content, the WD Image Tagger can help streamline your workflow. The following pre-trained model repositories are available:

  • wd-swinv2-tagger-v3
  • wd-convnext-tagger-v3
  • wd-vit-tagger-v3
  • wd-v1-4-moat-tagger-v2
  • wd-v1-4-swinv2-tagger-v2
  • wd-v1-4-convnext-tagger-v2
  • wd-v1-4-convnextv2-tagger-v2
  • wd-v1-4-vit-tagger-v2

Features

  • 🌟 Pre-trained on a diverse dataset of anime images
  • 🏷️ Tags images with general attributes, characters, and content ratings
  • 🔍 Supports multiple state-of-the-art model architectures like SwinV2, ConvNext, and ViT
  • ⚙️ Adjustable tag probability thresholds for fine-grained control over results
  • 🧮 Optional MCut algorithm for automatic threshold optimization (a brief sketch follows this list)
  • 🗂️ Filter tags by category to focus on what’s most relevant to you
  • 🔌 Easy integration into existing applications via a simple API
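The MCut option replaces the fixed thresholds above with one derived from the score distribution itself. The exact implementation isn't shown on this page, but a minimal sketch of the usual Maximum Cut (MCut) thresholding idea, assuming standard NumPy, looks like this:

```python
import numpy as np

def mcut_threshold(probs: np.ndarray) -> float:
    """Maximum Cut Thresholding (MCut) sketch.

    Sort the predicted probabilities in descending order, find the largest
    gap between consecutive values, and place the threshold at the midpoint
    of that gap.
    """
    sorted_probs = np.sort(probs)[::-1]          # descending order
    gaps = sorted_probs[:-1] - sorted_probs[1:]  # differences between neighbours
    t = gaps.argmax()                            # index of the widest gap
    return (sorted_probs[t] + sorted_probs[t + 1]) / 2

# Example: scores for a handful of general tags
scores = np.array([0.98, 0.94, 0.88, 0.12, 0.05, 0.01])
print(mcut_threshold(scores))  # 0.5 -> keeps the first three tags
```

Because the threshold lands in the widest gap of the sorted scores, confident tags are kept and the long tail of low-confidence tags is dropped without hand-tuning a cutoff.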

Getting Started

To start tagging your images with the WD Image Tagger:

  1. Upload your image
  2. Select the pre-trained model you’d like to use
  3. Adjust the tag probability thresholds and category filters as needed
  4. Let the model analyze your image and output the relevant tags

The model will return a list of tags, each with a confidence score and category label (general, character, or rating).
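If you prefer to call the model programmatically, a minimal sketch using the Replicate Python client might look like the following. The version hash is a placeholder, only the documented image input is passed (all other parameters keep their defaults), and the list-of-dicts output shape is inferred from the example output above.

```python
# A minimal sketch using the Replicate Python client (pip install replicate).
import replicate

output = replicate.run(
    "zsxkib/wd-image-tagger:<version>",        # replace <version> with the current version hash
    input={"image": open("photo.jpg", "rb")},  # only the required input; other fields use defaults
)

# Each element carries "tag", "category", and "confidence" (see the example output above).
general = [t for t in output if t["category"] == "general" and t["confidence"] >= 0.5]
ratings = {t["tag"]: t["confidence"] for t in output if t["category"] == "rating"}

print("general tags:", [t["tag"] for t in general])
print("highest rating:", max(ratings, key=ratings.get))
```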

Pre-trained Models

The WD Image Tagger comes with several pre-trained model options, each with its own strengths:

  • SwinV2: A powerful and accurate model architecture well-suited for most use cases
  • ConvNext: An efficient model that offers a good balance of speed and accuracy
  • ViT (Vision Transformer): A transformer-based model that excels at capturing global context

Models are provided in both the latest Dataset v3 series and the earlier v2 series. The v3 models were trained on a larger and more diverse dataset, while the v2 models offer compatibility with older workflows.

Acknowledgments

The WD Image Tagger was trained using the SW-CV-ModelZoo toolkit, with TPUs generously provided by the TRC program. Special thanks to the researchers and engineers who made this powerful tool possible!

Learn More

For more technical details on the available models and their expected performance, check out the WD Image Tagger GitHub repository.


We hope the WD Image Tagger helps make your image analysis workflows faster and more effective. If you have any questions or feedback, don’t hesitate to reach out!