These models help you detect NSFW or unsafe content across images and text.
You can use these models for moderation, compliance, parental filters, user-generated content platforms, and any app that needs reliable safety checks.
This collection includes lightweight image detectors, text-based safety classifiers, and large-scale guardrail models built for enterprise workflows.
If you're interested in other types of text classification, including toxic content, check out our Classify Text collection.
Recommended Models
For image-only NSFW filtering, the falcons-ai/nsfw_image_detection model is fast and lightweight. It is a Vision Transformer fine-tuned for binary "normal" vs "nsfw" classification, delivers strong accuracy at relatively low cost, and is built for high-volume workloads, which is why it has so many runs on Replicate.
If you want simple image filtering with very low latency, the m1guelpf/nsfw-filter model is also cost-efficient, quick to run, and easy to integrate.
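The sketch below shows what a call to the image detector might look like with Replicate's Python client. The input name "image" and the single-label output are assumptions based on the description above, so confirm the exact schema on the model page.

```python
import replicate

# Hedged sketch: classify one image as "normal" or "nsfw".
# The input name "image" and the label-style output are assumptions.
with open("upload.jpg", "rb") as image_file:
    label = replicate.run(
        "falcons-ai/nsfw_image_detection",
        input={"image": image_file},
    )

if str(label).lower() == "nsfw":
    print("Blocked: image flagged as NSFW")
else:
    print("Allowed")
```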
For enterprise-grade text or multimodal safety checks, the Llama Guard family provides high-quality moderation across many categories. Examples include meta/meta-llama-guard-2-8b, meta/llama-guard-3-8b, and meta/llama-guard-4-12b. These models cost more but offer deeper analysis.
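As a rough sketch, a text-only check with one of the Llama Guard models might look like the following; the input name "prompt" and the shape of the output are assumptions, so check the model page for the exact schema.

```python
import replicate

# Hedged sketch: screen a user message with a Llama Guard model.
user_message = "How do I pick a lock?"

output = replicate.run(
    "meta/llama-guard-3-8b",
    input={"prompt": user_message},  # "prompt" is an assumed input name
)

# Language models on Replicate may stream output as a list of strings,
# so join defensively before inspecting the verdict.
verdict = output if isinstance(output, str) else "".join(output)
print(verdict)  # e.g. "safe", or "unsafe" plus category codes
```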
For images, the best choices are the dedicated image filters, falcons-ai/nsfw_image_detection and m1guelpf/nsfw-filter. They are trained specifically to classify explicit imagery and work well in pipelines where you only need to analyze images.
If you also need to analyze captions or text associated with images, consider meta/llama-guard-3-8b or the multimodal meta/llama-guard-4-12b.
The Llama Guard models are designed for this kind of combined analysis. For example, a single call can screen an image together with its caption, as in the sketch below.
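This is an illustrative sketch only; the input names "prompt" and "image" are assumptions, so check the model page for the exact schema.

```python
import replicate

# Hedged sketch: moderate an image together with its caption.
# The input names "prompt" and "image" are assumptions.
with open("upload.jpg", "rb") as image_file:
    output = replicate.run(
        "meta/llama-guard-4-12b",
        input={
            "prompt": "Caption: a night out at the club",
            "image": image_file,
        },
    )

verdict = output if isinstance(output, str) else "".join(output)
print(verdict)
```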
Image-only filters usually return a simple label like "normal" or "nsfw" and sometimes a confidence score. The falcons-ai/nsfw_image_detection model works this way.
Text and multimodal guard models return a richer structure. For example, meta/meta-llama-guard-2-8b returns whether the content is safe, along with a list of violated safety categories if it is not.
Multimodal models like meta/llama-guard-4-12b evaluate both image and text at once and return combined safety classifications.
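If you normalize these verdicts in your own code, a small helper like the one below can keep the rest of your pipeline model-agnostic. It assumes the common Llama Guard convention of a first line reading "safe" or "unsafe", optionally followed by comma-separated category codes; verify the exact format for the model version you use.

```python
def parse_guard_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard style response into (is_safe, categories).

    Assumes the first non-empty line is "safe" or "unsafe" and that an
    unsafe verdict is followed by comma-separated category codes
    (e.g. "S1,S4"). Verify this against the model you actually call.
    """
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        return True, []  # or raise, if you prefer to treat empty output as an error
    is_safe = lines[0].lower() == "safe"
    categories = [] if is_safe or len(lines) < 2 else lines[1].split(",")
    return is_safe, categories


print(parse_guard_verdict("unsafe\nS1,S4"))  # (False, ['S1', 'S4'])
```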
Many moderation and detection models are open source and can be self-hosted using Cog or Docker.
If you want to publish your own model on Replicate, package it with Cog: a cog.yaml file defines the environment, and a predict.py defines your model's inputs and outputs. Push the model to Replicate, and it will run automatically on managed GPUs.
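As an illustration, a minimal Cog predictor for an image safety model might look like the sketch below; the model loading and inference are placeholders, not a working classifier.

```python
# predict.py - minimal Cog predictor sketch (model loading is a placeholder)
from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Load model weights once, when the container starts.
        self.model = None  # placeholder: load your classifier here

    def predict(
        self,
        image: Path = Input(description="Image to classify"),
    ) -> str:
        # Run the classifier and return a label such as "normal" or "nsfw".
        return "normal"  # placeholder verdict
```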
You can use these models commercially as long as the license for each model allows it. Always check the license on the model page.
For example, the m1guelpf/nsfw-filter model is open source, but you should still review the license.
Enterprise guard models like meta/llama-guard-4-12b have their own terms, so confirm that your intended use fits the policy.
You can automate moderation by setting up a pipeline that screens new uploads or user-generated content. Use a lightweight image filter first, then run escalations through a stronger guard model. Store results, add manual review if needed, and integrate with your app's content logic. Models such as falcons-ai/nsfw_image_detection and meta/llama-guard-4-12b work well in automated systems, as sketched below.
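A two-stage pipeline along those lines might look like this sketch. Model input names and output shapes are assumptions, and the storage, queueing, and review steps are left to your application.

```python
import replicate


def moderate_upload(image_path: str, caption: str) -> dict:
    """Two-stage moderation sketch: cheap image filter first, guard model on escalation."""
    # Stage 1: fast, lightweight image-only check.
    with open(image_path, "rb") as image_file:
        label = replicate.run(
            "falcons-ai/nsfw_image_detection",
            input={"image": image_file},  # input name assumed
        )

    result = {"image_label": str(label), "escalated": False, "guard_verdict": None}

    # Stage 2: escalate flagged content to a stronger multimodal guard model.
    if str(label).lower() == "nsfw":
        result["escalated"] = True
        with open(image_path, "rb") as image_file:
            output = replicate.run(
                "meta/llama-guard-4-12b",
                input={"prompt": caption, "image": image_file},  # names assumed
            )
        result["guard_verdict"] = output if isinstance(output, str) else "".join(output)

    # Persist `result`, and route unsafe or ambiguous items to manual review.
    return result
```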
Recommended Models

lucataco/flux-content-filter
Flux Content Filter - Check for public figures and copyright concerns
Updated 4 months, 1 week ago. 148K runs.

meta/llama-guard-4-12b
Updated 4 months, 3 weeks ago. 19.2K runs.

meta/llama-guard-3-8b
A Llama-3.1-8B pretrained model, fine-tuned for content safety classification
Updated 10 months, 3 weeks ago. 356.9K runs.

meta/meta-llama-guard-2-8b
A Llama-3 based moderation and safeguarding language model
Updated 1 year, 6 months ago. 734.9K runs.

falcons-ai/nsfw_image_detection
Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
Updated 1 year, 11 months ago. 65.8M runs.

m1guelpf/nsfw-filter
Run any image through the Stable Diffusion content filter
Updated 2 years, 11 months ago. 10.3M runs.