
Detect NSFW content

These models help you detect NSFW or unsafe content across images and text.

You can use these models for moderation, compliance, parental filters, user-generated content platforms, and any app that needs reliable safety checks.

This collection includes lightweight image detectors, text-based safety classifiers, and large-scale guardrail models built for enterprise workflows.

If you're interested in other types of text classification, including toxic content, check out our Classify Text collection.

Recommended Models

Frequently asked questions

Which models are the fastest for NSFW detection?

For image-only NSFW filtering, the falcons-ai/nsfw_image_detection model is very fast and lightweight. It is based on a Vision Transformer fine-tuned for binary "normal" vs "nsfw" classification and is designed for high-volume workloads.
If you want simple image filtering with very low latency, the m1guelpf/nsfw-filter model is also quick to run and easy to integrate.
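
As a quick illustration, here is a minimal sketch of calling one of these image detectors with the Replicate Python client. The input name "image", the file name, and the exact output format are assumptions; check the model page for the actual schema.

```python
import replicate

# Minimal sketch: run a lightweight NSFW image detector.
# Assumes the model accepts an "image" input and returns a label such as
# "normal" or "nsfw"; verify the schema on the model page and consider
# pinning a specific model version in production.
with open("photo.jpg", "rb") as image_file:
    output = replicate.run(
        "falcons-ai/nsfw_image_detection",
        input={"image": image_file},
    )

print(output)  # e.g. "normal" or "nsfw"
```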

Which models offer the best balance of cost and quality?

The falcons-ai/nsfw_image_detection model gives strong accuracy at a relatively low cost, which is why it has so many runs on Replicate.
The m1guelpf/nsfw-filter model is cost-efficient and works well for simple filtering.
For enterprise-grade text or multimodal safety checks, the Llama Guard family provides high-quality moderation across many categories. Examples include meta/meta-llama-guard-2-8b, meta/llama-guard-3-8b, and meta/llama-guard-4-12b. These models cost more but offer deeper analysis.
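
For text, a call to one of the Llama Guard models might look like the sketch below. The "prompt" input name and the plain-text response format are assumptions; consult the model page for the exact inputs.

```python
import replicate

# Minimal sketch: screen a piece of user text with a Llama Guard model.
# Assumes a "prompt" input and a text response beginning with "safe" or
# "unsafe"; check the model page for the real input schema.
output = replicate.run(
    "meta/llama-guard-3-8b",
    input={"prompt": "User message to screen goes here."},
)

# Some language models stream output as a list of chunks; join defensively.
text = output if isinstance(output, str) else "".join(output)
print(text)
```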

What works best for image-only NSFW screening?

For images, the best choices are the dedicated image filters:

  • falcons-ai/nsfw_image_detection
  • m1guelpf/nsfw-filter

They are trained specifically to classify explicit imagery and work well in pipelines where you only need to analyze images.
If you also need to analyze captions or text associated with images, consider meta/llama-guard-3-8b or the multimodal meta/llama-guard-4-12b.
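
A multimodal check might look like the sketch below, which sends an image together with its caption. The "image" and "prompt" input names are assumptions; verify them on the model page.

```python
import replicate

# Minimal sketch: evaluate an image and its caption together with a
# multimodal guard model. Input names ("image", "prompt") are assumptions.
with open("upload.jpg", "rb") as image_file:
    output = replicate.run(
        "meta/llama-guard-4-12b",
        input={
            "image": image_file,
            "prompt": "Caption submitted with the image.",
        },
    )

text = output if isinstance(output, str) else "".join(output)
print(text)
```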

What about detecting unsafe text, prompts, or mixed content like an image plus a caption?

The Llama Guard models are designed for this. For example:

  • meta/meta-llama-guard-2-8b and meta/llama-guard-3-8b classify text prompts and responses across multiple safety categories.
  • meta/llama-guard-4-12b is multimodal, so it can evaluate an image together with its caption or other accompanying text.

What is the difference between the main model types in this collection?

In short: the lightweight image filters (falcons-ai/nsfw_image_detection, m1guelpf/nsfw-filter) do fast, binary NSFW classification on images only; the Llama Guard text models (meta/meta-llama-guard-2-8b, meta/llama-guard-3-8b) classify prompts and responses across many safety categories; and the multimodal meta/llama-guard-4-12b evaluates images and text together.

What kinds of outputs can I expect from these models?

Image-only filters usually return a simple label like "normal" or "nsfw" and sometimes a confidence score. The falcons-ai/nsfw_image_detection model works this way.
Text and multimodal guard models return a richer structure. For example meta/meta-llama-guard-2-8b returns whether the content is safe along with a list of safety categories if it is not.
Multimodal models like meta/llama-guard-4-12b evaluate both image and text at once and return combined safety classifications.
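
If you consume guard-model output programmatically, a small parser like the sketch below can normalize it. It assumes the common Llama Guard convention of a first line reading "safe" or "unsafe", with violated category codes on a following line; the exact format can vary by model and version.

```python
# Minimal sketch of parsing a Llama Guard style response. Assumes the
# first line is "safe" or "unsafe" and, when unsafe, a following line
# lists category codes such as "S1,S10". Formats vary; confirm against
# the model's documented output.
def parse_guard_output(raw: str) -> dict:
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    verdict = lines[0].lower() if lines else "unknown"
    categories = []
    if verdict == "unsafe" and len(lines) > 1:
        categories = [code.strip() for code in lines[1].split(",")]
    return {"safe": verdict == "safe", "categories": categories}


print(parse_guard_output("unsafe\nS1,S10"))
# {'safe': False, 'categories': ['S1', 'S10']}
```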

How can I self-host or push a model to Replicate?

Many moderation and detection models are open source and can be self-hosted using Cog or Docker.
If you want to publish your own model on Replicate, package it with Cog: define the environment in a cog.yaml file, implement your model's inputs and outputs in a predict.py Predictor, then push it with cog push. Once pushed, it runs automatically on managed GPUs.
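
As a rough sketch, a Cog predictor for an image safety model might look like the following; the model loading and classification logic here are placeholders, not a working detector.

```python
# predict.py: minimal Cog predictor sketch for an image safety model.
# cog.yaml (not shown) declares the Python version, dependencies, and GPU
# requirements; push with `cog push r8.im/<your-username>/<model-name>`.
from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Load model weights once when the container starts (placeholder).
        self.model = None

    def predict(
        self,
        image: Path = Input(description="Image to classify"),
    ) -> str:
        # Placeholder classification logic; a real predictor would run the
        # loaded model on `image` and return its label.
        return "normal"
```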

Can I use these models for commercial work?

Yes, as long as the license for each model allows commercial use. Always check the license on the model page.
For example, the m1guelpf/nsfw-filter model is open source, but you should still review the license.
Enterprise guard models like meta/llama-guard-4-12b have their own terms, so confirm that your intended use fits the policy.

How do I run these models?

  1. Pick a model that fits your use case, for example falcons-ai/nsfw_image_detection for image NSFW detection or meta/llama-guard-3-8b for text safety analysis.
  2. Upload your input. Images go to image detectors, and text or mixed content goes to guard models.
  3. Adjust any optional parameters like thresholds or output settings.
  4. Run the model and inspect the classification results.

What should I know before running a job in this collection?

  • Make sure your input format matches what the model expects.
  • Simple image filters are very fast. Larger guard models take more compute.
  • Text and multimodal models classify across many categories, so expect more complex outputs.
  • Moderation involves tradeoffs. You may need to adjust thresholds to balance false positives against false negatives (see the sketch after this list).
  • For compliance or enterprise use cases, keep logs and review flagged samples regularly.
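
As one way to handle that threshold tradeoff, the sketch below applies adjustable confidence thresholds to a hypothetical detector score; nothing here is specific to any one model, and the score value is assumed to be returned by your detector.

```python
# Minimal sketch of threshold tuning for a hypothetical detector that
# returns a label and a confidence score. Raising the block threshold
# reduces false positives at the cost of more false negatives.
def decide(label: str, score: float, block_threshold: float = 0.85,
           review_threshold: float = 0.5) -> str:
    if label == "nsfw" and score >= block_threshold:
        return "block"
    if label == "nsfw" and score >= review_threshold:
        return "manual_review"
    return "allow"


print(decide("nsfw", 0.9))     # block
print(decide("nsfw", 0.6))     # manual_review
print(decide("normal", 0.99))  # allow
```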

Any other collection specific tips?

  • For large platforms, consider a two-stage pipeline: run a fast image filter first, then route flagged items to a more powerful model like meta/llama-guard-3-8b.
  • If your workflow involves both images and accompanying text, use a multimodal model such as meta/llama-guard-4-12b.
  • If you need broader policy categories including things like public figure detection, test the lucataco/flux-content-filter model.
  • Always test with real examples from your platform to understand how each model behaves.

What if I want to automate moderation workflows in an app?

You can automate moderation by setting up a pipeline that screens new uploads or user-generated content. Use a lightweight image filter first, then run escalations through a stronger guard model. Store results, add manual review if needed, and integrate with your app's content logic. Models such as falcons-ai/nsfw_image_detection and meta/llama-guard-4-12b work well in automated systems, as shown in the sketch below.
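
Here is a rough sketch of such a pipeline, assuming the input names and output formats used in the earlier examples; treat it as a starting point rather than a drop-in implementation.

```python
import replicate

# Rough sketch of a two-stage moderation pipeline: a fast image filter
# screens every upload, and flagged items (or items with text attached)
# are escalated to a multimodal guard model. Input names ("image",
# "prompt") and output formats are assumptions; check each model page.

def moderate_upload(image_path: str, caption: str | None = None) -> dict:
    # Stage 1: lightweight image screening.
    with open(image_path, "rb") as image_file:
        label = replicate.run(
            "falcons-ai/nsfw_image_detection",
            input={"image": image_file},
        )

    flagged = str(label).strip().lower() == "nsfw"

    # Stage 2: escalate flagged images, or anything with a caption,
    # to a deeper multimodal check.
    guard_verdict = None
    if flagged or caption:
        with open(image_path, "rb") as image_file:
            output = replicate.run(
                "meta/llama-guard-4-12b",
                input={"image": image_file, "prompt": caption or ""},
            )
        guard_verdict = output if isinstance(output, str) else "".join(output)

    return {"image_label": str(label), "guard_verdict": guard_verdict}
```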