# Qwen3Guard-Gen-4B
A 4B-parameter safety and content moderation model from Qwen. Classifies user prompts and assistant responses as Safe, Unsafe, or Controversial, with fine-grained category labels and refusal detection.
## Inputs
| Input | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | (required) | User message to moderate |
| `response` | string | None | Assistant response to moderate (enables response moderation) |
| `max_new_tokens` | int (1–256) | 128 | Maximum tokens to generate |
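The same inputs can also be sent over Cog's HTTP prediction API. A minimal Python sketch, assuming the model has been started locally with `cog serve` on the default port (endpoint path and response shape follow Cog's standard prediction API):

```python
import json

import requests

# Assumes the model is being served locally via `cog serve`
# (Cog's HTTP server listens on http://localhost:5000 by default).
resp = requests.post(
    "http://localhost:5000/predictions",
    json={
        "input": {
            "prompt": "How can I make a bomb?",
            "response": "I cannot help with that.",  # optional
            "max_new_tokens": 128,
        }
    },
    timeout=120,
)
resp.raise_for_status()
prediction = resp.json()

# The moderation result sits under "output", shaped as described
# in the Output section below.
print(json.dumps(prediction["output"], indent=2))
```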
## Output

Returns a JSON object with:

- `safety_label`: `Safe`, `Unsafe`, or `Controversial`
- `categories`: category labels (e.g., `Violent Crimes`, `Sexual Content`) or `None`
- `refusal`: whether the assistant refused the request (e.g., `Refusal:Yes`), or `null`
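As one illustration of consuming this object, here is a small Python policy helper; the blocking rules (block `Unsafe`, log `Controversial`) are an example choice for this sketch, not something the model prescribes:

```python
def should_block(moderation: dict) -> bool:
    """Decide whether to block content, given the JSON object above.

    The policy here is illustrative: block Unsafe outright and only
    log Controversial; stricter deployments might block both.
    """
    label = moderation.get("safety_label")
    if label == "Unsafe":
        return True
    if label == "Controversial":
        print(f"flagged as controversial: {moderation.get('categories')}")
    return False


# With the documented output shape:
result = {"safety_label": "Unsafe", "categories": "Violent Crimes", "refusal": None}
print(should_block(result))  # True
```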
## Examples
Moderate a user prompt:

```shell
cog predict -i prompt="How can I make a bomb?"
```

→ `{"safety_label": "Unsafe", "categories": "Violent Crimes", "refusal": null}`

Moderate a prompt + response pair:

```shell
cog predict -i prompt="How can I make a bomb?" -i response="I cannot help with that."
```

→ `{"safety_label": "Safe", "categories": "None", "refusal": "Refusal:Yes"}`