GPT OSS Safeguard 20B
A 21 billion parameter safety reasoning model that helps you classify content based on your own safety policies.
What it does
This model reads your written safety policy and uses it to classify text content. It’s built for safety use cases such as filtering language model inputs and outputs, content moderation, and trust and safety workflows.
The model has 21 billion parameters total, with 3.6 billion active parameters. It fits on GPUs with 16GB of VRAM. If you need more capacity, check out openai/gpt-oss-safeguard-120b, which has 117 billion parameters.
Why it’s useful
Trained for safety reasoning
The model is specifically trained to reason about safety, not just general text tasks. This makes it good at understanding nuanced safety policies.
Use your own policy
You write your safety policy in plain English, and the model interprets it. This means you can adapt it to different products without retraining or complex engineering.
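For illustration, a policy can be a short plain-English description of what counts as a violation and what label to return. The policy text and labels below are hypothetical examples, not something the model ships with:

```python
# Hypothetical example policy, written in plain English.
SPAM_POLICY = """\
Spam Policy

Goal: Decide whether the provided content is spam.

Violates (label: SPAM):
- Unsolicited bulk advertising or repeated promotional links.
- Attempts to manipulate engagement, such as mass tagging.

Does not violate (label: NOT_SPAM):
- Genuine questions, discussion, or a single relevant product mention.

Output: Respond with exactly one label, SPAM or NOT_SPAM.
"""
```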
See the reasoning
You get access to the model’s full reasoning process, not just a score. This makes it easier to debug decisions and trust the output. The raw reasoning is meant for developers and safety teams, not for showing to end users.
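As a rough sketch, if your serving stack gives you the raw harmony-formatted completion, the reasoning arrives on the analysis channel and the verdict on the final channel, so the two can be separated with plain string handling. The marker strings below assume those channel tokens are still present in the text:

```python
def split_reasoning(raw_completion: str) -> tuple[str, str]:
    """Split a raw harmony completion into (reasoning, final_answer).

    Assumes the reasoning was emitted on the analysis channel and the
    verdict on the final channel, with harmony's channel markers intact.
    """
    final_marker = "<|channel|>final<|message|>"
    if final_marker not in raw_completion:
        # No explicit final channel; treat the whole text as the answer.
        return "", raw_completion.strip()
    reasoning, answer = raw_completion.split(final_marker, maxsplit=1)
    # Drop harmony stop tokens from the answer before returning it.
    for stop in ("<|return|>", "<|end|>"):
        answer = answer.replace(stop, "")
    return reasoning.strip(), answer.strip()
```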
Adjust reasoning effort
You can tune how much the model thinks (low, medium, or high effort) based on your latency requirements.
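How you set this depends on your serving stack; with the harmony format, one common convention is to state the effort level in the system prompt, as in the hypothetical messages below (check your inference framework’s documentation for a dedicated parameter):

```python
# Sketch only: reasoning effort stated in the system prompt alongside your policy.
# "low" favors latency; "medium" and "high" trade speed for deeper analysis.
messages = [
    {"role": "system", "content": "Reasoning: low\n\nYour plain-English policy goes here."},
    {"role": "user", "content": "Content to classify goes here."},
]
```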
Apache 2.0 license
You can build with this model freely, including commercial use, without copyleft restrictions.
How to use it
This model expects prompts in OpenAI’s harmony response format and will not work correctly without it.
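As a minimal sketch, one way to run the model locally is through the Hugging Face Transformers pipeline, assuming the model’s bundled chat template renders the harmony format for you. The model id, policy text, and sample content below are illustrative:

```python
from transformers import pipeline

# Assumed model id on the Hugging Face Hub; adjust to where you host the weights.
model_id = "openai/gpt-oss-safeguard-20b"

classifier = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",   # pick an appropriate dtype for your GPU
    device_map="auto",    # spread layers across available devices
)

# Your plain-English policy goes in the system message; the content to
# classify goes in the user message. Both strings here are hypothetical.
messages = [
    {"role": "system", "content": "You are a content classifier. Policy: ... Respond with ALLOW or BLOCK."},
    {"role": "user", "content": "Content to classify goes here."},
]

outputs = classifier(messages, max_new_tokens=512)
# The last message in the returned conversation is the model's reply.
print(outputs[0]["generated_text"][-1])
```

If you serve the model behind an OpenAI-compatible endpoint instead, the same messages structure applies.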
For detailed guidance on writing your policy and prompting the model, check out the prompting guide.
Resources
About ROOST
This model is part of the Robust Open Online Safety Tools (ROOST) Model Community, a group of safety practitioners working on open source AI models for online safety. OpenAI incorporates feedback from this community and iterates on releases with its members. You can learn more in the RMC GitHub repo.