lucataco/privacy-filter

OpenAI Privacy Filter is a bidirectional token classifier for detecting and masking personally identifiable information (PII) in text.

Privacy Filter

Detect and redact personally identifiable information (PII) in text using OpenAI’s privacy-filter — a 1.5B-parameter (50M active) bidirectional token classifier.

The model returns both the detected entities (with confidence scores and character offsets) and a redacted version of the input where each span is replaced with its category tag, e.g. [private_email].

Supported PII categories

  • private_person — names, aliases
  • private_email — email addresses
  • private_phone — phone numbers
  • private_address — street addresses, cities, ZIPs
  • private_url — personal URLs and handles
  • private_date — birthdays and sensitive dates
  • account_number — bank, card, and account IDs
  • secret — API keys, passwords, tokens

Inputs

  • text — the text you want to scan for PII.
  • score_threshold — minimum confidence (0.0–1.0) for a span to be flagged. Default is 0.5.
      • 0.3 — compliance / audit mode: catch everything for human review
      • 0.5 — balanced default
      • 0.9 — automated redaction pipeline: only act when very sure
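To make the threshold trade-off concrete, here is a minimal sketch of filtering spans by confidence, the way score_threshold does server-side. The entity dicts mirror the output fields described below; the sample data itself is invented for illustration:

```python
# Sample spans (invented) in the shape the model returns:
# category, confidence score, matched text, and character offsets.
sample_entities = [
    {"category": "private_email", "score": 0.97, "text": "a@b.com", "start": 11, "end": 18},
    {"category": "private_person", "score": 0.42, "text": "Jo", "start": 0, "end": 2},
]

def filter_by_threshold(entities, score_threshold=0.5):
    """Keep only spans at or above the confidence threshold."""
    return [e for e in entities if e["score"] >= score_threshold]

# Audit mode (0.3) keeps both spans; automated mode (0.9) keeps only one.
print(len(filter_by_threshold(sample_entities, 0.3)))
print(len(filter_by_threshold(sample_entities, 0.9)))
```

Lowering the threshold only ever adds spans, so an audit pass at 0.3 is a superset of what an automated pass at 0.9 would redact.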

Output

The model returns two things:

  • entities — a list of detected PII spans, each with its category, confidence score, the matched text, and its start/end character positions in the input.
  • redacted — the original text with every detected span replaced by its category in square brackets (e.g. [private_email]).

Adjacent spans of the same category are merged so multi-word PII like full names appear as a single entity.
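The redacted field can be reconstructed from the entities alone, since each span carries its start/end offsets. A minimal sketch (the entity data is invented; this mirrors the output format but is not the model's own implementation):

```python
def redact(text, entities):
    """Replace each detected span with its category tag in square
    brackets, working right to left so earlier offsets stay valid."""
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:e["start"]] + f"[{e['category']}]" + text[e["end"]:]
    return text

text = "Contact Jane Doe at jane@example.com"
entities = [
    {"category": "private_person", "score": 0.98, "text": "Jane Doe", "start": 8, "end": 16},
    {"category": "private_email", "score": 0.99, "text": "jane@example.com", "start": 20, "end": 36},
]
print(redact(text, entities))
# → Contact [private_person] at [private_email]
```

Note that "Jane Doe" appears as one merged span rather than two adjacent private_person spans.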

Use cases

  • Logging & observability — strip PII from request logs before they hit your storage
  • Customer support pipelines — redact tickets before training or analytics
  • LLM input/output guardrails — sanitize prompts and completions in real time
  • Dataset preparation — clean user-generated content prior to model training
  • Compliance workflows — surface candidate PII for human reviewers (use a low threshold)
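For the logging use case, one way to wire redaction in is a logging.Filter that rewrites each record before any handler sees it. In this sketch, redact_pii is a stand-in for a call to the model (a trivial email regex keeps the example self-contained and offline):

```python
import logging
import re

def redact_pii(message: str) -> str:
    """Stand-in for a real privacy-filter call: masks email-shaped
    substrings with the model's category tag."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[private_email]", message)

class PIIRedactingFilter(logging.Filter):
    """Redact each log record's fully formatted message before it
    reaches storage."""
    def filter(self, record):
        record.msg = redact_pii(record.getMessage())
        record.args = ()  # message is already fully formatted
        return True

logger = logging.getLogger("app")
logger.addFilter(PIIRedactingFilter())
logger.warning("login failed for %s", "jane@example.com")
# the handler receives: "login failed for [private_email]"
```

Attaching the filter to the logger (rather than a single handler) means every downstream sink sees only the redacted message.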

Tips

  • For automated redaction (no human in the loop), use a higher threshold like 0.9 to minimize false positives.
  • For audit / review pipelines, use a lower threshold like 0.3 to maximize recall — a human can dismiss false positives later.
  • The model is multilingual-friendly thanks to its bidirectional encoder, but works best on English-style PII patterns.

Limitations

  • Token-level classifier — does not perform entity linking or de-duplication across documents.
  • Confidence scores are calibrated, but rare PII formats (e.g. unusual international phone numbers, non-Latin names) may need a lower threshold.
  • Does not detect contextual PII (e.g. “the patient” in a medical note) — only surface-form spans.