
AI moderation

On Pro+ plans, every submission runs through a language-model-based moderation classifier. The model reads the submission body and returns a single number between 0 and 1. Higher means more likely spam.

Unlike honeypots or CAPTCHAs, AI moderation catches human spam — manually typed junk that lands real keystrokes on a real keyboard but is still garbage. SEO link drops, copy-pasted bot pitches, harassment, and phishing attempts all score high.

Score range

Scores are floats from 0.0 to 1.0, rounded to two decimals.

Score Interpretation
0.00 – 0.30 Almost certainly legitimate
0.30 – 0.60 Ambiguous — review
0.60 – 0.85 Likely spam
0.85 – 1.00 Almost certainly spam
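The bands above can be sketched as a small helper. The function name is illustrative, and the boundaries assume the upper edge of each band is exclusive (so a score of exactly 0.30 falls into the ambiguous band):

```python
def interpret_score(score: float) -> str:
    """Map a moderation score (0.0-1.0, two decimals) to its documented band.

    Assumption: the upper boundary of each band is exclusive.
    """
    if score < 0.30:
        return "almost certainly legitimate"
    if score < 0.60:
        return "ambiguous, review"
    if score < 0.85:
        return "likely spam"
    return "almost certainly spam"
```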

The score is also broken down by category — harassment, solicitation, phishing, nonsense — when applicable. The category lives next to the score on the submission detail page.

Threshold tuning

Each form has a moderation threshold (default 0.75). Submissions scoring at or above the threshold are filed in the spam folder; submissions below it land in the inbox.

Tune on the form's edit page under AI moderation:

  • Lower the threshold (e.g. 0.55) if you'd rather over-flag and hand-review.
  • Raise it (e.g. 0.90) if you only want the model to catch obvious garbage.
  • Set it to 1.0 to keep the score visible without acting on it.

The model itself isn't tuned per-form. The threshold is the only knob. If the score consistently misses a category of spam you care about, layer a custom rule on top.
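The filing rule is simple enough to state as code. This is a sketch of the documented behaviour (at or above the threshold goes to spam), not the service's actual implementation:

```python
def file_submission(score: float, threshold: float = 0.75) -> str:
    """Return the folder a submission is filed in.

    At or above the threshold -> spam folder; below it -> inbox.
    The 0.75 default matches the documented per-form default.
    """
    return "spam" if score >= threshold else "inbox"
```

Note that because the comparison is "at or above", a threshold of 1.0 only files submissions that score exactly 1.00.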

Where it appears in the UI

On the submission detail page:

  • A coloured pill shows the score (green/yellow/red).
  • The category is shown next to it, if the model returned one.
  • If the submission was filed as spam because of the score, the spam reason will read ai_moderation:0.83 (or whatever the score was).
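If you pull spam reasons out programmatically, the ai_moderation:0.83 format can be parsed with a prefix check. The helper name is illustrative:

```python
def parse_moderation_reason(reason: str):
    """Extract the score from an "ai_moderation:<score>" spam reason.

    Returns the score as a float, or None if the submission was
    filed as spam for some other reason.
    """
    prefix = "ai_moderation:"
    if reason.startswith(prefix):
        return float(reason[len(prefix):])
    return None
```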

On the form overview, the average score over the last 30 days is shown next to the spam-rate chart so you can see drift.

Where it appears in the API

Every submission resource includes:

{
  "id": "subm_01H...",
  "status": "received",
  "ai_moderation": {
    "score": 0.12,
    "category": null,
    "model": "moderation-2025-11"
  },
  "payload": { ... }
}

The model field identifies the model version that scored this submission so you can compare scores across model upgrades.
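A consumer reading this field should handle both shapes: the object shown above on Pro+ plans, and null on plans without AI moderation. A minimal sketch (function name is illustrative):

```python
def moderation_result(submission: dict):
    """Return (score, model) from a submission resource.

    Returns None when ai_moderation is null, which is what the API
    returns on plans that don't run the classifier.
    """
    mod = submission.get("ai_moderation")
    if mod is None:
        return None
    return mod["score"], mod["model"]
```

Keeping the model version alongside the score lets you bucket historical scores per model before comparing them across upgrades.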

Where it appears in MCP

The list-submissions and get-submission MCP tools both return the moderation score. The categorize-submissions prompt uses it to bucket submissions for batch labeling — see Categorization →.

Plan gating

Free, Starter, and Pro plans don't run the moderation classifier. The submission resource on those plans returns "ai_moderation": null. Upgrade to Pro+ in Billing to turn it on. There's no per-submission charge — moderation runs on every Pro+ submission as part of the plan.

What's next