Retries
If your endpoint doesn't return 2xx in time, we retry. We won't give up too fast and we won't hammer you forever.
Schedule
7 attempts total (the initial send plus 6 retries) spread over roughly 23 hours.
| Attempt | Delay from previous |
|---|---|
| 1 | — (initial dispatch) |
| 2 | 30 seconds |
| 3 | 2 minutes |
| 4 | 10 minutes |
| 5 | 1 hour |
| 6 | 6 hours |
| 7 | 16 hours |
Each delay is jittered by ±10% so that retries don't arrive in synchronized bursts (a thundering herd) at downstream services that are already struggling.
After attempt 7, we mark the delivery as failed and stop. Failed deliveries stay in the delivery log and you can replay any of them manually.
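To make the timing concrete, here is a minimal sketch that turns the table into attempt offsets. The delays are copied from the table; the function names and the uniform jitter distribution are assumptions for illustration, since only the ±10% bound is stated. Without jitter the delays sum to 83,550 seconds, a little over 23 hours.

```javascript
// Delays between consecutive attempts, in seconds, copied from the schedule table.
const RETRY_DELAYS_SECONDS = [30, 120, 600, 3600, 21600, 57600];

// Apply jitter of up to +/-10% to a delay.
// Assumption: jitter is uniform. `random` is injectable so the result can be
// made deterministic; it must return a number in [0, 1).
function jitteredDelay(seconds, random = Math.random) {
  const factor = 0.9 + random() * 0.2; // in [0.9, 1.1)
  return seconds * factor;
}

// Offsets, in seconds since the initial dispatch, at which each attempt fires.
function attemptOffsets(random = Math.random) {
  const offsets = [0]; // attempt 1 is the initial dispatch
  for (const delay of RETRY_DELAYS_SECONDS) {
    offsets.push(offsets[offsets.length - 1] + jitteredDelay(delay, random));
  }
  return offsets;
}

// random() === 0.5 gives a jitter factor of 1, i.e. the nominal schedule.
const nominal = attemptOffsets(() => 0.5);
const totalHours = nominal[nominal.length - 1] / 3600;
console.log(nominal.length, totalHours.toFixed(2)); // 7 attempts, about 23.21 hours
```

With jitter at its extremes, the final attempt lands between roughly 20.9 and 25.5 hours after the first, so size any "still in flight" window on your side accordingly.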
What counts as failure
Anything that isn't a clean 2xx within 30 seconds:
- HTTP status outside 200–299
- Connection timeout
- TCP reset / DNS failure
- TLS handshake failure
- Response not received within 30 seconds of connection
3xx redirects are followed up to 3 hops. 4xx is treated as failure and retried — we don't trust that a 4xx today will still be one in 6 hours, especially if it's caused by deploy lag.
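The rules above can be condensed into one predicate. This is a sketch of how a sender might classify an attempt, not the actual delivery code: the shape of the `attempt` object and its field names are hypothetical, and `status` is assumed to be the final status after any redirects have been followed.

```javascript
const TIMEOUT_MS = 30000; // the 30-second limit described above

// `attempt` is a hypothetical summary of one delivery attempt:
//   status:       final HTTP status after redirects, or null if no response arrived
//   elapsedMs:    time from connection to response, in milliseconds
//   networkError: e.g. "dns", "tcp_reset", "tls", "connect_timeout", or null
function isSuccessfulDelivery(attempt) {
  if (attempt.networkError) return false; // DNS, TCP, TLS, connection timeout
  if (attempt.status === null || attempt.status === undefined) return false;
  if (attempt.elapsedMs > TIMEOUT_MS) return false; // too slow counts as failure
  return attempt.status >= 200 && attempt.status <= 299;
}

// A 4xx is a failure like any other, so it is retried.
console.log(isSuccessfulDelivery({ status: 404, elapsedMs: 120, networkError: null })); // false
```

The practical consequence for your endpoint: return a 2xx quickly and treat anything else, including a 4xx you consider "expected", as a signal that the same event will come back.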
What does NOT trigger a retry
- A 2xx response. We're done.
- The webhook is inactive at retry time. We drop the queued attempt.
- The webhook was deleted between attempts. The queued attempt is dropped as well.
Idempotency
We retry. That means your endpoint will sometimes see the same submission_id more than once. Make the handler safe to re-run:
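    // Sketch: alreadyProcessed, process, and markProcessed are your own helpers.
    async function handleWebhook(body) {
      const key = body.submission_id;
      // A retry of an event we already handled: acknowledge it and skip the work.
      if (await alreadyProcessed(key)) {
        return new Response('ok', { status: 200 });
      }
      await process(body);
      await markProcessed(key);
      return new Response('ok', { status: 200 });
    }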
submission_id is a ULID — unique, monotonic, safe to use as a primary key in your dedupe table. Don't dedupe on the request URL or timestamp; both can vary across retries.
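One caveat with a check-then-process handler: if a retry arrives while the first delivery is still being processed, both can pass the "already processed?" check. A common way to close that window is to claim the key before doing the work. The sketch below uses an in-memory `Set` as a stand-in for a dedupe table with `submission_id` as its primary key; `claim`, `handleWebhook`, and `processFn` are illustrative names, and the ID shown is just an example ULID.

```javascript
// Stand-in for a dedupe table keyed on submission_id. In a real service this
// would be a database table with a unique constraint on the key.
const processed = new Set();

// Claim a key: returns true only for the first caller. With a database, the
// equivalent is an INSERT that fails (or is ignored) on a duplicate key.
function claim(key) {
  if (processed.has(key)) return false;
  processed.add(key);
  return true;
}

// `processFn` is your own processing logic (hypothetical here).
async function handleWebhook(body, processFn) {
  const key = body.submission_id;
  if (!claim(key)) {
    // Duplicate delivery: acknowledge with 2xx so it is not retried again.
    return { status: 200, body: "already handled" };
  }
  try {
    await processFn(body);
    return { status: 200, body: "ok" };
  } catch (err) {
    processed.delete(key); // release the claim so a later retry can succeed
    return { status: 500, body: "processing failed" };
  }
}

// Usage: the second delivery of the same submission_id does no work.
const example = { submission_id: "01ARZ3NDEKTSV4RRFFQ69G5FAV" };
handleWebhook(example, async () => {}).then(() =>
  handleWebhook(example, async () => { throw new Error("should not run"); })
).then((res) => console.log(res.status, res.body));
```

Releasing the claim on failure is a design choice: it lets the automatic retry redo the work, at the cost of requiring `processFn` itself to tolerate a partial earlier run.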
Replay vs retry
Retry is automatic, driven by failure, capped at 7 attempts.
Replay is manual, triggered by you, with no cap. A replay is a fresh attempt that goes through the same delivery pipeline and resets its own retry counter. Use replays after you've fixed a downstream bug and want to backfill failed events. See replay.
Tuning expectations
If your endpoint is flaky, the practical effect is that some events arrive minutes or hours late. Plan accordingly:
- Don't gate user-visible flows on webhook delivery. Use the dashboard or the API for live data.
- Treat webhooks as eventually-consistent fan-out, not as your primary read path.
- Monitor the delivery log for success_rate per webhook. A healthy webhook is well above 99%.
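If you export delivery log entries and want to check the rate yourself, a small aggregation is enough. This is a sketch under assumptions: the entry shape (`webhookId`, `delivered`) is hypothetical, and success rate is taken to mean delivered deliveries divided by all deliveries, which may differ from how the dashboard computes success_rate.

```javascript
// Hypothetical log entries: one per delivery, with its final outcome.
// Returns a Map of webhookId -> fraction of deliveries that succeeded.
function successRateByWebhook(entries) {
  const totals = new Map(); // webhookId -> { ok, all }
  for (const { webhookId, delivered } of entries) {
    const t = totals.get(webhookId) ?? { ok: 0, all: 0 };
    t.all += 1;
    if (delivered) t.ok += 1;
    totals.set(webhookId, t);
  }
  const rates = new Map();
  for (const [id, { ok, all }] of totals) rates.set(id, ok / all);
  return rates;
}

// Webhooks whose rate is below the threshold. 0.99 mirrors the 99% figure
// above and is only a floor; a healthy webhook should sit well above it.
function unhealthyWebhooks(entries, threshold = 0.99) {
  return [...successRateByWebhook(entries)]
    .filter(([, rate]) => rate < threshold)
    .map(([id]) => id);
}

const sample = [
  { webhookId: "wh_a", delivered: true },
  { webhookId: "wh_a", delivered: false },
  { webhookId: "wh_b", delivered: true },
];
console.log(unhealthyWebhooks(sample)); // [ 'wh_a' ]
```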