Webhook Reliability at Scale

Patterns for exactly-once processing, dead-letter queues, and retry back-off in payment webhook pipelines.

3 min read
Webhooks, Distributed Systems, Redis, Reliability

Why Webhooks Break

Webhooks are the backbone of modern payment integrations. Paystack, Stripe, and Flutterwave all use them to notify your system of payment events. But webhooks are inherently unreliable:

  • Network failures — The POST never reaches your server
  • Timeout — Your server takes too long to respond, so the provider retries
  • Out-of-order delivery — Event B arrives before event A
  • Duplicate delivery — The provider's retry logic sends the same event multiple times

Your system must handle all of these gracefully.

Pattern 1: ACK First, Process Later

The single most impactful pattern. Never do heavy processing in the webhook handler itself:

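import { Body, Headers, Post, Res } from '@nestjs/common';
import { Response } from 'express';

@Post('webhook')
async handle(
  @Body() payload: WebhookPayload,
  @Headers() headers: Record<string, string>,
  @Res() res: Response,
) {
  // Verify signature (fast); the check needs the provider's signature header
  if (!this.verifySignature(payload, headers)) {
    return res.status(401).send();
  }

  // ACK immediately
  res.status(200).send('OK');

  // Enqueue for async processing. The response is already sent, so a failed
  // enqueue must be caught and logged here or the event is lost silently.
  try {
    await this.queue.add('process-webhook', payload, {
      attempts: 3,
      backoff: { type: 'exponential', delay: 1000 },
    });
  } catch (err) {
    console.error('Failed to enqueue webhook', err);
  }
}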

Because the handler responds as soon as the signature check passes, the provider sees a fast 200 and has no reason to time out and retry the delivery.

Pattern 2: Exponential Backoff with Jitter

When your processor fails, retry with increasing delays plus random jitter so that failed jobs do not all retry at the same moment (the thundering herd problem):

const delay = Math.min(
  baseDelay * Math.pow(2, attempt) + Math.random() * jitter,
  maxDelay
);
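The expression above can be wrapped in a small helper. This is a minimal sketch, not a library API: the names `retryDelayMs` and `BackoffOptions` and the option values are assumptions for illustration, and the random source is injectable only so the function can be tested deterministically.

```typescript
// Hypothetical option names; tune the values for your own queue.
interface BackoffOptions {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  jitterMs: number; // maximum random offset added to each delay
}

// Returns the delay before retry number `attempt` (0-based).
// `random` defaults to Math.random and can be replaced in tests.
function retryDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const exponential = opts.baseDelayMs * Math.pow(2, attempt);
  const jitter = random() * opts.jitterMs;
  return Math.min(exponential + jitter, opts.maxDelayMs);
}

// Example values (assumed): 1 s base, 30 s cap, up to 0.5 s of jitter.
const backoffOpts: BackoffOptions = { baseDelayMs: 1000, maxDelayMs: 30000, jitterMs: 500 };
for (let attempt = 0; attempt < 6; attempt++) {
  console.log(`attempt ${attempt}: ${Math.round(retryDelayMs(attempt, backoffOpts))} ms`);
}
```

One consequence of capping after adding jitter: once the exponential term reaches the cap, every retry waits exactly `maxDelayMs` and the jitter no longer spreads retries out. If that matters for your workload, apply the jitter after the cap instead.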

Pattern 3: Dead Letter Queue with Alerting

After exhausting retries, events must go somewhere visible — not silently disappear:

  • Store the full event payload, error message, and stack trace
  • Alert the on-call engineer via Slack/PagerDuty
  • Provide a retry button in the admin dashboard
  • Set a TTL for auto-cleanup of old DLQ entries (30 days)
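The checklist above might translate into a record and a small store like the following. This is a sketch under stated assumptions: `DeadLetterEntry`, `InMemoryDlq`, and the `notify` callback are hypothetical names, the in-memory array stands in for a durable table or queue, and `notify` stands in for a Slack or PagerDuty integration.

```typescript
// Hypothetical shape of a dead-letter record; field names are illustrative.
interface DeadLetterEntry {
  eventId: string;
  payload: unknown; // full event payload, kept for replay
  errorMessage: string;
  stack: string | undefined;
  failedAt: Date;
  expiresAt: Date; // failedAt + TTL, used for auto-cleanup
}

const DLQ_TTL_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

class InMemoryDlq {
  private entries: DeadLetterEntry[] = [];
  private notify: (entry: DeadLetterEntry) => void;

  // `notify` stands in for the on-call alert.
  constructor(notify: (entry: DeadLetterEntry) => void) {
    this.notify = notify;
  }

  // Called once a job has exhausted its retries.
  add(eventId: string, payload: unknown, error: Error, now: Date = new Date()): DeadLetterEntry {
    const entry: DeadLetterEntry = {
      eventId,
      payload,
      errorMessage: error.message,
      stack: error.stack,
      failedAt: now,
      expiresAt: new Date(now.getTime() + DLQ_TTL_DAYS * MS_PER_DAY),
    };
    this.entries.push(entry);
    this.notify(entry);
    return entry;
  }

  // Removes an entry and returns it so the caller can re-enqueue it
  // (the action behind a dashboard retry button).
  take(eventId: string): DeadLetterEntry | undefined {
    const index = this.entries.findIndex((e) => e.eventId === eventId);
    return index === -1 ? undefined : this.entries.splice(index, 1)[0];
  }

  // Drops expired entries and returns how many were removed.
  purgeExpired(now: Date = new Date()): number {
    const before = this.entries.length;
    this.entries = this.entries.filter((e) => e.expiresAt.getTime() > now.getTime());
    return before - this.entries.length;
  }

  get depth(): number {
    return this.entries.length;
  }
}
```

The `depth` getter is the number a DLQ-depth dashboard would chart.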

Pattern 4: Event Ordering with Sequence Numbers

For stateful flows (e.g., payment.pending → payment.success), use provider-supplied timestamps or sequence numbers:

async processEvent(event: NormalizedEvent) {
  const existing = await this.db.findByRef(event.providerRef);
  if (existing && existing.timestamp >= event.timestamp) {
    // Out-of-order: skip this older event
    return;
  }
  await this.db.upsert(event);
}
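To see the guard in action without a database, here is a self-contained sketch that keeps the latest event per reference in a `Map`. The names `PaymentStateStore`, `apply`, and the `sequence` field are assumptions for illustration; `sequence` can hold either a provider timestamp or a counter.

```typescript
// Hypothetical normalized event; `sequence` may be a provider timestamp or counter.
interface NormalizedEvent {
  providerRef: string;
  status: string;
  sequence: number;
}

class PaymentStateStore {
  private latest = new Map<string, NormalizedEvent>();

  // Applies the event only if it is newer than what is stored.
  // Returns true when the event was applied, false when it was skipped.
  apply(event: NormalizedEvent): boolean {
    const existing = this.latest.get(event.providerRef);
    if (existing !== undefined && existing.sequence >= event.sequence) {
      return false; // stale or duplicate event
    }
    this.latest.set(event.providerRef, event);
    return true;
  }

  statusOf(providerRef: string): string | undefined {
    return this.latest.get(providerRef)?.status;
  }
}

// payment.success arrives first; the late payment.pending is skipped.
const store = new PaymentStateStore();
store.apply({ providerRef: 'ref_1', status: 'payment.success', sequence: 2 });
store.apply({ providerRef: 'ref_1', status: 'payment.pending', sequence: 1 });
console.log(store.statusOf('ref_1'));
```

Against a real database, the read and the write should happen as one atomic step (for example a conditional update), otherwise two workers handling events for the same reference can interleave between the lookup and the upsert.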

Pattern 5: Signature Verification

Always verify webhook signatures before any processing. Each provider has its own scheme:

| Provider | Method | Header |
|----------|--------|--------|
| Stripe | HMAC-SHA256 | Stripe-Signature |
| Paystack | HMAC-SHA512 | x-paystack-signature |
| Flutterwave | Secret hash comparison | verif-hash |

Warning

Never skip signature verification, even in staging. It's the only thing preventing an attacker from crediting arbitrary wallets.
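As an illustration of the HMAC-SHA512 row, the check could be sketched with Node's built-in `crypto` module. The function name `verifyHmacSha512` is hypothetical, and the sketch assumes the signature header carries a hex-encoded digest of the raw request body; confirm the exact encoding and signed content against your provider's documentation. Stripe's header also carries a timestamp alongside the signature and needs different parsing.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Verifies a hex-encoded HMAC-SHA512 signature over the raw request body.
// `rawBody` should be the exact bytes received, not a re-serialized object.
function verifyHmacSha512(
  rawBody: string | Buffer,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac('sha512', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws when lengths differ, so check lengths first.
  if (received.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

Comparing with `timingSafeEqual` rather than `===` is a deliberate choice: a plain string comparison can return early on the first mismatching character, which leaks timing information to an attacker probing signatures.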

Putting It All Together

A production webhook pipeline combines all five patterns with an idempotency check that drops duplicate deliveries:

  1. Verify signature → 2. ACK 200 → 3. Enqueue → 4. Idempotency check → 5. Process with ordering → 6. Retry with backoff → 7. DLQ on exhaustion

Each layer addresses a specific failure mode. Remove any one and you have a gap.
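The idempotency check in step 4 could be sketched as a claim on the provider's event ID. This is an illustrative sketch, not a prescribed design: `IdempotencyGuard` and `claim` are hypothetical names, and the in-memory `Map` stands in for a store shared by all workers. With Redis, a single `SET key value NX EX ttl` command would give the same first-writer-wins claim atomically.

```typescript
// Hypothetical idempotency guard keyed by the provider's event ID.
class IdempotencyGuard {
  // eventId -> expiry time in milliseconds since the epoch
  private seen = new Map<string, number>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns true the first time an event ID is claimed within the TTL,
  // and false for any duplicate delivery inside that window.
  claim(eventId: string, nowMs: number = Date.now()): boolean {
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > nowMs) {
      return false;
    }
    this.seen.set(eventId, nowMs + this.ttlMs);
    return true;
  }
}

// A worker would skip processing when the claim fails.
const guard = new IdempotencyGuard(24 * 60 * 60 * 1000);
if (guard.claim('evt_123')) {
  console.log('processing evt_123');
}
if (!guard.claim('evt_123')) {
  console.log('duplicate evt_123 skipped');
}
```

The TTL should comfortably exceed the provider's retry window, so that a late retry still finds the original claim.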

Monitoring Checklist

  • Webhook receipt rate — Are you receiving the expected volume?
  • Processing latency p99 — Are events processing within SLA?
  • DLQ depth — Is the dead letter queue growing?
  • Duplicate rate — How often is the idempotency layer catching retries?
  • Error rate by provider — Is one provider more problematic than others?

Build dashboards for these five metrics and you'll catch issues before they impact users.
