RuyaTech

Stripe webhook retries and the idempotency hole nobody tests

Oussama Ibrahim, Founder & Lead Engineer (4 min read)

Key takeaway: Stripe retries webhooks whenever your endpoint times out or returns a non-2xx response. If your handler isn't idempotent on the event id, the same payment gets processed twice and your numbers drift. The fix is small: log every event id, treat repeats as a no-op, and reconcile against the gateway.

A payment that succeeded once and got recorded twice is one of the quieter ways a young SaaS loses trust in its own numbers.

It almost never shows up in testing. The founder buys their own product, the charge works, the row lands in the database, everyone moves on. The bug lives in an event nobody triggered by hand.

Why Stripe sends the same event twice

Stripe retries any webhook your endpoint didn't acknowledge with a 2xx response quickly enough, and from its side a slow handler and a failed handler look the same.

Stripe's delivery contract is simple. You have a short window to return a 2xx. If you don't, the event goes back on the queue and gets redelivered with exponential backoff, for up to three days in live mode.

From Stripe's side there's no difference between a handler that crashed, a handler that's mid-deploy, and a handler that's just slow because the database is under load. All three look like a failed delivery. All three get retried.

Which means the same checkout.session.completed or invoice.paid event can land on your server two, three, sometimes more times. The first delivery might have actually succeeded; your response just didn't come back in time.

What goes wrong when the handler isn't idempotent

Without an idempotency check, every retry runs the side effects again: a second order row, a second fulfilment trigger, a second internal credit.

A typical naive handler does roughly this on a successful charge:

  1. Parse the event.
  2. Insert a payment row.
  3. Update the user's balance or entitlement.
  4. Trigger fulfilment, an email, a downstream webhook.
  5. Return 200.

If any of steps 2 to 4 is slow and the response misses the window, Stripe retries, and every step runs again. The user gets two order rows, two fulfilment triggers, two internal credits. The charge itself was fine. Your accounting is now wrong, and the drift only shows up later, at reconciliation, when someone asks why the gateway and your database disagree.
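To make the failure concrete, here is a minimal sketch of a naive handler receiving the same event twice. The in-memory `NaiveStore`, the `naive_handler` function, and the trimmed event payload are all hypothetical stand-ins for illustration, not Stripe's API or anyone's production code.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveStore:
    """In-memory stand-in for the application's database (hypothetical)."""
    payments: list = field(default_factory=list)
    balances: dict = field(default_factory=dict)
    emails_sent: int = 0


def naive_handler(event: dict, store: NaiveStore) -> int:
    """Runs every side effect on every delivery; event["id"] is never checked."""
    obj = event["data"]["object"]
    user, amount = obj["customer"], obj["amount_total"]
    store.payments.append({"user": user, "amount": amount})      # step 2
    store.balances[user] = store.balances.get(user, 0) + amount  # step 3
    store.emails_sent += 1                                       # step 4
    return 200                                                   # step 5


# One successful charge, delivered twice because the first 200 arrived too late.
# The payload is trimmed to the fields this sketch reads.
event = {
    "id": "evt_test_123",
    "type": "checkout.session.completed",
    "data": {"object": {"customer": "cus_test_1", "amount_total": 4900}},
}
store = NaiveStore()
naive_handler(event, store)
naive_handler(event, store)  # the retry

print(len(store.payments), store.balances["cus_test_1"], store.emails_sent)
# prints: 2 9800 2
```

One charge of 4900 has become two payment rows, a balance of 9800, and two emails, even though both deliveries carried the same event id.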

The worst version is silent: nobody complains because the customer was charged correctly. You just slowly stop being able to trust your own dashboards.

The fix is small

Log the gateway's event id the first time you see it, and make every subsequent delivery of that id a no-op before any side effect runs.

The handler needs one thing at the top: a check on the event id Stripe sent.

  1. Verify the signature. Reject anything unsigned. This is separate from idempotency, but it belongs in the same line of defence.
  2. Look up the event id. If you've seen it, return 200 and stop. No inserts, no balance changes, no emails.
  3. Insert the event id and do the work in the same transaction, so that a crash mid-handler doesn't leave you having recorded the event but skipped the side effect, or done the side effect but not recorded the event.
  4. Add a unique constraint on the event id column as a backstop, in case two retries race each other into the handler at the same time.

That's it. The id is the gateway's own, not one you mint. Stripe sends the same evt_... on every retry of the same event, which is what makes this work.
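Steps 2 to 4 can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions, not a definitive implementation: the table names `processed_events` and `payments`, the `handle_event` function, and its string return values are hypothetical, and signature verification (step 1) is assumed to have happened before this function is called.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None is autocommit mode: a transaction starts only
    # when the code below issues an explicit BEGIN.
    db = sqlite3.connect(":memory:", isolation_level=None)
    # PRIMARY KEY doubles as the unique constraint from step 4.
    db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE payments (event_id TEXT, customer TEXT, amount INTEGER)")
    return db


def handle_event(db: sqlite3.Connection, event: dict) -> str:
    """Returns "processed" or "duplicate"; a caller would answer 200 for both."""
    event_id = event["id"]

    # Step 2: look the id up first. A known id is a no-op.
    seen = db.execute(
        "SELECT 1 FROM processed_events WHERE event_id = ?", (event_id,)
    ).fetchone()
    if seen:
        return "duplicate"

    db.execute("BEGIN IMMEDIATE")
    try:
        # Step 4 backstop: if a concurrent delivery committed this id after
        # the lookup above, the constraint rejects the insert.
        db.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
    except sqlite3.IntegrityError:
        db.execute("ROLLBACK")
        return "duplicate"

    try:
        # Step 3: the work shares the transaction with the event id row.
        obj = event["data"]["object"]
        db.execute(
            "INSERT INTO payments (event_id, customer, amount) VALUES (?, ?, ?)",
            (event_id, obj["customer"], obj["amount_total"]),
        )
        db.execute("COMMIT")
    except Exception:
        # A failure undoes the event id row too, so a later retry can redo the work.
        db.execute("ROLLBACK")
        raise
    return "processed"


db = open_db()
event = {
    "id": "evt_test_123",
    "data": {"object": {"customer": "cus_test_1", "amount_total": 4900}},
}
print(handle_event(db, event))  # prints: processed
print(handle_event(db, event))  # prints: duplicate
print(db.execute("SELECT COUNT(*) FROM payments").fetchone()[0])  # prints: 1
```

With a single in-memory connection the race in step 4 cannot actually occur, so the `IntegrityError` branch is only shown, not exercised. External effects such as emails cannot be rolled back with a database transaction, so this sketch covers the database writes only.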

What we look for in a review

Reading payment code for reliability means looking past the successful charge at the events nobody tests by hand.

When we read a payment integration as part of an audit, the charge path is rarely the problem. The gaps live in the cases the demo never ran:

  • Webhook signature verification skipped, or done with the wrong secret in production.
  • No idempotency at all, or idempotency keyed on something the client controls instead of the gateway event id.
  • Side effects that fan out before the event row is committed, so a crash leaves you in a half-state.
  • No reconciliation job comparing what the gateway says happened with what your database says happened.
  • A retried refund or a partial capture that the handler treats like a fresh charge.
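For the first item in that list, here is a rough standard-library sketch of what signature verification checks. It assumes Stripe's documented scheme: a `Stripe-Signature` header of the form `t=<timestamp>,v1=<signature>`, where the signature is an HMAC-SHA256 of `<timestamp>.<raw body>` keyed with the endpoint's signing secret. The function name `verify_stripe_signature` and the `whsec_test_secret` value are made up for illustration. In production you would normally call the official library's `stripe.Webhook.construct_event` instead of hand-rolling this.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance: int = 300,
                            now: Optional[float] = None) -> bool:
    """Illustrative check of a Stripe-Signature header against the raw body."""
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # The signed message is "<timestamp>.<raw request body>".
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, against every v1 value in the header.
    if not any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates):
        return False

    # Reject old timestamps so a captured request cannot be replayed later.
    current = time.time() if now is None else now
    return abs(current - int(timestamp)) <= tolerance


secret = "whsec_test_secret"  # hypothetical signing secret
body = b'{"id": "evt_test_123"}'
ts = str(int(time.time()))
sig = hmac.new(secret.encode(), ts.encode() + b"." + body, hashlib.sha256).hexdigest()
header = f"t={ts},v1={sig}"

print(verify_stripe_signature(body, header, secret))                # prints: True
print(verify_stripe_signature(body + b" ", header, secret))         # prints: False
print(verify_stripe_signature(body, header, "whsec_wrong_secret"))  # prints: False
```

The third call is the "wrong secret in production" case: the request is genuine, but the handler rejects it because it was configured with a different endpoint's secret.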

None of these are exotic. They're the boring cases. They're also the ones that quietly cost real money once volume picks up. Most of them are the same failure modes you have to design around when building a bulk payment system that survives volume.
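The reconciliation job from the list above can be very small. The sketch below assumes you can export two things for the same period: the gateway's view as a mapping of payment id to amount, and your own payment rows as `(payment id, amount)` pairs. The `reconcile` function, its report keys, and the `pi_...` sample ids are hypothetical.

```python
from collections import Counter


def reconcile(gateway: dict, local_rows: list) -> dict:
    """Compare the gateway's payments with local rows for the same period."""
    counts = Counter(payment_id for payment_id, _ in local_rows)
    local_ids = set(counts)
    gateway_ids = set(gateway)
    return {
        # Paid at the gateway, never recorded locally (a lost webhook).
        "missing_locally": sorted(gateway_ids - local_ids),
        # Recorded locally, unknown to the gateway.
        "unknown_to_gateway": sorted(local_ids - gateway_ids),
        # The duplicate-webhook signature: one payment, several rows.
        "recorded_more_than_once": sorted(p for p, n in counts.items() if n > 1),
        "gateway_total": sum(gateway.values()),
        "local_total": sum(amount for _, amount in local_rows),
    }


gateway = {"pi_1": 4900, "pi_2": 1500, "pi_3": 2000}
local_rows = [("pi_1", 4900), ("pi_1", 4900), ("pi_2", 1500)]

report = reconcile(gateway, local_rows)
print(report["missing_locally"])          # prints: ['pi_3']
print(report["recorded_more_than_once"])  # prints: ['pi_1']
print(report["gateway_total"], report["local_total"])  # prints: 8400 11300
```

Comparing totals alone would show that the numbers disagree; the per-id lists show which payments to look at.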

The takeaway

If you're running payments and you've never deliberately triggered a duplicate webhook against your own handler, that's the cheapest test you can run this week. Send the same event twice from the Stripe dashboard and watch what your database does.

If it records the payment twice, you've found the bug before reconciliation does.

If you want someone else to read the payment path with that lens, and the rest of the production-readiness surface with it, that's what an audit is for.

Frequently Asked Questions

Why does Stripe send the same webhook more than once?

Stripe retries any event your endpoint didn't acknowledge with a 2xx response in time. A slow database, a deploy in progress, or a transient network blip all look identical to Stripe as a failed delivery, so it sends the event again.

What does idempotent actually mean for a webhook handler?

It means processing the same event twice produces the same result as processing it once. In practice that's storing the gateway's event id the first time you see it and making every subsequent delivery of that id a no-op.

Isn't a database unique constraint enough?

It's a good backstop, but on its own it surfaces as a failed insert under load and depends on every write path going through that table. Check the event id first, in the handler, and let the constraint catch the race you didn't see.
