Webhooks Are Harder Than They Seem (svix.com)
30 points by KenRuf on Dec 3, 2024 | hide | past | favorite | 10 comments


Webhooks are a classic example of premature optimization masquerading as "best practice". The irony in this blog post is palpable - they advocate for webhooks while simultaneously describing their fundamental weakness. A webhook system requires you to implement the exact polling mechanism you were trying to avoid in the first place, just with added layers of complexity.

The recommendation for "exponential backoff spanning several days" is particularly telling. What happens when your service faces extended downtime beyond their arbitrary retry window? Your "real-time" system suddenly develops permanent blind spots. Meanwhile, a simple polling implementation would seamlessly catch up by simply processing the backlog.

For mission-critical integrations like payment processing, I've found that polling Stripe's events API at regular intervals is not just more reliable - it's significantly easier to reason about and debug. When issues arise, you can simply reset your local state and replay events, rather than attempting to recreate the exact conditions that triggered a webhook failure.
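A sketch of that catch-up pattern (`fetch_events` here is a hypothetical stand-in for a paginated call like Stripe's `GET /v1/events`; the event shapes are invented for illustration):

```python
# Cursor-based polling: remember the last event id you processed and
# page forward from there. If your service was down, the next poll
# simply drains the backlog.

def fetch_events(store, after_id, limit=2):
    """Hypothetical stand-in for one page of a paginated events API call."""
    return [e for e in store if e["id"] > after_id][:limit]

def catch_up(store, cursor):
    """Keep fetching pages until the backlog is drained."""
    handled = []
    while True:
        page = fetch_events(store, cursor)
        if not page:
            return cursor, handled
        for event in page:
            handled.append(event["type"])  # process the event idempotently here
            cursor = event["id"]           # persist the cursor in real code

# Five events accumulated while we were "down"; one call catches up.
backlog = [{"id": i, "type": f"charge.succeeded#{i}"} for i in range(1, 6)]
cursor, handled = catch_up(backlog, cursor=0)
print(cursor, len(handled))  # → 5 5
```

The only state you need to persist is the cursor, which is exactly what makes "reset and replay" so easy to reason about.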

The purported benefit? Saving a few seconds of latency in a business process where users are already accustomed to asynchronous workflows. It's architectural complexity theater that solves a non-problem while introducing very real failure modes.


Well, in addition to near-real-time delivery, polling sucks at sparse workloads: waking up to poll can be pretty wasteful with spiky or infrequently used but still important actions. And webhooks are arguably easier to set up across system boundaries, despite the hemming and hawing of this article. But I do agree that webhooks are not the thing I really want to reach for with mission-critical, high-load integrations unless I feel like torturing my future self.


Having built a webhook system for our product, I'd caveat that webhooks are "harder than they seem, but not that hard".

The big components we ran into were:

1. Having retry logic if you don't get a 200 back. And some way of flagging the failed requests. We already had a pretty robust job queue/worker setup on our side, so it was a lot easier to add webhooks with that logic in place.

2. Keeping a webhook log. On our side we had a request history already that kept a log of our LLM requests, so it was pretty easy to reuse.

3. Auth / dev experience. Some way for a developer to put in a url + headers + auth method and click "test".
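For point 3, the common auth scheme is an HMAC signature over a timestamp plus the payload, which the receiver recomputes and compares. A minimal sketch (the secret value and 300-second tolerance are arbitrary choices for illustration):

```python
import hashlib
import hmac
import time

SECRET = b"whsec_example_only"  # per-endpoint secret the developer configures

def sign(payload: bytes, ts: int, secret: bytes = SECRET) -> str:
    """Sign timestamp + payload so receivers can detect tampering and replays."""
    mac = hmac.new(secret, f"{ts}.".encode() + payload, hashlib.sha256)
    return mac.hexdigest()

def verify(payload: bytes, ts: int, signature: str, tolerance: int = 300) -> bool:
    """Reject stale timestamps, then compare signatures in constant time."""
    if abs(time.time() - ts) > tolerance:
        return False
    return hmac.compare_digest(sign(payload, ts), signature)

now = int(time.time())
body = b'{"event": "test"}'
sig = sign(body, now)
print(verify(body, now, sig), verify(b"tampered", now, sig))  # → True False
```

Including the timestamp in the signed string is what lets the receiver reject replayed deliveries, not just forged ones.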


Webhooks create distributed systems. So they are "distributed systems" hard.

Which means that, if you know your timing, multiplicity, and reliability requirements and guarantees, you can solve most problems by taking the best design pieces from the state of the art (idempotency, write-only logic, retries, optimistic atomicity, peer election...) and applying whatever you can to simplify your code as much as possible.

If you know how, you can probably solve most problems just fine. But you must know how, and you must know the problem.
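Of those pieces, idempotency is the one every webhook receiver ends up needing, because retries guarantee duplicate deliveries. A minimal sketch (an in-memory set stands in for what would be a unique-key column in a database):

```python
processed_ids = set()  # in production: a unique constraint in your DB

def handle(event):
    """Apply an event's side effects at most once, keyed on its id."""
    if event["id"] in processed_ids:
        return "skipped"
    processed_ids.add(event["id"])
    # ... apply side effects here ...
    return "applied"

evt = {"id": "evt_1", "type": "invoice.paid"}
print(handle(evt), handle(evt))  # → applied skipped
```

With this in place, an aggressive retry policy on the sender's side becomes safe rather than scary.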


We had this problem at work too, and didn't want yet another service provider. Built our own, in Go, for incoming and outgoing webhooks. It has some routing features too. Client webhook configs are stored in that service, and our main app talks to its API to configure them.

Not optimized for scale; it can likely only handle a few thousand events per second.


> 1. Having retry logic if you don't get a 200 back. And some way of flagging the failed requests.

With exponential/random backoff?
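Typically both: exponential backoff with random jitter, so a recovering endpoint isn't hammered by synchronized retries. A sketch of computing such a "full jitter" schedule (the 5-second base, 8 attempts, and one-day cap are arbitrary choices):

```python
import random

def backoff_schedule(attempts=8, base=5.0, cap=24 * 3600):
    """Full-jitter exponential backoff: each delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so retries spread out over days
    while never exceeding the cap."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

delays = backoff_schedule()
print(len(delays), all(0 <= d <= 24 * 3600 for d in delays))  # → 8 True
```

The jitter matters as much as the exponent: without it, every failed delivery to a downed endpoint retries at the same instant when it comes back.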


The difficulties around webhooks, and pub/sub in general on the web, are a great example of how decentralized architectures can make things much harder.

I wonder about an alternate timeline where someone had cornered the market on pub/sub as a service, a centralized pub/sub system that everyone goes through, kind of like how Google and Apple have a duopoly on push notifications.

Imagine this:

1. You are sending out emails through an email service provider (Mailchimp, Braze, etc) and you want to subscribe to an event stream of success/failures/errors/etc

2. You register with the email provider and they hand you back a token for SassyPubSub

3. Your backend service registers its token with SassyPubSub and lets it know exactly how you want to be notified - posts to a URL, over a persistent websocket connection, pushes to SQS/RabbitMQ, etc

4. The email provider pushes messages to SassyPubSub

5. You consume those messages however you want.

The flow should be dirt simple, just a few lines of code to set up. To be effective it'd need industry-wide adoption as a standard, but it would simplify so many things.
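Everything named below is invented (SassyPubSub is the commenter's hypothetical, and this API is made up to match it), but steps 2-5 might look something like:

```python
from dataclasses import dataclass, field

@dataclass
class Subscription:
    token: str      # issued by the publisher, e.g. the email provider (step 2)
    delivery: str   # "webhook", "websocket", or "queue" (step 3)
    target: str     # URL, socket address, or queue name

@dataclass
class SassyPubSub:
    subs: dict = field(default_factory=dict)
    delivered: list = field(default_factory=list)

    def register(self, token, delivery, target):
        """Step 3: the consumer tells the hub how it wants messages."""
        self.subs[token] = Subscription(token, delivery, target)

    def publish(self, token, message):
        """Steps 4-5: the publisher pushes; the hub routes per subscription."""
        sub = self.subs[token]
        # A real service would POST / push over a socket / enqueue here,
        # depending on sub.delivery.
        self.delivered.append((sub.delivery, sub.target, message))

hub = SassyPubSub()
hub.register("tok_123", "queue", "email-events")
hub.publish("tok_123", {"event": "delivered"})
print(hub.delivered[0][:2])  # → ('queue', 'email-events')
```

The appeal is that retries, backoff, and delivery transport all become the hub's problem exactly once, instead of every publisher's problem separately.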



How does Svix compare to https://hookdeck.com/ ? Is it similar?




