Webhooks do’s and dont’s: what we learned after integrating APIs

bazizbaziz · on Nov 22, 2016

How do people in production handle the possibility that your service might miss a webhook notification? If you miss a notification you'll end up with stale data and you won't know it.

Slack has a retry policy for a while but will then just give up. Another webhook provider I've looked at says nothing at all about this sort of thing. How do folks deal with this in production systems?

Seems to me like the best way to address this issue is to use the webhook as a hint that you need to run some other process that guarantees you've got all updates.

johns · on Nov 22, 2016

When I was at IFTTT (a few years ago, so it's definitely changed since then) we tried not to rely on the content of the webhooks and just used them as a hint as you describe to fetch new data. Not every API made this easy though.

If receiving a webhook is critical, you should make your receiver do as little as possible to place the event into a resilient queueing system and then process them separately. That won't save you from bad DNS, TLS, etc. configs but it should help reduce the possibility that you DoS yourself with a flood of webhook events.

Also (shameless plug), you could monitor and log them (we offer retries if your server fails): https://www.runscope.com/product/alerts

developer2 · on Nov 23, 2016

I would prefer to implement the sending of webhooks in bulk - if the consumer falls behind, they receive up to 100-1000 webhooks per request (depending on the size and complexity of each individual webhook - ids only is 1000, complex documents 100). This drastically cuts down on the number of concurrent requests to a single client when load is high, or the consumer broke down for a period of time.

Unfortunately, developers writing code to receive batch requests are often... inadequate, to say the least. They'll write basic looping code without any error/exception handling; so if the 3rd item in a bulk request of 100 items causes a server-side error for them, they throw a 500 Internal Server Error or similar and fail to continue processing items 4 through 100. You simply cannot batch webhooks as a producer, unless you detect a single failure from the client to process a batch as a cue to drop to performing "batches" of size 1 until you receive an error for a single request, at which point you return to bulk. Rinse and repeat.

Honestly, being the producer sending webhooks to consumers which are written by random developers is a nightmare. You have to understand that your customers will not write proper code to accept your webhook requests, even if each request is for a single webhook. You also must understand that your customers will not look to blame themselves for shitty code. You can retry 1,000 times over a 48 hour period, and if their code still fails to process the webhook, it will be YOUR fault, not theirs. Truthfully, it is horrible to be on the sending end of webhooks to random developers/customers.

_pmf_ · on Nov 23, 2016

Transactions are obviously too enterprisey for fast moving unicorns; better spend 3 weeks to badly hack together a ridiculous farce.

notgood · on Nov 23, 2016

I don't understand, if it's such a nightmare why don't you (the producer) create the code/libraries to use those webhooks? At least in the 2 most common platforms (e.g. PHP, Java)

madamelic · on Nov 22, 2016

Stripe has a retry policy as well.

You can set up something where it will alert you if there are too many failures in a certain time period. That isn't offered by Stripe but you can build it.

If you mean in the case of "catastrophic failure", there is none.

If there is a "catastrophic failure" (machine gets shut off for a week, data center blown up, whatever), there are probably bigger issues or we probably would already know.

brandur · on Nov 22, 2016

Stripe has an "events" API that can be polled to receive the same content that you would have received via Webhook [1].

(Disclaimer: I work there.)

If you missed some Webhooks due to an application failure, it's possible to page through it and look for omissions. I've spoken to at least one person integrating who had this sort of setup running as a regular process to protect against the possibility of dropped Webhooks. This usually works pretty well, but does start to break down at very large scale where events are being created faster than you can page back.

The possibility of dropped events is a major disavantage of Webhooks in my mind -- if you consider other alternatives for streaming APIs like a Kafka/Kinesis-like stream (over HTTP) that's simply iterated through periodically with a cursor, you avoid this sort of degenerate case completely, and also get nice things like a vastly reduced number of total HTTP requests, and guaranteed event ordering.

(But to be clear, Webhooks are overall pretty good.)

[1] https://stripe.com/docs/api#events

madamelic · on Nov 22, 2016

Oh gosh, that is super neat! :)

I never even thought of using it that way. I just use events to check that it is a valid Stripe event (Probably easier / better to set up the ELB to only listen to certain addresses)

Rapzid · on Nov 23, 2016

Some further related reading; Fowler talks polling for events in some of his Enterprise Integration stuff http://www.martinfowler.com/articles/enterpriseREST.html

EDIT: Not Fowler, but his site lol.

sanjeevkm · on Nov 23, 2016

We[1] had a similar problem with clients reporting to us about lost callbacks[2] (our term for webhook). To solve it, we have built two options.

- Get a notification email everytime the callback fails. The email contains the same information the callback was supposed to deliver

- Retries. We retry for the next 24 hrs (max) with an interval of 5 mins or until the callback call succeeds (within those 24hrs). We created a sub-resource called `calls` (/callbacks/[id]/calls) that keep the status of the call we made. If it succeeds, the status changes to "SUCCESS", if it fails, it remains in "FAILED". If even after 24hrs the receiver system being down, and the call does not succeed, the developer can make a call to GET /callbacks/[id]/calls?status=FAILURE and receive all the failed calls. They can process the content and do a PUT /callbacks/[id]/calls?id=ID1&id=ID2&id=ID3... with body as `{ "status": "SUCCESS" }` to mark them as "SUCCESS".

The calls are saved for upto 7 days, so that the dev has enough time to fix their server issues, and get back all the lost callback calls. This solved much of the client issues.

* An added benefit of this came to the devs who could not get an inbound POST from us into their network due to firewall restrictions. The firewall restriction defeated the purpose of live callbacks, but with the `status` option, they only checked for new (`FAILED`) notifications once every 2 hrs or so , and mark the one processed with `SUCCESS`. This way, they only look for `FAILED` and process when they have one. Else, nothing to do.

[1] Whispir - https://www.whispir.com/ [2] https://whispir.github.io/api/#handling-callback-failures

blairanderson · on Nov 23, 2016

I have recently moved all received webhooks to a job queue and have been very happy. you can retry the processing on your own terms.

akamaozu · on Nov 24, 2016

This.

Previous devs were doing expensive things whenever we received webhooks. This meant we DoS'd ourselves every time a sizable amount of webhooks came our way.

Set up a tiny server on Heroku that received the webhooks and put them on a queue. A worker with a configurable concurrency level later forwards the events on the queue .

Dropped from four digit 502s and 504s weekly to virtually none.

stevekemp · on Nov 23, 2016

Agreed.

It also allows you to do testing by injecting pre-cooked payloads into your queue system.

iddqd · on Nov 22, 2016

Maybe webhook providers could provide an endpoint where one could poll for events that failed to deliver.

developer2 · on Nov 23, 2016

The good APIs do, but it's still at a loss to both sides.

a) The producer of the events has to store them in semi-permanent storage. I've been there and done that - failed webhooks result in a table of tens of millions of rows, even if the memory on each event is only 48 hours. It's astounding how many events fail to process. And I've been through extensive verification that there is truly no problem on our side - it's always the client who is wrong. Emails back and forth for weeks with the client screaming "it's your fault!" - only to finally receive an "oops, we found the problem on our end... sorry".

b) Frankly, if the consumer of the events fails on a single webhook more than 5 times in a 24 hour period, that event is a permanent loss. The reason it fails consistently is because that specific event is a permanent failure to process on the consumer's side. It is probably throwing a 500 Internal Server Error or similar - every single time. 0.001% of webhook consumers actually have emergency alerts when webhooks fail on their end, so the job will continue to throw a silent/unlogged/unnoticed/ignored error no matter how many times you retry. These are the same type of developers who will never poll your "failure queue", because they don't even understand that their consumer endpoint throws 500 Internal Server Errors on 10% of your requests. You're trying to provide a service to developers that live in a fantasy world where errors and exceptions never happen on their end.

It's a simple fact that developers who consume webhook requests are a disgrace. Chances are that if a request fails two times, it will never succeed. And yet the best APIs will try hundreds/thousands of times over a 24 hour period - simply to prove to that client that it is their fault that they are not processing webhooks properly. There is only so much a webhook producer can do. There is no magic we can do if the consumer is copy/pasting PHP snippets from Google or Stackoverflow.

Story time. The most memorable situation I can remember is a client who was experiencing 100% webhook consumer failure for more than three weeks. The emails from their team - and subsequent phone calls from their CTO - were absolutely stunning; it got to the point that we were hounding our own business people to drop them as a client, the verbal abuse was that bad. Turns out they had a bunch of PHP developers who were for the first time writing their consumer webhook endpoint in C for some reason. They were trying to parse the custom "id" field that they sent us as a string in a JSON field, as an integer. It was all because they sent us a string, and choked on trying to re-interpret it as an integer. It hurts to even think about that case.

tldr; Fuck webhook consumers. Incompetent developers who don't know how to handle errors that are 100% their fault.

Funny aside: the most amusing cases come from PHP and .NET developers who expose their internal server errors in production. When you can copy/paste the response they gave you on a webhook because they are calling an undefined function or method... pure bliss.

dajonker · on Nov 23, 2016

You could also help customers who apparently have trouble properly connecting to your APIs by giving better error returns (got type A, expected type B), providing client libraries or giving more extensive support (for a price). Blaming the customer is easy, providing a way for even those "incompetent developers" to interface with you in a way that is easy to understand and debug for all parties is hard.

Moru · on Nov 23, 2016

The truly great developers find a better way than only retrying webhooks and prepare a client library that the customer can just plug in to their code :-)

draaglom · on Nov 22, 2016

I like what Shopify does here - because your app is tied to a partner account, they can email you saying "this payload has failed 20 times in succession". If it fails too many times then the webhook is uninstalled.

Not to be snarky - but it's a distributed system. There's no way to guarantee you've got all updates! At a certain combination of latency and volume polling becomes impossible so webhooks (or something analogous) are all you've got :)

icebraining · on Nov 22, 2016

At a certain combination of latency and volume polling becomes impossible so webhooks (or something analogous) are all you've got :)

Isn't it the opposite? At a certain volume, when each polling request aways returns results, polling becomes more efficient than "interrupts". It's only at low volumes that webhooks are more efficient, since polling would have to issue a lot of requests with no response if a low latency is required.

draaglom · on Nov 23, 2016

Assuming here you mean something like a classic REST-alike "/events" endpoint which returns a bunch of stuff that's changed since the last time you requested it.

In that case, as the number of events grows, the HTTP transaction overhead goes to zero with polling, yeah.

But now you have a bunch of extra things which will impact your latency:

- The third-party service will do more work preparing the payload, meaning that the earliest event on the list no longer hits the wire right away

- related: someone might be holding a lock on event 63 of 100. Now other events have to wait for it before they can hit the wire

- In your application code, you may have to read the entire request before you can validate it or do anything with it (at least, this goes for APIs which speak JSON)

- You probably have to commit your transaction for the previous page of events before you can start your next request. Otherwise, whichever side of the network is keeping tabs on your current pointer in the list, that pointer may end up in the wrong place. Oops!

- If more events happen during the time it takes you to request a page than will fit on a page, then you're really stuck.

- An error anywhere in the super-http-transaction (network, user code...) now means that an entire page of updates has been delayed rather than just one.

It's possible to remove the sequential-ness constraint from our hypothetical "/events" but not without introducing other fun new problems.

niftich · on Nov 22, 2016

By periodic reconciliation of the full dataset.

rakoo · on Nov 23, 2016

Yeah, I feel the best way is just for providers to give a RSS feed as the primary way of listing events and then notify with PubSubHubbub directly. Big advantage: everything already exists and is standard.

z3t4 · on Nov 23, 2016

The easiest implementation would be a serial number. Then the client can check for holes in the number series.

shizcakes · on Nov 22, 2016

I think the "securing webhooks" section is missing some critical tips that we've learned in production.

1) Resolve the DNS of the webhook URL, and compare all returned addresses from that resolution against an IP blacklist, which includes all RFC1918 addresses, EC2 instance metadata, and any other concerning addresses.

2) Even though it seems like you'd want to, do NOT blindly return an unexpected response to the person configuring the webhook. Say there was an error, what the code was, etc, but returning the response body means you basically just gave someone curl with a starting point on your network (see 1 as well)

3) Find ways to perform other validations of those webhooks. Are the URLs garbage? Are they against someone else's system? Create validation workflows that require initial pushes to the URL with a validation token to be entered back into your system, like validating an email address by clicking a link.

davidgh · on Nov 23, 2016

We wrestled with #1 (and therefore #2) for a long time. Amazing how carful you have to be. EC2 meta data is a place where a lot of services have their pants down unknowingly.

Our eventual solution? AWS Lambda. We built a simple function that receives a payload with the HTTP request to be made and the Lambda function makes the request. It serves as a sandboxed micro-proxy for all of our untrusted external HTTP calls. We give that Lambda function permission to do nothing within the AWS account. We even went to far as to place the Lambda in a dedicated AWS account to further isolate it, which prevents an admin accidentally placing the Lambda within a sensitive VPC, for example.

We still examine endpoint URLs to insure they don't belong to the internal network, but I sleep much better knowing that if something slips through the Lambda function is isolated from our internal resources and there's not too much interesting to see / probe / find.

nodesocket · on Nov 22, 2016

Point 1, can be a little more tricky than it seems. At first you'll think, I'll just use a regex to match known local addresses to protect again evil callback urls like http://127.0.0.1/status.

You'll realize though you have to actually resolve hostnames, because users can just create an A record of foo.bar.com that points to 127.0.0.1.

giuliano84 · on Nov 22, 2016

Point 3 is spot on. I think it would be a good strategy to avoid expire dates on subscriptions. Producers could take decisions on whether or not keep sending data by monitoring responses on the consumer's target URL.

Another thing that it should be worth mentioning is that some services batch notifications (e.g Facebook Messenger) so that they can send more data in a single POST request.

pdkl95 · on Nov 22, 2016

> IP blacklist

Please stop trying to enumerate badness. This is and always will be incomplete.

http://www.ranum.com/security/computer_security/editorials/d...

pfranz · on Nov 22, 2016

Ehh. I disagree with both Default Permit and Enumerating Badness--I think they have their place. If I run a club do I background check and whitelist every customer? Or to a blacklist the troublemakers? The problems cited in the article were reasonable decisions at the time, but years later grew into headaches when the use-cases changed.

Does their no Default Permit policy apply to network egress? Do I have to approve each and every application that wants to connect to the Internet? I think the leaving port 80 open because it was whitelisted is why so many things tunnel through port 80 instead of using other protocols and ports. Now how do you filter and whitelist traffic?

His example of antivirus products using Enumerating Badness is a market failing more than anything else. I'm not sure I see the alternative for a naive user. Call a specialist to investigate their use-cases and "open the system" to accommodate? Any time you want to update your tool or workflow or try something new have that specialist come out and reevaluate your system?

shizcakes · on Nov 22, 2016

I understand what you're saying here. But the baseline sanity set is pretty fixed. Localhost, RFC1918, IPv6 link local, etc. I'm not advocating folks blacklist every bad actor on the internet - that obviously cannot work - but there's some simple things you can do to prevent a malicious user from configuring webhooks that attack your internal services.

buzer · on Nov 23, 2016

There are cases where IP blacklists are pretty much the only option you have. For example, in the case of webhooks, what would you whitelist? You cannot whitelist anything that user provides without manual approval (which can be huge overhead).

Pretty much the only alternative I can think of is to query whois databases of RIRs, but you would need blacklisting there as well since they do include private IP spaces as well (ex. you would need to blacklist netname IETF-RESERVED-ADDRESS-BLOCK).

Similar problem exists with route advertisements from transit providers. They are not going to provide you a list of routes they advertise to you (since they don't get those from their customers usually), so your only option is to blacklist bogons yourself (unless you want to manually approve every single prefix out there as needed).

m_mueller · on Nov 23, 2016

Yes, whitelisting makes much more sense. Github has an API that you can ask about which IPs are in their network - compare the webhook sender against that list and you're dandy. This should become a standard in webhook APIs.

detaro · on Nov 22, 2016

The entire guide seems mostly consumer-centric: What do they as the one being called want, sender-side concerns are missing entirely.

giuliano84 · on Nov 22, 2016

Yes but that's why you build webhook for, to let people consume your content. I think DX is critical today and big companies can afford to do the heavy-lifting. As I mentioned in the Subscription Expiration paragraph I totally get Microsoft's reason to put a 72hrs expire date on subscriptions but it adds some friction on the consumer side.

qyv · on Nov 22, 2016

Consumer convenience cannot be made at the expense of security measures and abuse mitigation, see: IoT.

sly010 · on Nov 22, 2016

Aside: Webhooks are always a pain.

Implementing polling is easier for both sides.

I routinely have to integrate with random 3rd party systems, some with no or broken webhooks, some with no API at all. It turns out for my customers (this may not be always the case) eventual consistency is more important than timelyness.

What I do now every time I need to sync data from a third party is I always implement some sort of pull first with idempotent logic on my side. It's easier, and it allows me to just re-run things if something fails (e.g. network error, unexpected data in production, etc).

Only when that works reliably and only if required by the customer I implement a webhook, but I usually throw away most of the message and just wake up my polling worker that is otherwise polling relatively slowly.

beachy · on Nov 22, 2016

Long polling works brilliantly (where your API call blocks until there are some results or until timeout occurs - then you loop and call again).

Long polling gives you the best of both worlds - easy programming model with instant alerting rather than the delay of normal polling.

The only downside really is the need for a more or less permanently open connection per client. As long as the server does not use a naive "thread per connection" model this can scale up to many hundreds of thousands of clients or more.

Animats · on Nov 22, 2016

The good thing about long polling is that if the connection breaks, the keep-alive will time out and you'll know you're not getting updates. Assuming there's some keep-alive feature.

MrBuddyCasino · on Nov 23, 2016

Thats what I did, too. Poll, fetch and remember and retry errors, and if possible implement a sliding window as a poor man's cursor using dateFrom / dateTo if available.

jarcoal · on Nov 22, 2016

Sort of disagree with the send-everything-in-the-payload approach. It opens your system up to all sorts of weird edge case bugs like receiving hooks out of order which could mean stale data is considered fresh. It also means you have to care a lot more about verifying the authenticity of the request.

beachy · on Nov 22, 2016

Agree. Its better to use webhooks as a pure signal that something has changed, and then in the case of update or insert, have the client pull whatever they want using normal API.

Otherwise, you end up in a descending vortex of madness trying to specify some protocol whereby the client can specify in advance which properties they care about.

draaglom · on Nov 22, 2016

This is correct, and important.

Webhook payloads need to be logically monotonic[0]; this probably means either:

- having a lamport-clock timestamp for each payload so you can entirely discard older payloads in favour of new ones

- a well defined / consistent "merge" function over the subset of the payload you care about (e.g. maybe you know a customer's state can never go back from "registered" to "guest")

[0]: http://bloom-lang.net/calm/

nimblegorilla · on Nov 22, 2016

"trust but verify"

Sometimes you want to ignore webhooks based on the payload (or put them into a different queue). It's faster to do that if you get the payload up front.

madamelic · on Nov 22, 2016

Also, in your documentation, please show what the webhook events will look like since developers actually want to write code and not guess at what we will get.

cough Stripe. (https://stripe.com/docs/api#events)

brandur · on Nov 22, 2016

The implication was meant to be that the information under `data/object` is simply a full representation of another API resource of the type on which the event occurred, and that you can look elsewhere in the documentation to see exactly what each type will look like (you can see a subscription embedded in the sample response for example).

Fair enough that we could rewrite this to be more explicit about that though! We'll see what we can do to make that section more clear.

(I work for Stripe.)

madamelic · on Nov 22, 2016

This is what I love. I purposefully talk about Stripe on here just because I know Stripe people browse HN.

Stripe is great though. :)

enraged_camel · on Nov 22, 2016

You don't have to guess anything. Stripe lets you send test webhooks to the endpoint you specify[1]. You can set up something like ngrok[2] on your localhost to examine the headers and bodies, then write your code to parse them accordingly.

[1]http://i.imgur.com/oCpxYwE.png

[2]https://ngrok.com/

madamelic · on Nov 22, 2016

Yeah, that is one method.

I also learned about services that will set up test webhooks without having to go about setting up a server, etc.

I think I might use that, but I still think that docs should at least explain what will get sent. Maybe that is a bit too verbose in Stripe's case though.

misterbowfinger · on Nov 22, 2016

This is going to sound bizarre, but why do webhooks and not just an AMQP queue? I get that receiving HTTP POSTs is easier, but it just seems better to setup a publisher/subscriber relationship. That way, if a subscriber goes down, they can always catch up. And publishers can allow messages to sit in the queue with a TTL and max_size. It seems like a win-win for everyone.

jon-wood · on Nov 22, 2016

It's not AMQP (sadly) but something I've done previously is to have the actual webhook endpoint be as dumb as possible, doing nothing but accepting the payload (maybe with some very high level validation that the request was expected) and pushing it into a real queueing system.

This means you can handle all sorts of failure modes, not just the backend going down, but also bugs in the consumer that would otherwise result in losing the request. I've not tried it, but I imagine this is a pretty good usecase for AWS Lambda as it's a small bit of glue code.

stephenr · on Nov 23, 2016

I've used the same basic concept for accepting payment-received notifications (from a http redirect based payment gateway):

Read the transaction ID from the request body, and store it (with a date/time) in a table. A periodic process later checks them, and uses the payment service's API to validate the payment is valid and take appropriate action.

fiatjaf · on Nov 22, 2016

Shameless plug: https://requesthub.xyz is ideal for these cases.

jon-wood · on Nov 23, 2016

That's pretty awesome, thanks for the tip.

jbert · on Nov 22, 2016

One difference is in "dead connection detection". How do you know that your AMQP connection is down? At some level you're polling, whether that be TCP keepalive, application keepalive or something else.

If you're doing polling, you're actually back at the same pre-webhook place - polling their server on some timescale which is a compromise between latency and load.

Yes, a TCP keepalive is generally cheaper than an HTTP long poll request, but only by a constant factor.

limelight · on Nov 22, 2016

It's a whole lot simpler to do secure cross-organization HTTP requests than it is to figure out how to have multiple AMQP subscribers from untrusted companies.

elmigranto · on Nov 22, 2016

Though this is an approach worth considering for a bunch of local services, at least a subset of them, I don't see it working with 3rd party APIs. Consider something like Stripe — it is probably far more straightforward to invoke some HTTP endpoints, than to set up a huge infrastructure with millions persistent client connections.

giuliano84 · on Nov 22, 2016

Agreed. I believe that AMQP connections are best suited for wide and articulated local environments. Moreover REST HTTP endpoints are today's esperanto :)

niftich · on Nov 22, 2016

Because anyone can throw another route onto port 443 if you already host a website. A non-HTTP protocol running on a dedicated port, despite often being a superior solution, requires extra effort to set up, if the hosting environment even provides that option.

boubiyeah · on Nov 22, 2016

Webhook only makes sense if you don't care a single bit about missing updates. If not, it's deeply flawed.

A pull model (polling, long-polling, SSE, etc) is strictly superior for synchronisation. You just can't "miss" updates, can restart from the beginning again and reinterpret past events in a different light, the client goes at its own pace, etc.

paulddraper · on Nov 23, 2016

To expand on icebraining's comment, you can use weebhooks as notifications to poll.

icebraining · on Nov 22, 2016

Luckily they aren't incompatible, you can take advantage of both.

nimblegorilla · on Nov 22, 2016

It's also really handy when API providers give a nice webhook UI that lets you view and resend webhooks during development.

onion2k · on Nov 22, 2016

BitBucket is a very good example of web hook integrations done right. Relatable, logged, and well documented. I learned from their UI when I implemented my own version.

giuliano84 · on Nov 22, 2016

True. Also having a sample request at the very beginning is useful so you don't have to find a way to trigger the event by clicking around on the product UI.

johns · on Nov 22, 2016

I did a talk a little while back on providing a good developer experience around webhooks that covers a lot of the same topics. I wish I had a recording of it, but the slides are here: https://speakerdeck.com/johnsheehan/crafting-a-great-webhook...

Edit: found the video https://www.youtube.com/watch?v=xc5ezyJjz1k&feature=youtu.be...

simo9000 · on Nov 22, 2016

I'd like to follow up on the statement that the OpenAPI tools do not support webhooks. This is slated to change in an upcoming version of the OpenAPI-specification. Check out https://github.com/OAI/OpenAPI-Specification/pull/763 to see details. As soon as this is released, it only be a matter of time before Swagger and the rest support webhooks.

z3t4 · on Nov 23, 2016

To be fully client based (serverless) you need a middle-man for web-hooks. Websockets are a better alternative for stand-alone web clients. There are also "push notifications" via web workers but they are vendor dependent.

arxpoetica · on Nov 22, 2016

Honest question. Why webhooks over something like push/pull handshaked socket?

detaro · on Nov 22, 2016

For the receiver: Everything that can run a dynamic website can run a webhook receiver, opening an arbitrary socket connection isn't possible in all environments (e.g. software running on shared hosting or PaaS). You'd also need to define and implement a protocol on top of said socket, whereas more or less every web developer knows what to do with HTTP POST with a JSON payload.

And for the sender, keeping many concurrent connections open can be quite a challenge. Sending Webhooks also takes resources, but at least you can easily distribute it over many machines/processes if necessary.

ec109685 · on Nov 22, 2016

Slack offers both. Their realtime API is especially nice because you connect to it rather than needing to deploy public facing web services to receive events from them.

On the other hand, if you are not connected, messages could be lost unless you build in syncing capability, whereas with web hooks, Slack will handle the retries for you.

intellent · on Nov 23, 2016

What I am most interested in is how to test/debug webhooks during development.

How do I tell webhook providers to send test notifications to my local development instance without tampering with the production setup on both sides?

jeffnappi · on Nov 23, 2016

This is another thing that Stripe nails. Out of the box it comes with a Test mode and you can easily use this to test your webhook implementation.

Another way to handle this is to create and maintain a mocking tool that will generate requests.

Animats · on Nov 22, 2016

So "push technology" is called "webhooks" now?

How does this all integrate with HTTP 2? Can you get your notifications over a channel you already have open for other reasons?

_qbjt · on Nov 22, 2016

Not quite. "Push technology" is sort of an all-encompassing term for server-to-client updates whereas webhooks pertain specifically to HTTP callbacks. An example of a webhook would be GitHub making a POST request to some URL (set by the user) whenever new commits are made to a repo. Push technology might take the form of webhooks, long polling, WebSockets etc.

Webhooks are traditional HTTP requests so I don't believe HTTP/2 changes anything. The ability to differentiate notifications depends on the service / API you're integrating with.

mosselman · on Nov 22, 2016

It would have been interesting to read about tests for web-hooks. How do you do integration testing for example?

stevekemp · on Nov 23, 2016

In the past when I used to support webhooks what I did was very simple:

* Receive the HTTP POST submission to my hook end-point.

* Save this data in a queue.

* Return to the hook-caller "200 OK - $ID".

This was better than trying to initiate a long-running job as a result of the hook, and meant that I could trigger "fake webhooks" just by adding data to the queue manually.

I'm sure there are other approaches, but this is a flexible one that also gave the benefit of being simple. (For the queue I just used Redis.)

g105b · on Nov 22, 2016

Apostroph'ed.