Hacker News
Building Protocols with HTTP (ietf.org)
120 points by BerislavLopac on July 19, 2018 | hide | past | favorite | 53 comments


I can't say this trend surprises me. Years back I was at an IETF meeting where people were already mentioning that the old saying "IP over everything, everything over IP" was slowly being phased out in favor of "HTTP over everything, everything over HTTP".

The thing to realize here is how many middleboxes sit between a client and a server, boxes that are aware of higher layers than just IP. It's impossible today to deploy an Internet-wide layer-4 protocol other than TCP or UDP, and it will become harder and harder to deploy any truly universal L7 protocol other than HTTP. That's my limited view of the issue; how things like IPv6 might affect it, I have no idea.


Building a protocol over HTTP has the added benefit of making security easy. HTTP+TLS has a lot more eyeballs than protocols on lower levels. Most protocols on layer 3 and below weren't designed with security as a priority. If you are building a domain specific protocol, it's a lot easier to build on top of one which already has a good security story backed by a triumvirate of internet unicorns, than trying to re-invent the whole public key dance on a homebrewed setup.


> Building a protocol over HTTP has the added benefit of making security easy

Not to take away from that, but it's also very easy to get wrong.

- Not doing CA verification checks[1]

- No alerts/monitoring of new certificates used by the IDP

- No verification on certificate-transparency of new certificates used by the IDP

- No online CRL checking

If you do not do these things, then your application-on-HTTP is insecure. Remember that it's very easy to get a new certificate if you can break IP either by MITMing the server or by BGP takeover of some space near the CA. Both of these things have happened.

If there's a way you can detect incorrect implementations, then blacklist them. For example, I use OAuth2 in my own systems, but I require clients retry (by randomly failing during onboarding) and implement fuzzing whereby the client_secret is invalidated if they accept an invalid certificate.

[1]: http://web.archive.org/web/20120317165131/http://forum.devel...
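For what it's worth, here is a sketch in Python's `ssl` module (used purely as an illustration; the thread names no language) of what the first and last items on that list look like in practice. The secure defaults cover chain and hostname verification, revocation checking has to be opted into, and the anti-pattern from [1] has to be enabled explicitly:

```python
import ssl

# A default client context already enables CA chain verification and
# hostname checking -- the checks the parent comment says get skipped.
ctx = ssl.create_default_context()
# ctx.verify_mode is ssl.CERT_REQUIRED, ctx.check_hostname is True.

# Revocation (CRL) checking is NOT on by default; you must request it
# and load CRL data yourself (omitted here).
ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF

# The anti-pattern from [1]: silencing certificate errors "to make it
# work". Never ship this.
broken = ssl.create_default_context()
broken.check_hostname = False
broken.verify_mode = ssl.CERT_NONE
```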


Those eyeballs almost all belong to the web browser vendors, who are almost exactly a proxy for the operating system vendors. You mention them as three "unicorns", though it's hard to pick three; I have an easier time naming four, or maybe five.

Specifically they are Google's Chrome (Android), Microsoft's IE/Edge (Windows), Apple's Safari (iOS/ macOS) leaving only Mozilla's Firefox (standing in for the Free Unixes). If there's a fifth it's Oracle, as custodians of Java.

A tremendous weight ends up resting on the shoulders of these vendors, all of them based on the US West Coast, and it's not as though this is a power _seized_ by them; it's more something everybody else shirked.


SSL/TLS sees a lot of usage with other protocols too.

SFTP/FTPS, SSH, SMTPS, etc.


SSH is a cryptographic network protocol that doesn't use SSL/TLS.


My bad. I should have said SSL/TLS/SSH.

* SFTP/FTPS (SSH)

* SSH (SSH)

* SMTPS (SSL, TLS)


Still wrong. FTPS is vanilla FTP over TLS.


> Still wrong

You know that thing that gives security practitioners a bad reputation with engineers?

You're doing it.

As opposed to this, why not something more like "it's not uncommon for these two to be confused, but FTPS is FTP over TLS and isn't interchangeable with SFTP"?


Because it's longer? Pardon my foreign mind, but I don't understand why you are offended by a simple, short, and valid statement, nor why you are differentiating between "security practitioners" and "engineers".


FTPS is FTP over TLS. However, SSH and SFTP have nothing to do with TLS.


As others have commented (adding a couple of notes): HTTP and HTTP/2 use TLS, over TCP (for the reliable-transfer requirement), for security. Nothing more, nothing less. Sure, layer 3 and below have different considerations (reliability), and layer 4 (say, TCP) carries the end-to-end security. It would have been nice if HTTP/2 had added something security-wise, as was discussed, but the committee was conservative. Thus I am not sold on this argument. The rich functionality of HTTP and the multiplexing of HTTP/2 sound appealing (sections 3.1-3.4) if you are building an API; if not, I would not rush to that conclusion. Why would I want the machinery of reliable streaming if I am doing over-the-network video compression and do not care about reliability, or want to skip frames, for instance?

I guess it is the IETF's job to standardize stuff, so that is what they do -- to a hammer, everything is a nail, right?

P.S. I skimmed it, so perhaps it is worth a read, but nothing caught my eye really.


You only need TLS for that. HTTP adds nothing security-wise.


It adds Authorization, WWW-Authenticate, Proxy-Authorization, Proxy-Authenticate, and Cache-Control, but those aren't even used all that consistently.


It is possible. You just don't care about people that have shitty admins. It works for games (mostly UDP) and nobody complains, perhaps because people shouldn't be playing in the office.


Just curious, what do you think of QUIC? Seems like a layer 4 replacement, right?


The fact that Google chose to build QUIC on top of UDP rather than directly on top of IP shows that not even Google thinks it's possible to introduce any new layer-4 protocol.


QUIC operates on top of UDP, which would make it layer 5.


QUIC is the transport layer (layer 4).

> QUIC is an experimental transport layer network protocol [1]

You can run layer 4 on top of layer 4. (Just as you could say, run TCP layer 4 on top of HTTP layer 7.)

[1] https://en.wikipedia.org/wiki/QUIC


There is some pretty dubious and, I think, wishful stuff in here, like "applications should align their usage as closely as possible with web browsers", or the idea that ad-hoc protocols specified over HTTP should use links rather than fixed URLs. Does anyone have the backstory on this I-D? Anybody can write one, right?


He's not quite anyone.

https://en.wikipedia.org/wiki/Mark_Nottingham

Perhaps this is meant as a public service, just to gather best practices. If you want to build your own thing, please go ahead, but don't call it HTTP.


Fair enough, re Nottingham. Thanks for the link, that neatly answers my question.

I strongly disagree that ignoring these suggested BCPs disqualifies an application as "HTTP". As far as I'm concerned, if a typical middlebox will reliably pass messages in it, it's HTTP.


HTTP was intended to support web browsers, so there may be value in keeping compatibility. Application development, debugging, and scripting can all potentially be well served by having standard HTTP interfaces for interactions because this enables application operations to be performed or spoofed using web browsers.


> Applications that use HTTP are encouraged to allow an arbitrary URL to be used as that entry point. For example, rather than specifying "the initial document is at "/foo/v1", they should allow a deployment to use any URL as the entry point for the application.

Is this really saying that, for example, I shouldn't have an API specification that says "to fetch a list of users", send a GET to `/api/v3/users`? What's the alternative?


Hypermedia [1] As The Engine Of Application State (HATEOAS) [2]

I'm not particularly a fan, but it's fairly common. It was popularized as one of the pillars of the REST paradigm.

[1] https://en.wikipedia.org/wiki/Hypermedia

[2] https://en.wikipedia.org/wiki/HATEOAS
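A minimal sketch of the idea in Python (the entry document, URLs, and link names are all made up): the client hard-codes link *relations*, not URLs, and discovers the actual paths from whatever entry point the deployment chose:

```python
# Hypothetical entry document served at an arbitrary entry-point URL.
entry = {
    "links": {
        "users": "https://api.example.com/some/deploy/specific/users",
        "orders": "https://api.example.com/elsewhere/orders",
    }
}

def follow(document, rel):
    """Resolve a link relation instead of assuming a fixed URL layout."""
    return document["links"][rel]

# The client never hard-codes "/api/v3/users"; it asks for "users".
users_url = follow(entry, "users")
```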


Is it actually common? It's talked about a lot, but in practice it doesn't seem to be used much, from what I can tell.


For starters, HATEOAS is used as the foundation of the most common networked applications in the world: web browsers.

Paypal uses it for their API.

But yeah, it's kind of like functional programming: the ideas are super common and well understood. But by the numbers, the less pure imperative programming paradigm is overwhelmingly more commonplace.


One standout: Service discovery. OpenID Connect[1] for example publishes an endpoint discovery document. This is much easier than building a versioning-layer into your application router.

[1]: http://openid.net/specs/openid-connect-discovery-1_0.html

This is also good to combine with requiring clients to retry and use multiple endpoints. This way you don't have to take risks with DNS and global routing failures potentially breaking your application.

If you use TLS to protect your application, I recommend further (randomly) generating a bad SSL certificate to make sure clients retry (and get a good one), so that you can detect anyone stupid enough to take advice like this[2].

[2]: http://web.archive.org/web/20120317165131/http://forum.devel...
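To make the discovery point concrete, here is a Python sketch of consuming an OpenID Connect discovery document [1] (the issuer and endpoint URLs are made up; in reality you would fetch the document over HTTPS from the issuer's well-known path):

```python
import json

# Per OpenID Connect Discovery, the document lives at
# {issuer}/.well-known/openid-configuration; inlined here as a literal.
discovery = json.loads("""
{
  "issuer": "https://idp.example.com",
  "authorization_endpoint": "https://idp.example.com/v2/authorize",
  "token_endpoint": "https://tokens.example.net/v2/token"
}
""")

# v1 and v2 endpoints can live on entirely different hosts: clients only
# follow what today's document says, so no version routing is needed.
token_endpoint = discovery["token_endpoint"]
```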


> for example publishes an endpoint discovery document. This is much easier than building a versioning-layer into your application router.

How is that different from having only one version of the API and updating it? Client code will still break if a structural data change is pushed (for example, removing a field from the responses).


(Perhaps a question for the mods: Why are so many of antpls's comments flagged/dead? Their other behaviour doesn't seem malicious, and this seems like a reasonable question to me even if it doesn't directly address my comment.)

> How is that different than having only 1 version of the API and updating it?

Using a service discovery document makes it easier to put /v1 and /v2 on separate servers/domains instead of putting both on all the servers/domains. Serving the service discovery document is cheaper than having your front-end server route all requests to backend systems (like using nginx or an Amazon ELB in application mode).

Putting applications on different domains is desirable if /v2 has performance gains (or cost gains) obtained by using the new protocol. Or if they're separate codebases.

> Clients code will still break if a structural data change is pushed (for example, removing a field from the responses)

It's very difficult to predict how a client will react if you haven't seen/vetted their implementation, but there are things we can do that go beyond service discovery.

For example:

My advertising service allows publishers (people who make web pages and content) to choose how to render my advertisers' ads in those pages. They do this via a server-side call -- I don't do any tracking of this, but I want to be sure that the client is doing the right thing because I'm still representing the brands.

I can look at the web page and sign off on it, but I want to make sure that it'll support a wide range of ads -- including ones I haven't thought of.

So I put a lot of energy into patterns like fuzzing the clients, and doing things that I hope will require they do things securely:

- I require retries so users always get an ad

- I randomly give them a bad SSL certificate -- one that, if they accept it, gets their account blocked. I usually only do this during onboarding or for my crawlers (recognised by IP), which check for sneaky stuff.

- I require certain utf8 characters to make sure the query is utf8 encoded

- I randomly insert random fields to make sure I can add new ones later

- I randomly reorder fields in the response to make sure they're not counting on field order

I think that if you do all these things, and perhaps other things, and clients are used to it then you probably can have just one version of the API and update it. But this is hard work, and the enterprise is not (in my experience) very fond of hard work.
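Two of those tricks (random extra fields, random field order) can be sketched in a few lines of Python; the function and field names here are hypothetical:

```python
import random

def fuzz(payload, rng):
    """Harden clients: inject a surprise field so they tolerate future
    additions, and shuffle key order so they don't rely on position."""
    out = dict(payload)
    out["x_fuzz_%06d" % rng.randrange(10**6)] = "ignore me"
    keys = list(out)
    rng.shuffle(keys)
    return {k: out[k] for k in keys}

# A fuzzed ad response: original fields plus one random extra.
resp = fuzz({"ad_id": 7, "html": "<b>ad</b>"}, random.Random(0))
```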


API servers may need to co-exist, by mounting them into a hierarchy. You can certainly have `/api/v3/users`, but you should have enough configurability to support mounting that under `/yourapp/api/v3/users` so that someone doesn't need to rewrite URLs in a proxy.
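In other words, treat the base path as deployment configuration rather than part of the API contract. A toy Python sketch (names are illustrative):

```python
def make_url(base, *segments):
    """Build an API URL under a configurable mount point."""
    return "/".join([base.rstrip("/")] + list(segments))

# The same application, mounted at two different deployments:
default_mount = make_url("", "api", "v3", "users")
nested_mount = make_url("/yourapp", "api", "v3", "users")
```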


As I regard HTTP to be one of the main hindrances in today's IT industry, I feel it is my duty to add my voice to those warning against the now decade-long trend of wrapping everything in HTTP.

Things that HTTP makes more complicated:

1. It's a verbose protocol with more often than not a really bad ratio payload over metadata;

2. Client authentication is hard;

3. Server authentication is expensive;

4. It's bad at streaming;

5. It does not help a bit for RPC (no support for fan-out, quotas, fail-overs, retries, ...).

Building an infrastructure on HTTP makes it so unreliable that I have come to consider that one of the reasons for Google's technical superiority is that they ditched HTTP completely early on and used Stubby for everything instead. Maybe they themselves view it as their secret sauce, which would explain why they've never published a comparative review (that I'm aware of)? Many people I've spoken to about this have expressed surprise that there is no HTTP flowing in Google's veins. To many, removing HTTP is like removing the solid ground; but to a former Googler, losing access to Stubby and having to work with HTTP again is a painful experience that takes quite some time to accept.


So Google made gRPC (which uses HTTP) just to sabotage their competition? Man, that's devious.

By the way, server (and client, if needed) auth is implemented by the TLS layer. Do you propose using something else to secure other protocols?


They definitely pushed HTTP/2 to help their clients move away from HTTP. My guess is they cannot make Stubby public without disclosing too much valuable advantage, and probably also know that it can only run effectively on their own DCs at this point. At least gRPC is easy to unwrap from HTTP (which is likely the very first step of processing on their side). Regarding protobuf, I'm not a big fan, but at least it's not JSON (another awful tech that the industry takes for granted).

Also: TLS is slow, and the last time I asked for the best practice for doing client auth over HTTPS, the general consensus seemed to be "don't do it".


I think you are wrong on all counts; specifically, take a look at Transfer-Encoding: chunked, and learn some crypto.

Also, you don't need to put all those headers in requests/responses. And soon you will be able to override the horrible User-Agent one even in Chrome.


> they early ditched HTTP completely out and used stubby for everything instead.

Yes, when I worked there, it was interesting to realize that they take HTTP and immediately convert it to protobuf.


I don't get it. Why is HTTP such a bad thing? It proved itself worthy many times. Does it cover all of your requirements? Maybe not. Does it prevent you from using something different? No.

It is just a protocol for stateless server-client communication. Nothing more. What you build on top of it, is your choice. And you don't have to use it. But it is a mature protocol and it is well understood and it has a huge ecosystem of technology around it, that is just good enough for many use cases.

> 1. It's a verbose protocol with more often than not a really bad ratio payload over metadata;

It's a stateless protocol. You have to provide all the required metadata (headers) in each exchange; otherwise it could not be stateless. And yes, it uses text, because it would be extremely difficult to evolve the protocol using more memory-efficient binary tricks. Text is just better for comprehensible semantics, and that's a good thing even in pure machine-to-machine interaction.

There is also HTTP/2, which addresses efficiency and latency issues, but it trades the simplicity of the transport method for that.

> 2. Client authentication is hard;

It depends. It can be as easy as sending an Authorization header, or you can defer authentication to the transport layer (client certificates over TLS). Otherwise I'd say: the best security is hard, and it won't get easier in other stateless protocols.
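The "as easy as sending an Authorization header" case, sketched in Python for HTTP Basic auth (RFC 7617; only safe over TLS):

```python
import base64

def basic_auth_header(user, password):
    """Encode user:password as an RFC 7617 Basic credential."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": "Basic " + token}

# The classic example pair from the RFC:
hdr = basic_auth_header("Aladdin", "open sesame")
```

A bearer token (RFC 6750) is the same shape with `"Bearer " + token` and no base64 step.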

> 3. Server authentication is expensive;

That's just too generic. What does "expensive" mean? For what use case is it too expensive?

> 4. It's bad at streaming;

It depends what you mean by streaming. I think you mean legacy binary streams? Yes, HTTP is not meant for that. But you can use HTTP to initiate your custom streaming use case: there is the "Upgrade" header, or you can just provide a URL to a streaming endpoint that uses something other than HTTP, or even TCP, as long as the client can handle it (e.g. RTMP).

Then there is something that is often misconceived as a utility for streaming: chunked transfer encoding and the Range header. Chunked streaming is just for fetching content of arbitrary size, and server support for "Range" allows for fail-over/retries. It's not meant to support continuous streams of data.
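The fail-over/retry use of Range looks roughly like this in Python (RFC 7233 byte-range syntax; the function name is made up):

```python
def resume_headers(bytes_received):
    """Ask the server for everything after what we already have."""
    return {"Range": "bytes=%d-" % bytes_received}

# After a broken transfer that delivered 1024 bytes, retry with:
hdr = resume_headers(1024)
```

A server that honors it answers `206 Partial Content`; one that doesn't simply returns `200` with the full body, so the client degrades gracefully.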

> 5. It does not help a bit for RPC (no support for fan-out, quotas, fail-overs, retries, ...).

You can do RPC if you design your payload around it; see SOAP, or even something like RMI over HTTP. Fan-out, quotas, fail-overs, and retries can all be done on top of HTTP. Does the specification itself solve those issues? No, and it should not, because that would make it too complex and harm the adoption of HTTP.


I really wish people would stop trying to cram everything into an HTTP-shaped box.


You can wish that all you want, but it isn't going to happen. HTTP is an extremely effective way to deploy new protocols; it is effectively the richest protocol design toolkit available universally to developers, and has the added benefit of playing well with middleboxes. Expect more "cramming" in the future, not less.


I agree. What is wrong with creating new protocols?


Familiarity. I can either work with a protocol that has been around, in some way, shape, or form, for almost 30 years; one that has matured, has a myriad of information on its edge cases, is familiar to new and old developers alike, and is quick to develop for.

Or I can create my own protocol which has none of those benefits besides possibly quick development.


> Familiarity. I can either work with a protocol that has been around, in some way shape or form, for almost 30 years.

Yeah, using TCP directly is great, isn't it?


Middleboxes tend not to play well with non-HTTP content these days. Not long ago the same was true of non-UDP/TCP protocols, and that situation is now even more cemented.

The only place I know of where non-HTTP works is video games, and even that is hit-or-miss on one ISP I frequent; they block protocols they don't know, for "security".

Making it HTTPS is an easy way to get around any potential firewall problems and have it work easily in corporate settings.


Anything outside of port 80/443 will be blocked at some point.


Blocked by whom? Are you talking about some specific context, e.g. corporate networks? Because HTTP/HTTPS are both TCP-based protocols with some very limiting characteristics for many applications.

There are a whole lot of applications out there in widespread use that depend on UDP at the very least, e.g. most multiplayer network games, to use the simplest and most common example I can think of. Not everything can (or should) be stuffed into an HTTP transaction.


Yes, the blocking problem mostly affects corporate networks or shady WLAN access points.

Game developers don't really have a problem with using UDP/TCP on random ports.


Sure, but it's a bit of a stretch from there to the parent poster's claim that "anything outside of port 80/443 will be blocked at some point" as a justification not to develop new, non-HTTP-based protocols.


Of course, it's just a sloppy excuse.


Only if nobody uses those ports.

I want as much as possible to break for people who block those ports, and engage in other user-hostile actions.


Stupid people block stuff. That's why (somewhat less stupid) people try to tunnel everything via HTTP.


It's the next logical step to me. We have the Link, Internet, and Transport layers pretty well sorted; the Application layer is next in line, and it seems likely the web is going to win. A good thing about the web winning (and possibly part of the reason it's winning) is that it has a lot of the next (and possibly final) layer already standardized. I'm not sure what you'd call it, let's say the Semantic Layer: all our human shit (HTML, XML, all the data formats, etc.).


Here is an example of how you can do this: https://github.com/tinspin/fuse

Using HTTPS, HTTP/2.0, or WebSockets is really not a clever option when you can do secure, simple, and more performant stuff in clear text over HTTP/1.1.


"No".



