
Even among startups, web scale data requirements are the exception, not the rule. Facebook and Google are ginormous. There are many, many very impressive applications whose database wouldn't tax a single commodity server. (Similarly, there are applications that make terrible businesses but which consume computing resources like losing a byte of information would doom humanity.)

I mean, go through a list of YC companies or other startups you respect, winnow it down to the ones that exited or otherwise achieved some level of success, and play guess-the-size. How many terabytes of storage do you think e.g. Airbnb needs?



There are a ton of high-traffic websites out there that don't need an architecture any more complex than a standalone DB server + PHP + varnish (or the equivalent).

Moreover, if devs spent as much time tuning the performance of their apps as they did fantasizing about "web scale" architectural pivots, they would typically be further ahead. StackOverflow.com is a perfect example of this. They run on a tiny handful of Windows machines, support gobs and gobs of traffic, and have absolutely fantastic performance. As much of that is due to paying attention to performance and making sure to find and remove the bottlenecks where they exist as it is to using cutting-edge architectures like database sharding, map+reduce, eventual consistency models, etc.


One thing I've found is that scaling rarely means solving difficult problems. Rather, it means putting more time into finding optimal solutions to problems that are trivial at smaller scale. For example, should your startup use Apache, nginx, or HAProxy as a load balancer? If you're just launching, the answer is "Who cares, just ship the fucking thing!". If you reach the point where you start measuring page views in the billions (and yes, there are startups that are at this point), it matters a great deal. Or should you use Postgres, MySQL, or some shiny NoSQL thing? Again, it probably doesn't matter for small websites. But for larger services, it matters.

Also, don't underestimate how large log files can grow in a data-driven business (like AirBNB seems to be). I could easily believe that they have many terabytes of data just from logging actions their customers have taken.
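To put a rough shape on that, here is a back-of-envelope sketch. The event rate and record size are made-up illustrative numbers, not anything known about Airbnb, and the helper names are hypothetical; the point is only how quickly modest per-event sizes add up.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def daily_log_bytes(events_per_day: int, bytes_per_event: int) -> int:
    """Raw (uncompressed) log volume produced per day."""
    return events_per_day * bytes_per_event


def days_to_reach(target_bytes: int, events_per_day: int, bytes_per_event: int) -> float:
    """Days of logging needed to accumulate target_bytes of raw logs."""
    return target_bytes / daily_log_bytes(events_per_day, bytes_per_event)


# Assumed, purely illustrative inputs: 50 million logged actions per day,
# 500 bytes per log record.
EVENTS_PER_DAY = 50_000_000
BYTES_PER_EVENT = 500

per_day = daily_log_bytes(EVENTS_PER_DAY, BYTES_PER_EVENT)
print(f"{per_day / GB:.1f} GB per day")
print(f"{days_to_reach(TB, EVENTS_PER_DAY, BYTES_PER_EVENT):.1f} days per terabyte")
```

Under those assumptions you cross a terabyte of raw logs in a bit over a month, before compression or replication. Plug in your own numbers; the inputs matter far more than the arithmetic.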


> Also, don't underestimate how large log files can grow in a data-driven business (like AirBNB seems to be). I could easily believe that they have many terabytes of data just from logging actions their customers have taken.

Logs don't have remotely the same access requirements as the databases used to serve a product.


Indeed, but it's worth pointing out that in this case "different" doesn't necessarily imply "easier". Instead of having to access the data across many concurrent connections, you have to store the data efficiently so that it doesn't take up too much space, and you have to be able to run jobs on it that don't take 3 weeks to complete. And let's not get into how you collect and merge the logs together. There are open source tools to do these things, but you're still looking at a decent amount of infrastructure to make it work.


Perhaps for the applications of yesterday (like Basecamp) this is the case, but the real innovation taking place is around collecting massive amounts of data and processing it in interesting ways. These systems are used every day to make quantified business decisions rather than best guesses based on someone's hunch. 37signals builds questionably good UIs on top of a database, something people have been doing for decades now. The future is in augmenting intelligence by gathering massive amounts of data and reducing it for human consumption.


I don't disagree with anything you've stated. Explosive growth and massive data storage requirements are surely the exception, not the rule.

That said, the blog post talks about enormous growth, and that growth still fits inside Moore's Law. I guess my gut is just saying it's not really that enormous in terms of startup scaling if it's still within those limits. Not to take anything away from 37signals' success, but it feels like nothing of value was really added by this post. As supplementary evidence, I present the post showing a picture of 864GB of RAM that is near the top of HN right now.



