Yeah, the web tier is rarely the bottleneck, can be scaled linearly when it is and typically very susceptible to caching.
I didn't like the presentation of the benchmark either. There isn't enough information to know even what the configuration is (uWSGI or gunicorn? it doesn't say), let alone to replicate the benchmark on your own hardware. It's disappointing that the author doesn't include proper stats like standard deviation, nth percentile, etc., which would give a better impression of what's going on. On top of all of that, it's a microbenchmark that doesn't include the typical db access, caching and so on of a real web application, so it gives a misleading impression of the speedup you could get over another framework.
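To make concrete the kind of stats I mean, here is a rough sketch of what I'd want reported alongside any mean. It assumes you already have a list of per-request latencies in milliseconds; the function name `summarize_latencies` and the choice of p50/p95/p99 are just mine for illustration, not anything the benchmark author uses.

```python
import statistics


def summarize_latencies(samples_ms):
    """Summarize per-request latencies (milliseconds) beyond a bare mean."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.stdev(samples_ms),  # sample standard deviation
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # Hypothetical data: 99 fast requests and one very slow one.
    samples = [1.0] * 99 + [100.0]
    for name, value in summarize_latencies(samples).items():
        print(f"{name}: {value:.2f}")
```

With made-up data like that, the mean lands well above the median because of a single outlier, which is exactly the sort of thing a lone average hides.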
People say this all the time. While I think it is mostly true, I think it is more often untrue than most would assume. In well-optimized systems the "IO layer" is optimized first, as it rightly should be. At that point, improving single-threaded performance and concurrency/parallelism starts to really matter.
>can be scaled linearly
Sure, but scaling my number of servers linearly doesn't imply linear maintenance/operations effort... If I can run an order of magnitude fewer servers, I am going to save money. It's all about figuring out if the savings are worth the shift to a higher performance platform... but playing devil's advocate, what if one had started with the higher performance platform to begin with...
>when it is and typically very susceptible to caching.
That is making a lot of assumptions about the problem space at hand. But it's a fair point: when things are cacheable, it's magical.
Twitter got a 10x reduction in number of front-end web servers and a 10x latency improvement by changing the platform. The layer literally called Thrift services and made HTML pages. The 10x reduction in servers was great at the scale that they were at, but the 10x decrease in latency was the real winner and it had nothing to do with scale. Pure single-request performance.
I think you're asking whether they optimized the rewrite, rather than just ported it from Ruby to Scala? I too would be interested to know if there is any data on this.
It was an exact port, including bugs: output needed to be exactly the same, as that was the best way to test the result. I wouldn't say the Scala code was any more optimized than the Ruby code was over the years. In fact, lots more effort had been put into optimizing the Ruby side (for example, the template engine was written in C for performance).
That's not exactly true. For example, as you said, employing a cache may significantly improve performance - essentially to the point where 99% of the time is spent inside framework code (routing, params parsing, etc). In such a situation a "web tier" may become a bottleneck, and when it does it's quite deadly: you can't do anything about it, short of abandoning or changing the framework.
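A back-of-the-envelope sketch of that effect, in case it helps. Every number here is hypothetical (1 ms of framework overhead, a 20 ms backend call, a 0.01 ms cache lookup), and `framework_share` is just a name I made up for the arithmetic.

```python
def framework_share(framework_ms, backend_ms, hit_rate, cache_ms):
    """Fraction of the average request time spent in framework code.

    hit_rate is the fraction of requests answered from the cache (0.0 to 1.0).
    """
    # Average time spent fetching data, weighted by cache hits and misses.
    avg_data_ms = hit_rate * cache_ms + (1.0 - hit_rate) * backend_ms
    return framework_ms / (framework_ms + avg_data_ms)


if __name__ == "__main__":
    # Hypothetical figures, not measurements of any real framework.
    for hit_rate in (0.0, 0.9, 1.0):
        share = framework_share(1.0, 20.0, hit_rate, 0.01)
        print(f"hit rate {hit_rate:.0%}: framework share {share:.1%}")
```

Under those assumptions the framework is a rounding error with no cache and nearly the whole request once everything is a cache hit, so the better your caching gets, the more the framework's own speed matters.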
> that doesn't include the typical db access, caching and so on
But that is not what a web framework itself does, right? It doesn't offer an ORM layer or a cache layer, and it's not being compared to Django, but rather to Bottle (which performed remarkably well) and similar.
Anyway, as to the framework itself: from reading the page I got the impression that it's thread-safe, which would be more than reason enough to use it, were it true. In the past I used Erlang with Cowboy for building performant web apps. It was enjoyable and I liked it, but it also required the skill to do so, which is unfortunately not that widespread. Falcon is probably nowhere near Erlang/Cowboy performance, but it could be just about "fast enough" for most use cases.