BigPipe: Pipelining web pages for high performance (facebook.com)
90 points by aristus on June 4, 2010 | hide | past | favorite | 32 comments


Sounds like, if they don't make their solution open, this will become the basis for a fairly nice little framework.

It'll be interesting because this will require significant rework to fit with how most web servers work. It would be hard to implement in nginx, for instance. Facebook probably just wrote a custom server, or heavily modified an existing one.


I think you could actually implement this quite easily with nginx + <insert evented/actor server (Tornado, Rainbows!, Mochiweb, Yaws, etc.) here> + your application. nginx buffers the client's request before passing it to the backend (you cannot turn this off), but you can turn proxy_buffering off, which stops nginx from buffering the response to the client. In this middle tier you could instantly send the headers + loading JS and flush the buffer, then determine the pagelets to render and either render them in process or delegate further to application servers over HTTP or your own protocol.
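
A minimal sketch of that middle tier, with the server framework elided: flush the static head and loader JS before any pagelet work happens, then flush each pagelet as it is rendered. The pagelet names and the render function are invented for illustration.

```python
def render_pagelet(name):
    # Stand-in for a real backend call or an in-process render.
    return f'<div id="{name}">{name} markup</div>'

def respond(pagelets):
    """Yield response chunks in the order they should be flushed."""
    # The head + BigPipe-style loader goes out before any pagelet work.
    yield "<html><head><script>/* loader */</script></head><body>"
    for name in pagelets:  # rendering could be delegated to app servers
        yield render_pagelet(name)
    yield "</body></html>"

chunks = list(respond(["newsfeed", "chat"]))
```

With proxy_buffering off, each yielded chunk would reach the client as soon as the app server flushes it.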

This does kind of raise the question: why use nginx at all? Because it provides you with a lot of protection against malformed requests and general fuckery. If you need to, you can push the middle layer back into nginx as a module, which would be screaming fast.

This reminds me of Heroku, who do something like this with their 'routing mesh': client -> nginx -> routing mesh (Erlang) -> Thin (Ruby app server). The Erlang process knows which EC2 instance has the Ruby process to serve the request, and is basically a smart proxy that initiates the connection. You could quite easily query six backends simultaneously for each pagelet and pipe the JSON out to the client, then throw in the footer as well.
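
A sketch of the fan-out idea, assuming invented backend names and latencies: query all backends concurrently and pipe each pagelet's JSON out the moment it arrives, footer last.

```python
import asyncio, json

async def fetch(backend, delay):
    await asyncio.sleep(delay)              # stand-in for a real HTTP call
    return {"pagelet": backend, "html": f"<div>{backend}</div>"}

async def stream(out):
    backends = [("feed", 0.03), ("chat", 0.01), ("ads", 0.02)]
    tasks = [asyncio.create_task(fetch(b, d)) for b, d in backends]
    # Flush each pagelet's JSON in completion order, not page order.
    for task in asyncio.as_completed(tasks):
        out.append(json.dumps(await task))
    out.append("<!-- footer -->")           # then throw in the footer

lines = []
asyncio.run(stream(lines))
```

The fastest backend ("chat" here) reaches the client first, which is the whole point of the smart-proxy tier.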

I think this deserves some experimentation.

nginx + erlang + rails (serving JSON).


Actually, you can implement this with nginx + client-side JavaScript to merge "sub-pages" right now; you don't need any server-side language or framework.

Guys from Taobao (http://www.taobao.com) have open-sourced pretty much everything you need to do that:

- ngx_echo (http://github.com/agentzh/echo-nginx-module) for asynchronous pipelining,

- ngx_drizzle (http://github.com/chaoslawful/drizzle-nginx-module) for fetching data from Drizzle/MySQL/SQLite,

- ngx_postgres (http://labs.frickle.com/nginx_ngx_postgres/) for fetching data from PostgreSQL,

- ngx_rds_json (http://github.com/agentzh/rds-json-nginx-module) for converting database-responses into JSON.

Also, this isn't a new concept; at least a few Chinese companies I know of use a similar rendering process.


With all respect to the people at Taobao, this is different. In Facebook's case the whole page (without CSS, JavaScript, and images) is generated through one HTTP request. In your case, they use Ajax to get content through several HTTP requests. IMHO the Facebook method is better: it avoids HTTP request overhead and parallelizes as many steps as possible.


You must have missed the ngx_echo module, because it's the part that makes this work exactly as described in Facebook's BigPipe blog post.

I've prepared a simple proof-of-concept configuration for nginx:

http://labs.frickle.com/misc/nginx_bigpipe.conf

As you can see, every "sub-page" is generated individually. Using the presented configuration, everything is chunked and flushed, so it is sent to the client right away. The response on the client side looks like this:

http://labs.frickle.com/misc/nginx_bigpipe.output

DISCLAIMER: I don't know how Taobao is using released modules internally or if they use them in production already (but I know some portals do).


It's actually webserver-neutral, implemented entirely in PHP and JavaScript. The basic idea is that each "pagelet" does its own data fetching and rendering and returns a chunk of HTML, JS & CSS dependencies, onload hooks, etc., as a data structure (JSON or what have you). As each pagelet completes it is flushed down the single network connection to the client using HTTP chunked encoding.
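
A sketch of what one pagelet data structure might look like and how a flush would be framed on the wire; the field names here are illustrative, not Facebook's actual wire format.

```python
import json

def pagelet_payload(pid, html, css=(), js=(), onload=None):
    """One pagelet's render result as a plain data structure."""
    return {"id": pid, "content": html,
            "css": list(css), "js": list(js), "onload": onload}

def http_chunk(data):
    """Frame one flush as a single HTTP/1.1 chunked-encoding chunk:
    hex size, CRLF, body, CRLF."""
    body = data.encode()
    return hex(len(body))[2:].encode() + b"\r\n" + body + b"\r\n"

payload = pagelet_payload("composer", "<div>compose here</div>",
                          css=["composer.css"], js=["composer.js"],
                          onload="initComposer")
chunk = http_chunk(json.dumps(payload))
```

Each completed pagelet becomes one such chunk on the already-open connection, which is why no special browser or server support is needed.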

The JavaScript half of BigPipe catches those flushes, handles the dependencies, registers handlers, and slaps the HTML into place. This lays the groundwork for many things like incremental page updates, parallel execution, and so on.


Second paragraph: "Although BigPipe is a fundamental redesign of the existing web serving process, it does not require changing existing web browsers or servers; it is implemented entirely in PHP and JavaScript."

So it may be a good candidate for a library.

I guessed they were doing something similar a few months ago when Facebook first started showing up block by block. Interesting to see it all laid out.


This reminds me a bit of Edge Side Includes (ESI), although I realize this is done on the client side more than the server side.

ESI got a lot of attention about 10 years ago, but then sort of fell out of favor in the tech media/blogs. Some big companies, like Akamai, are still using it, and the Varnish HTTP accelerator has some basic support for it.

I always liked the idea of breaking up my page into smaller segments, and then caching each part independently, and assembling the page from the cache. The cache could request only the parts of the page that aren't in cache/expired/uncachable, but otherwise pull everything from a super-fast cache.
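
The per-segment caching idea can be sketched in a few lines: each fragment carries its own TTL, only missing or expired fragments are re-rendered, and the page is assembled from the cache. Fragment names, TTLs, and the render function are invented for illustration.

```python
import time

cache = {}   # fragment name -> (expires_at, html)

def render(name):
    # Stand-in for the slow path: template render, DB queries, etc.
    return f"<div id={name!r}>fresh {name}</div>"

def get_fragment(name, ttl=60, now=time.time):
    entry = cache.get(name)
    if entry and entry[0] > now():
        return entry[1]                  # super-fast cache hit
    html = render(name)                  # miss/expired: re-render
    cache[name] = (now() + ttl, html)
    return html

def assemble(names):
    """Build the page from independently cached fragments."""
    return "".join(get_fragment(n) for n in names)

page = assemble(["header", "sidebar", "body"])
```

On a warm cache only the expired segments pay the render cost; everything else is a dictionary lookup.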


Facebook's approach seems like a really cool way around browser limitations.

ESI is really cool for caching inside your infrastructure, but it doesn't help the client as much, because they have to download the entire page again even if only the time in the top bar changed. I've always dreamed of a way of cutting up my HTML page to have different pieces 'cached' by the browser. A logical next step from this style (which helps you load a client with a cold cache) is to have the client cache each of these pagelets. On subsequent requests you could return JS that looks in the client's HTML5 storage (and actually you could check a cookie and recycle the JS too!), or some other crafty mechanism, which would reduce strain on FB's infrastructure and make it faster for the user.
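
A toy sketch of that browser-cached-pagelet idea: the server checks a version cookie and, for up-to-date pagelets, sends a short stub telling the client to reuse its HTML5-storage copy instead of the full markup. All names, versions, and the `renderFromLocalStorage` helper are hypothetical.

```python
def serve_pagelet(name, server_version, client_versions):
    """Return either a cheap 'reuse your copy' stub or full markup."""
    if client_versions.get(name) == server_version:
        # Client's cached copy is current: send a tiny JS pointer.
        return f'<script>renderFromLocalStorage("{name}")</script>'
    # Client is behind (or has nothing): send the full pagelet.
    return f'<div id="{name}">full markup v{server_version}</div>'

cookie = {"chat": 3, "feed": 2}            # versions the client claims
fresh = serve_pagelet("chat", 3, cookie)   # stub: client copy is current
stale = serve_pagelet("feed", 4, cookie)   # full markup: client is behind
```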


I read about somebody using HTML5 client storage to cache pagelets... ah, here it is: http://www.usenix.org/events/webapps10/tech/techAbstracts.ht...


Sweet, instapaper'd!


Either the writer of the article sucks at coming up with analogies or doesn't really understand what pipelining sequential logic means :)


For slow-loading pages, using client-side includes with JS to load non-core pieces of the page is a pretty common perceptual speed-up technique: if the main content loads quickly, we can wait for the shared elements to render.

Client-side includes didn't make it into HTML5. Maybe in HTML6? Check this space in 2020. http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-Aug...


Wondering how this affects SEO and non-JavaScript-enabled browsers. I assume one would still have to implement the more traditional solution as a backup option.


Try Facebook in Firefox using the Web Developer Toolbar extension to disable JavaScript. You'll see that major features just silently fail to operate correctly. I think that NoScript users and JavaScript-free browsers are basically non-existent. That doesn't mean you shouldn't design for the before-JavaScript-is-downloaded code path, but I wouldn't really worry too much.


I was more concerned with how this would be applicable to a non-Facebook site, particularly one that relies significantly on SEO for traffic. I would think Facebook has solved this for the pages they do want to SEO (Pages, Questions, etc.).


I wouldn't say they're non-existent because every so often you'll find some user on here bragging about how he uses NoScript. Personally though, I don't think that users who intentionally cripple their browsers matter.


But that's a conscious choice they're making. It's like worrying about making your website accessible to Richard Stallman.


The official NoScript site [1] downloads via the Mozilla Add-ons page [2], so its download count is probably accurate: 67,616,402

NoScript does not publish their individual add-on usage statistics, but the global download/usage ratio can be calculated from the statistics on the Firefox Add-on home page [3]:

1,962,617,946 add-ons downloaded

157,090,095 add-ons in use

About 8% of downloaded add-ons are still in use. Assuming NoScript's usage ratio is comparable to the average, approximately 5.4 million installations of Firefox are running NoScript. Let's ignore the fact that NoScript's usage ratio is probably much lower than the average, due to the fact that it breaks most web pages.

I'd wager that the average NoScript user has at least two machines, so the total number of NoScript users is probably less than 2.7 million.

There are over 230 million internet users in the USA [4]

Even if 100% of NoScript users were American, they'd form 1% or less of the general population. And if, like me, you believe I've been generous to NoScript here, it's likely that NoScript users are no more numerous than 1 in 1,000.
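
The back-of-the-envelope estimate above, reproduced as arithmetic (all inputs are the figures quoted in this comment):

```python
downloads_all = 1_962_617_946   # add-ons downloaded, site-wide [3]
in_use_all    =   157_090_095   # add-ons in use, site-wide [3]
noscript_dl   =    67_616_402   # NoScript downloads [2]

usage_ratio = in_use_all / downloads_all   # ~8% of downloads still in use
installs    = noscript_dl * usage_ratio    # ~5.4 million NoScript installs
users       = installs / 2                 # assume 2 machines per user
share       = users / 230_000_000          # fraction of US internet users
```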

Even if the user is using NoScript, they can whitelist your site. You can probably add <noscript>WARNING: THIS SITE IS BUSTED WITHOUT JS</noscript> to the top of your page and call it a day. If you are feeling generous, redirect no-script users to the mobile version of your site and tell them why.

tldr: Assume that human user agents have Javascript.

[1] http://noscript.net/

[2] https://addons.mozilla.org/en-US/firefox/addon/722/

[3] https://addons.mozilla.org/en-US/firefox/

[4] http://www.google.com/publicdata?ds=wb-wdi&met=it_net_us...

EDIT: Converted to blog post -- http://news.ycombinator.com/item?id=1406233


People use NoScript to prevent crappy sites from doing annoying, stupid, or malicious things. I'm pretty sure most NoScript users whitelist sites that they visit that are legit and require JavaScript. That's how I use it anyway.


I think following the progressive enhancement paradigm on your website for the sole purpose of including a marginal percentage of users is, just like supporting IE6, not really worth your time. If you can do it, however, then it's a great strategy to make sure that all of your content will be indexable. I think Google had some proposal for server-side parsing of #anchor%20urls for visiting spiders, but I never understood how that was supposed to reduce complexity.

Meanwhile, if your project is not a ginormous behemoth, following progressive enhancement, in addition to the mentioned SEO benefits, can help you ensure, for example, that you validate all your input on the server side, etc. As long as it makes sense, just like TDD.

But to prevent the "before-JavaScript-is-downloaded code path" scenario, all you need to do is include your <script/> tags in the head section. One exception would be if the user's browser times out or encounters an error when requesting your script. But that's very, very unlikely. Just make sure you don't name your JavaScript files "free_porn_sex.js."


I thought that Google's and Yahoo's crawlers did not parse JavaScript. And those are "browsers" I care a great deal about.


In Facebook's case, I think they are already serving completely different content to logged-in users and non-logged-in users/crawlers; most of the complexity of Facebook is invisible to crawlers anyway.


To me it sounds like BigPipe is as applicable as Flash or GWT - solutions already known for their problems with SEO. However, if that's not a problem for you, then it's probably a good optimization for your site.

Keeping two different output formats for a site (one for crawlers and one for humans) sounds complex. To date, none of the sites I've developed could have justified such overhead in development.


This would be suicide for a site relying on SEO to drive traffic. It's a cool approach if you're building something for which that doesn't apply though (web-based enterprise software, privately-accessed tools, or if you're a multi-billion dollar company that's so large you can create your own world wide web and the search engines can go screw).


How is this different from HTTP/1.1's support for pipelining?


This is done at the JavaScript level so you actually have more flexibility. Also, the article says that they load JavaScript dynamically so that it's executed asynchronously, as opposed to just including it in the HTML.


OTOH, pages have to be rebuilt on forward and back.


This seems very similar to MXHR (demo at: www.mixhammer.com), but streaming the page in chunks instead of only the assets on the page.


Sounds like iframes again.


Sort of. What they're describing has a lot of server-side bits going on to make it all work.

At least, as described. Rendering most of the page immediately and then leaving the connection open to shove more through is more server-push than the standard ways of doing this.


Yep. I think you'd be better off with HTTP pipelining and an iframe- or Ajax-based solution. At least that's transparent in terms of existing protocols.

Of course there's a lot more server-side and client-side work going on to support Facebook's implementation here. They've essentially invented some crazy way of packaging multiple requests together but hiding that inside a broken HTML document.

If server push is the destination, then I might have thought that WebSockets was a cleaner way of achieving it. I don't think server push is the destination in this case, though.



