Facebook PHP Source Code from August 2007 (gist.github.com)
319 points by scapbi on Oct 12, 2013 | hide | past | favorite | 141 comments


I remember those days. I worked at the time for studiVZ - a German social network (it still exists, but nobody uses it anymore) - and this leak was caused by a misconfiguration in their Apache setup.

In 2008 studiVZ was sued by Facebook. They claimed we had stolen their PHP and CSS/JS "code". (http://techcrunch.com/2008/07/18/facebook-sues-german-social...).

Indeed, the first version of studiVZ was inspired by Facebook. Our founder had first seen FB in the US in 2005, but it wasn't possible to use the service without a .edu email account. That led to the decision to build a local clone.

But the real story behind the lawsuit is a bit longer. In December 2006 Facebook tried to acquire studiVZ for the first time.

This picture with two of the studiVZ founders and some people from FB's management team is from that time:

http://img-a4.pe.imagevz.net/photo3/f3/9a/ce37499d84437cb744...

But FB didn't have the cash to acquire the company. The Holtzbrinck Publishing Group later acquired studiVZ for €85 million.

In 2008 (before the lawsuit) FB tried a second time to acquire studiVZ - this time for 4.8% of its shares! But this deal didn't happen - unfortunately.


Cool, I remember these days as well. Indeed, it very much looked like studiVZ just used FB's CSS files and changed some colors. It was a typical German clone success story. Dariani did a lot of questionable things but sold at the right time, before the downfall of VZ.

I didn't really like it; they basically did none of the innovation that FB did - it had the same features for years. Later they had a half-assed app platform that never really took off. It was just a blatant rip-off, like so many German startups, sadly.


We had to solve scaling issues most of the time. VZ was running on a multi-tier platform with services written in Java, PHP, Erlang, C++ and Python. 17 million users with around 25 billion requests per month (without static content)...

Some numbers from 2010:

- 60,000 req/s (without static content)

- 2.5 million memcache ops/s

- 300k queries to the database tier

We built that platform with a team of around 20-30 engineers, and we had to solve the scaling stuff first - before we could add new features to the platform.

Today it's easy to store 3 billion files (photos) - you could simply use S3 - but in 2007 these services either didn't exist or had only just launched.


I think Facebook had the same problems. Maybe a bit bigger ;)


Exactly. But you can't compete if you don't have enough (and the best) people and enough budget for recruiting.


Just curious, what social networks are popular in Germany now? And what led to the declining usage of studiVZ?


I think the most popular social network is Facebook. But as others said, people are looking for small services to solve a small problem.

studiVZ wasn't cool anymore. We missed the moment for a rebrand that would open the platform to other people - "studi" is an abbreviation for "Student". Our answer was a new brand called meinVZ (just another UI on the same database). Combined with FB's ability to ship cool new features (e.g. apps, the activity stream, i18n), it was just a question of time until FB took the market leadership.


Both questions can be answered with Facebook. For messaging though, Whatsapp is very common here these days. I don't like the service at all, but you're almost out of the loop if you avoid it - well, at least among my friends.


Does anyone else remember the friend graph visualization feature Facebook used to have? It was an oddity, both because it seemed like a strange but useful feature and because it was a Perl script.

One time I clicked it and got the source code to the file instead of the graph. It's somewhere on one of my hard drives, but it seems wrong to leak, especially since it has database credentials hard-coded into it.


managed to save a copy of several files that appeared at the time - home.php, album, group, friends, photo, profile, readmessage ... the code was an interesting read, as were the comments in it:

  // You fucking link h4x0rs just got pwned
  // Shortcut out of this CRAZY expensive garbonzle
  // What did we come up with?
  // don't display the Tuna album (aid=-1)
  // is this group a dummy meant to populate newsfeeds?
  // do not list the creator as harvard
  // keep track of tuna contexts
  // Merman's Admin profile always links to the Merman's profile

  // NOTE: ok, at this point we know we are going to display the full
  // page, so it is time to do a PHATTY PHATTY MULTIGET of all the shit that
  // we are going to need to make this page, or at least the most common things

// Clear fire if desired
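The "PHATTY PHATTY MULTIGET" joked about there is presumably the standard batched-cache-fetch trick: collect every key the page will need up front and fetch them in one round trip instead of one per key. A rough sketch (Python, with a plain dict standing in for the real cache client; all keys are invented for illustration):

```python
# Sketch of batched cache fetching ("multiget"): one round trip for all
# the keys a page needs, rather than one network hop per key.
cache = {"user:1": "alice", "friends:1": ["2", "3"], "photo:9": "tuna.jpg"}

def multiget(keys):
    """Fetch many keys at once; missing keys are simply absent from the result."""
    return {k: cache[k] for k in keys if k in cache}

page_data = multiget(["user:1", "friends:1", "missing:0"])
print(sorted(page_data))  # ['friends:1', 'user:1']
```

The payoff is latency: a page that needs 50 cached objects pays for one round trip instead of 50.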


Can anyone explain the strange obsession with tuna? Even in this index.php:

    // make sure big tunas haven't moved around


I am going to guess one of their test environments had an album with pictures of tunas.


Based on my secondhand knowledge, a "tuna" was a way to attach a wall (comments) to anything with a unique ID.
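If that secondhand description is right, the idea can be sketched roughly like this (in Python rather than the original PHP; every name here is invented for illustration, not from the leaked code):

```python
# Hypothetical sketch of a "tuna": a comment wall keyed by any unique
# object ID, so albums, groups, photos, etc. can all carry comments
# through one mechanism instead of per-type comment tables.
walls = {}  # object_id -> list of (author, text) comments

def post_to_wall(object_id, author, text):
    walls.setdefault(object_id, []).append((author, text))

def get_wall(object_id):
    return walls.get(object_id, [])

post_to_wall("album:42", "alice", "nice photos")
post_to_wall("group:7", "bob", "first!")
print(len(get_wall("album:42")))  # 1
```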


This was obviously computer misuse. Although they didn't intend to show it to you, you saw the secret credentials. Go straight to prison.


I wonder if it was based on the code Zuckerberg used for the Java friend graph he had on his homepage prior to Facebook.

Could you perhaps censor the credentials and release it?


I am not a lawyer, but even with censored credentials this seems like a bad idea. Especially since my profile is not exactly anonymous.

Any Facebook employees want to get me a green light? :)


...do the credentials still work? :)


The db was most likely never listening for connections from the world at large anyway.


If their code was so easily leaked, it's not out of the realm of possibility...


Reads like: my dad works for Sony and I have a Playstation 5 in my possession.


I got the exact same thing on the graph PHP page. It had database credentials, but they were canceled out and replaced by some other constants.


When I see code like this, I'm always amazed that I can actually read it and (kinda) understand what it's doing. I always expect code like Facebook's to be so finely tuned and advanced that it would be completely unintelligible to anyone outside the company who isn't an expert in the language.


Actually, a large codebase can reasonably be expected to live 20 years. Over that time, with a lowish 10% turnover, you'll replace the entire programming staff twice, so the understanding people have of the code is not just hearsay but hearsay of hearsay. The sheer size of the codebase (Facebook has tens of millions of lines of code) means that, practically speaking, you can't program your way out of a corner once you get there. That is (partly) what killed Facebook's competitors.

The solution to all of this is to be ruthless about simplifying code. The goal of new code should first and foremost be to be maintainable. Clever code is the enemy of scaling a programming team. Not that you don't need clever code, but you isolate it, protect it, and make sure the average team member doesn't have to look at it.


In my experience, code becomes unintelligible not because it's fine-tuned and advanced but because it's messy and rushed.


Or over-generalized with the logic split up and hidden in the interactions between a dozen (sub)classes.


Or under-generalised, with the logic duplicated in a dozen places, ever-so-slightly differently.


That doesn't quite hold true once you get down to hand-written and optimised binaries, though. There's no way of making assembly easily readable to everyone, no matter how relaxed the developer.


Hand-written assembly can be just as readable as any other language. See https://github.com/jmechner/Prince-of-Persia-Apple-II/blob/m... for a good example!


You shouldn't be. Ultra minimalist code using every language feature tends to be difficult to understand and maintain. It's usually written by young coders eager to show how clever they are.


It's the exact opposite: the better code gets quality-wise, the easier it is to read and understand.

There are situations that require using somewhat surprising language features occasionally, but quality code will limit them to small areas, and document the hell out of what's going on there.


The best code is often the easiest code to understand.


I think there is a valuable lesson to be learned from this piece of spaghetti. I can't quite formulate it off the top of my head, but it's something like: if you wanna be rich, don't waste your time being a pedant - your users couldn't care less.


This piece of code is doing something very simple and sequential in nature: putting together the (then) Facebook front page. I'm not sure I agree it's spaghetti. Are there any complex relationships that become hard to follow because of an inadequate level of abstraction? It doesn't look that way to me.

Many programming methodologies are proposed these days, but the entire "field" smells of pseudo-science. Unless studies are done that can show statistically significant differences in relevant metrics (defect density, time required to add a feature, etc.), it's just a matter of opinion.


It's kind of funny to see two reactions here:

* Reaction A: code is ugly, what a bunch of jerks!

* Reaction B strikes me a bit as hero worship. Since we already know the outcomes, some already consider them geniuses; we must conclude every decision they made was a good one, and there is no room for criticism.

Perhaps neither is great, but I think reaction B especially is a little dangerous. One must acknowledge that there are more factors in their success than coding style, but that doesn't mean that neglecting it will lead to yours.


A false dichotomy if ever there was one. Anyone who doesn't think the code is ugly must just be blinded by hero worship?

How about the possibility that a1a is one of the people who just automatically equates "large volume of code" with "spaghetti"? In my experience there is a fair number of HN users (and devs in general) with this knee-jerk perspective.

In my very humble opinion any reasonable person who takes a few minutes to actually read through this code would never call it spaghetti. _Especially_ considering 1) the feature set of PHP at the time the code was leaked, 2) the immense scale of Facebook even at that time.


It's almost like you are trying to misinterpret my post. Large volume of code? The site displays roughly 600 lines of code; is that a large volume of code to you?

Like I posted - just below - a day before your post: "My post was not meant to trash the code, but rather to point out that this code shows us that the primary objective shouldn't necessarily be writing the perfect code."


You make a fair point and I guess everything is relative. My post was not meant to trash the code, but rather to point out that this code shows us that the primary objective shouldn't necessarily be writing the perfect code.


> My post was not meant to trash the code, but rather to point out that this code shows us that the primary objective shouldn't necessarily be writing the perfect code.

That's a dangerous argument. I mean, if startups stopped pursuing the most elegant code, they wouldn't need, nor would they be able to hire, the "best" developers. And then what?

1. Lots of startups would be able to get off the ground and grow with less cash because they wouldn't have to pay six figures to every developer on staff.

2. It would be a lot easier to find acceptable candidates (no more "you have five minutes to implement sort on a whiteboard...to prove that you can build a web form" challenges during interviews).

3. There would be a lot less premature optimization and use of the framework or NoSQL database du jour.

4. Egos would be hurt as engineers with the most impressive educational backgrounds and deepest technical expertise would be forced to come to grips with the fact that, while they are valuable and have a big role to play at some companies, people with less formal training and knowledge can build working, commercially-viable CRUD applications.

This would be the end of the Silicon Valley startup world as it exists today, I tell you!


I was halfway through my rebuttal of your comment before I realized it was satire. Well done. :)


Oops sorry. Ok, we agree!


My mantra:

Shipped code > Well architected incomplete features.

Your user does not care in the slightest if you're using a design pattern, or dependency injection, or whether there is 100% code coverage. Just make it work! Then make it faster! Then make it more readable! In that order.


You might ship faster, but this can easily lead to poorly written, hard-to-maintain, insecure spaghetti code. In fact, rushing to ship and meet deadlines is probably responsible for most of the vulnerabilities in software.


Bingo. Do it right the first time. I'd rather take an extra hour on a bit of code the first time than go back and spend two hours refactoring it later on.


Ship too late and none of it will matter.


I guess shipping is more important to you than the possibility of losing user details (or worse). Christ, I hope I never give my details to a company you founded.

Shipping quickly is important but it's also important to write quality code. Small bugs that can easily be fixed are fine but security problems or bugs related to payments, for example, are not.


So, maybe, just maybe, you take a bit more time on the parts involving security (that is to say, the handling of user credentials (including session management, cookies, etc.) and payment-related things)?


How many companies have failed because of security flaws in their code?


Companies don't normally fail because of security flaws, in the same way Boeing doesn't go bust when it has to ground 787s. But in both cases you end up potentially taking a huge hit in costs. Off the top of my head, Sony had to write down $170 million in costs when PSN was compromised, and TJ Maxx ended up paying out $800 million in costs, damages, and compensation after their payment terminals leaked credit card details.

These are not figures you want to see on your bottom line.


If your first reaction when someone talks about security flaws in payments is 'will it make my business fail' rather than 'is this going to fuck my customers' you need to re-evaluate your priorities.


This is a straw man, right? Not all bugs are payment security bugs. Not all bugs are harmful to users. And spending more time writing cleaner code does not mean you'll have fewer bugs.


> In fact, rushing to ship / meet deadlines is probably responsible for most of the vulnerabilities in software.

I think the unspoken secret here is the probability*loss for security issues is far less than the cost of missing features / delays.


That approach only works in the short term, though. When you've got an already-large codebase that you just 'made work' but with no tests (code coverage), how do you 'make it faster' or 'make it readable' without breaking things? Especially as you are likely under pressure to be adding new features.


By being careful?

I'm all for tests, but tests aren't the ONLY way to write software that you can modify without "breaking things." It does require more time to develop and test, but then again, you're saving some time by not writing and maintaining tests.

Again, I believe in automated testing because I think it hedges my risk, but I'd caution against believing your own hype that there is only one way to do something.


>I'm all for tests but tests aren't the ONLY way to write software that you can modify without "breaking things."

What are some other methodologies that can let me change code with the confidence well-written tests give me? Would love to be able to employ them when automated tests aren't feasible.


I can't speak to your subjective question, but there's a lot of software without automated testing that people are perfectly capable of modifying without breaking. Ask me a real question - not one about the confidence YOU get from something - and I'll see if I can't answer it.


I believe there is a rather apt quote by someone whose name escapes me right now (C.A.R. Hoare, maybe?). It goes (paraphrasing): "There are two ways to write software. One is to write code that contains no obvious deficiencies; the other is to write code that obviously contains no deficiencies. The second way is much harder."

I always took this as an argument for smaller, composable components that value simplicity over complexity, dumb over smart, and composable over monolithic. I personally try to write code in this manner (not quite there yet!). I find that the more straightforward and simple the code, the less need there is for tests, because I can hold the whole of the codebase in my head.
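The "small, dumb, composable" style described above can be illustrated with a toy sketch (Python here, purely for illustration):

```python
# Each piece is trivial to verify in isolation; the pipeline is just
# their composition, so there is less emergent behaviour to reason about.
def strip_ws(s):
    return s.strip()

def lower(s):
    return s.lower()

def words(s):
    return s.split()

def compose(*fns):
    """Left-to-right function composition."""
    def run(x):
        for f in fns:
            x = f(x)
        return x
    return run

normalize = compose(strip_ws, lower, words)
print(normalize("  Hello World  "))  # ['hello', 'world']
```

Each stage is small enough to hold in your head, which is exactly the property the comment is after.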


Shipped code > Well architected incomplete features.

Actually, there is more wisdom hidden in this dogma. I'd argue you can't really architect anything well without iterating on running code. So it's more of a cycle, at least in my case: proto, use, profile, refactor, proto new stuff into it, use, profile, refactor...


This cycle usually ends too early, though - sometimes at the first iteration - and the code gets pushed to production. Usually the spec is lacking and the customer (internal or external) doesn't really know what he wants. After that, feature requests and bug-fix hacks deteriorate the condition of the codebase to the point where things break if you look at them. Rewriting becomes difficult because, instead of the requirements being formalised in a specification, the existing codebase - with all its hacks and added features - is the spec. So you end up with something very fragile and not so agile. At the bottom lies a first-iteration model that the original developer (who left the company a couple of years ago, and has since had time to reflect on things) now knows can't possibly scale.

Sometimes (not always) it's best to release when ready. Shipping is important for a company, but for a dev team/individual it should be further down the list. More importantly, development is not done after the first release, and features don't maintain themselves.

I don't like the whole 'ship as soon as possible'-thing.


Yup.

Shipping is a feature.


I'm pleasantly surprised. It's actually fairly well structured in my opinion.


Are you a PHP programmer?


I don't know. I can program in PHP. Does that make me a PHP programmer?


LOL :-) I know the feeling.

Someone recently mocked full-stack developers on HN as "I can javascript and servers too...". Oh man, how it made me cringe... In another 10 years, maybe I won't have to. Off to dreaming... and coding... where is Google... and Stack Overflow, of course...


I'm a PHP programmer, but I haven't done much procedural code in a long time; I'm more or less OOP all the way, using modern tools like Composer... This code, although not bad per se, does make me twitch because of how it is.


Remember that evolution article the other day? It's not "survival of the fittest". It's "survival of the fit enough".


All of computer science was created by undocumented spaghetti code that evolved over a few hundred million years: DNA is 700 MB or so of uncompressed, undocumented base pairs.

We seem to be doing all right.


Yeah, and because the code is completely undocumented and obfuscated it's taking us biologists a few hundred thousand man-years to figure out what the hell it means. The only time spaghetti is a good thing is when it's covered in sauce.


It'd be somewhat weird to stumble across comments in that code.

// TODO:


I quite often see words (or nearly-words) in protein sequences.

The best I've seen are (when reading DNA translated in all frames):

EVQLVE

LAMARCK

ELVISISGAY

SALTYSATAN


Is SALTYSATAN a metal band or a sex act?



... unless your users actually care about messages being delivered, about things reported as saved actually being saved, et cetera...

There is a whole world out there where people want to rely on software to do what it says it does. I know Facebook can live in its own bubble and get away with every possible stupid bug its messy PHP spaghetti causes.


I remember listening to an interview with Markus from Plenty of Fish, where he essentially said that he didn't worry too much about site errors because most unsophisticated users would attribute them to things like their ISP, browser (if they knew what that was), or their own error more often than to the site itself.

Personally, I can't bring myself to not care like that, but it seems to have worked pretty well in the early days of many now-popular sites. Especially in 2007, when Facebook was still in real competition with MySpace, moving as quickly as possible was probably much more important than a few messages dropping through the cracks.


I must admit my comment was unnecessarily bitter, but my observation is completely different. Many friends and family members now see data loss on Facebook at least weekly. I wouldn't call them tech-savvy either, yet they can attribute the issue to Facebook, having learnt the simplest basics of how the web should actually work.


That's probably by design. Would you prefer one unimportant message lost here and there, or be able to handle 1/1000 of the current traffic to make sure every single message is delivered?


This is exactly the problem. You do not know the message is not important. In fact, every message is important to someone.


In your first comment you implied that it was a problem with PHP. I told you they are doing it on purpose, and not because their code or language is bad, and you now agreed. So I don't know where you are trying to get with this discussion. They have to choose between maximum performance and perfect consistency. They can't have both. So what they are doing is saving money in infrastructure, and letting some messages get lost from time to time. Your family being a little annoyed made them money.

Also, when we say "important" in this context, it's almost like asking "are you willing to pay for it?". Most people wouldn't pay a dime to ensure the consistency of all their Facebook messages. So Facebook chose the right option.


Actually the trade off is not between consistency and performance, but scalability and performance (CAP theorem). Although related, they are not the same concepts.

My opinion is that you can have both performance and consistency, and work your way toward scalability as required. I recall that Facebook has a very high server-per-engineer ratio (though I acknowledge that the user-per-engineer ratio is even higher in comparison to other startups).

I also realize it is not cost-efficient to write bugfree software, but saying they did it the other way on purpose is forgiving them too much. You never write bad software intentionally.

They didn't care, and the world should have relied and should continue to rely on better software.


I know the CAP theorem, but that's not what is going on here. And now again you are saying the data loss is a bug. It's not! Like I said, it's by design; they are probably using a fire-and-forget approach for writing the comments. Do you seriously think there is a bug in PHP or their PHP code that makes a very small percentage of comments get lost? Besides, they moved away from PHP for their backend a long time ago; they are using a combination of Java, Scala, etc. Software that loses some data instead of crashing under heavy load is not bad software. It has its uses, e.g. a massive volume (per unit of time) of unimportant messages that no one is willing to pay for.
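To make "fire-and-forget" concrete, here is a minimal sketch (Python; the bounded buffer is a stand-in for whatever the real write path is - this is the pattern, not Facebook's actual implementation):

```python
import queue

# Under load, writes that don't fit in the bounded buffer are silently
# dropped instead of blocking the request - the sender never finds out.
write_buffer = queue.Queue(maxsize=3)

def fire_and_forget(message):
    try:
        write_buffer.put_nowait(message)  # accepted for eventual persistence
        return True
    except queue.Full:
        return False                      # dropped on the floor

results = [fire_and_forget(f"msg-{i}") for i in range(5)]
print(results)  # [True, True, True, False, False]
```

The request path stays fast and never blocks on the store; the cost is exactly the occasional lost message being discussed here.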


The code looks pretty clean. I dig the two-tab spacing as well, but perhaps that was done after the fact.

Anyhow, not sure if this makes any difference or not, but I'm curious as to why true PHP constants are not used, and instead regular variables in all caps like $PARAM_INT are used. Anyone know why this might be?

I ask because one of you PHP gurus might inform me that there are certain use cases where a true constant is not wise.

An example below:

  param_get_slashed(array(
    'feeduser' => $PARAM_INT, // debug: gets feed for user here
    'err' => $PARAM_STRING, // returning from a failed entry on an orientation form
    'error' => $PARAM_STRING, // an error can also be here because the profile photo upload code is crazy
    'ret' => $PARAM_INT, 'success' => $PARAM_INT, // successful profile picture upload
    'jn' => $PARAM_INT, // joined a network for orientation
    'np' => $PARAM_INT, // network pending (for work/address network)
    'me' => $PARAM_STRING, // mobile error
    'mr' => $PARAM_EXISTS, // force mobile reg view
    'mobile' => $PARAM_EXISTS, // mobile confirmation code sent
    'jif' => $PARAM_EXISTS, // just imported friends
    'ied' => $PARAM_STRING, // import email domain
    'o' => $PARAM_EXISTS, // first time orientation, passed on confirm
    'verified' => $PARAM_EXISTS)); // verified mobile phone


They might be simple input-validation filters created using the create_function() function, which allowed you to create anonymous functions of sorts in PHP prior to 5.3. You can't assign an anonymous function to a constant. An alternative would have been to include a big switch statement in the param_get_slashed() function, but since the same validators are used in a bunch of places, it seems cleaner to use anonymous functions and have each function that uses them loop through the array, calling each validator with the parameter named in the array key as an argument.
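The pattern being described - a map from parameter names to validator functions, looped over by one helper - might look like this in rough outline (sketched in Python rather than PHP; all names are hypothetical, modeled on the $PARAM_* variables above):

```python
# Validators keyed by parameter name; a single helper applies them.
PARAM_INT = lambda v: v is not None and str(v).lstrip("-").isdigit()
PARAM_STRING = lambda v: isinstance(v, str)
PARAM_EXISTS = lambda v: v is not None

def get_params(raw, spec):
    """Keep only the request parameters that pass their validator."""
    return {name: raw[name]
            for name, check in spec.items()
            if name in raw and check(raw[name])}

params = get_params(
    {"feeduser": "42", "err": 7, "mobile": ""},
    {"feeduser": PARAM_INT, "err": PARAM_STRING, "mobile": PARAM_EXISTS},
)
print(params)  # {'feeduser': '42', 'mobile': ''}
```

Because the validators are first-class values, any page can declare its expected parameters as data, which is presumably the appeal over a big switch statement.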


I think it might be because if you write something like $PARAM_INT_sometypo, that variable is undefined, which is easy to catch. But if you are using constants, PARAM_INT_typo becomes the string "PARAM_INT_typo"[1], so you would have to do more validation for that, too.

But without more context it's not that easy to tell.

[1]: http://us.php.net/manual/en/language.constants.syntax.php (search for "undefined constant")


I'm not a PHP-guru, but it's entirely possible that they're playing around with some background configurations that would make performance tweaks in some sections of the code difficult if they're defined variables.

I guess this is an implementation of a semi-defined variable. It should probably be noted that there may be a tiny performance improvement in calling isset() instead of defined() when checking whether something is set. I'm not sure that was one of the considerations, since it would definitely be premature (and ultimately pointless) optimization anywhere else, but since it's Facebook, they may have gone that far to squeeze every bit out of it they can.


True PHP constants were a performance hit.


define() calls were expensive in PHP.


Hehe, these lines made me chuckle.

  // Holy shit, is this the cleanest fucking frontend file you've ever seen?!
  ubersearch($_GET, $embedded = false, $template = true);
In all seriousness though, I wonder how much of this was written by Mark Zuckerberg?


With the header "@author Mark Slee" on that file, I doubt much if any.


I don't know why, but I always love comments more than code itself.


This is quite neat compared to the average shitfest people leave inside WordPress templates. I recently had to unpick a whole 200-page intranet written with WordPress and BuddyPress (which stinks) and rewrite it as a non-WordPress site, as it had hit the inevitable hack brick wall.

It's now an ASP.NET MVC app on Azure. There's hardly any code in it now, and the page response times are down to 50ms rather than 1500ms! One query per page vs 400 as well.


Exaggeration is the mother of ... well, something.

An entire intranet's functionality with one SQL query per page. Right, okay...


I don't think it looks that bad. It's clean and easy to understand.


Anyone who calls this bad/spaghetti code hasn't seen bad code.


The comments on GitHub are akin to "my eyes! they bleed", but tbh this is actually quite readable, and I imagine it wouldn't be all that hard to maintain.

I haven't touched PHP since about 2007, and it seems like it would be pretty easy to jump in and start making changes and edits. It is well templated, seems to use variables well, and doesn't have too many (if any - I didn't look all that closely) hard-coded values. I've seen much, much worse in "cleaner" languages than this.

As a side note. I learned a lesson in my early 20s that has served me well to today.

Shipped code always wins.

I was a huge proponent (and still am) of good code, well architected blah blah blah. But, in the end, if your code never sees the light of day, it doesn't matter. I saw this when I took over a team responsible for care and feeding of a product originally written by the founder of a company. This company had just secured a series B on the original code (still). The product was ugly, the code was terrible, it was slow, it wouldn't scale, couldn't be configured and really couldn't be maintained. However, it got the company moving, got a series A & B, secured the first customer, and a second. It basically created a runway to build the thing right that wouldn't have been available had it not gone live.


I like the variable naming in the code. It's good that they didn't use one- or two-letter variables that span more than five lines of code. Short, meaningless variable names - used for anything other than index values - are one of my big pet peeves. I once had to use somebody else's code as the base for a project, and it was full of two-letter variables that didn't mean anything. It gave me a big headache.


Seriously, some of these variables are named so elegantly it's actually a little surprising.

Beginning developers could learn a thing or two from looking over code like this (even though it's now clearly outdated and likely defunct). I've been programming for half a decade and my naming conventions are still horrid.


I don't remember from the site at the time what "monetization_box.php" would have been doing.

Also, my style of comments seems so tame and lame now :)


This looks weird in search.php (line 89):

  if($user 0 && !is_unregistered($user)) { return $user; }


I'd hazard a guess that it should be:

  $user <> 0
Which, of course, would not display if someone had originally pasted this code straight into an HTML page without escaping it.

EDIT: Looks like it would show up. I wonder, then, if the <> has been stripped. Some poorly written websites do that.
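That theory is easy to reproduce: a naive tag-stripper treats `<>` as an empty HTML tag and eats it, turning the line into exactly what the gist shows. A quick sketch (the regex is a guess at what such a site might do):

```python
import re

# A naive "strip HTML tags" filter, as some poorly written sites apply
# to pasted content. It eats "<>" because it looks like an empty tag.
line = "if($user <> 0 && !is_unregistered($user)) { return $user; }"
stripped = re.sub(r"<[^>]*>", "", line)
print(stripped)  # if($user  0 && !is_unregistered($user)) { return $user; }
```

Collapse the resulting double space (as HTML rendering would) and you get the broken line from search.php verbatim.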


This is either a syntax error or a HipHop idiosyncrasy (was HHVM in use in 2007?)

  $ php -r '$user = null; if ($user 0) { echo $user; }'
  PHP Parse error: syntax error, unexpected '0' (T_LNUMBER) in Command line code on line 1


I don't think even HPHPc was around in 2007. Development didn't begin until 2008, IIRC.


line 72: // Holy shit, is this the cleanest fucking frontend file you've ever seen?!


My favorite line, too.


There seems to be, here and on the "interverse", a battle over whether it will be more hip to call this spaghetti code or to call it clean code. I don't see very much debate on whether or not it is GOOD code.

It isn't good code. It is cleanly written, especially for the time. But all the noise about how bad it is isn't "hipster" talk - this is 400+ lines of completely unmaintainable code, which is why it isn't part of the current codebase.

As soon as they got some money and started trying to make it do more, they wised up and dumped it.


People just call this spaghetti for superficial reasons: they don't see objects, and they don't see the PHP practices current in 2013. But it is readable and maintainable code. We're out of context, and there is nothing here to suggest a lack of quality. Some people mistake code for literature; objectively speaking, this is not bad code.


From linked article "It seems that the cause was apache and mod_php sending back un-interpreted source code as opposed to output, due to either a server misconfiguration or high load (this is a known issue)."

Does anyone know what he is referring to when he says this can happen via high load?


It is a myth. High load has nothing to do with it. However, if you configure apache wrongly, it will serve .php files as text. The only connection to load is if you have one broken server among N proper ones: the number of times the broken one is hit depends on load, i.e. at low load it may never be hit at all.

I guess the story about apache serving php source under high load came from the idea that Facebook is a high-load site (true) and that they couldn't have made as obvious a screwup as misconfiguring a production server (false), so it must be an apache/php bug that only happens on high-load sites (false).


This was a real issue; here is the bug:

https://bugs.php.net/bug.php?id=26810

There were many related bugs as well. It didn't get fixed for a while, and when it was fixed it was over a number of revisions and wasn't tagged.


It was a rare bug in the 4.2.x and 4.3.x branches of the Apache module running on 2.0.x.

The real way this code was leaked was via a plain-text version of the file being stored on the server, which the hacker found by trying a bunch of URLs.

I was speculating at the time, and had recent experience with the bug where source code would be exposed. I found out a little later how it was actually leaked, but then forgot all about it.


How did someone "expose the PHP source code"? Did they actually find a way to make the code show up client-side, or was it just someone who managed to get access to the backend stuff? The way it's worded makes it sound like the former, but that seems unlikely...


It can happen if you upgrade Apache and/or mod_php and, for some reason, mod_php fails to register itself for .php files without that being a fatal startup error. Apache will then serve the .php files as plaintext.

It's one of the things that make one-off .php scripts easy to deploy (just chuck it into the webroot). With a little effort, this can be mostly prevented: let the webroot .php file just do an ' include("../outside-webroot/actual-script.php"); '


They misconfigured apache to serve php as text files instead of passing them to the php engine. One missing line in the apache config can do that. It didn't last long, but of course people saved every bit that leaked this way, which is mostly entry points.
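For the record, the "one missing line" with mod_php would have looked something like this (a generic sketch, not Facebook's actual config; exact directives vary by Apache/mod_php version):

```apache
# httpd.conf - without a mapping like the AddType line below,
# Apache has no handler for .php and falls back to serving the
# file as plain text, i.e. raw source code.
LoadModule php5_module modules/libphp5.so
AddType application/x-httpd-php .php
```

If the LoadModule line succeeds but the AddType/handler mapping is dropped (say, during an upgrade), the server starts fine and quietly leaks source.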


Another possibility is some quirk in the file permissions, .htaccess or some other configuration file, which could lead to vulnerabilities like HTTP verb tampering.

It may have been more subtle, but of a similar nature.


Sounds like it was an "accident". The web server served up the source code rather than executing the PHP, and someone saved it.


I think it was a glitch in Apache that made it spill out the source code.


Pre-PHP 5.3 -> Right off the bat: no namespace support, no closures, no surprises there

I am a bit surprised about the lack of OO though...


This is why I was saying evented webserver code is better.

Look at how much I/O is happening one request after the other. The latency could be greatly reduced by doing things in parallel and waiting until all the promises resolve.
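To be fair, you could get most of that win even in 2007-era PHP without an evented server, at least for HTTP calls, by fanning requests out with curl_multi. A rough sketch (URLs are placeholders):

```php
<?php
// Fan out several HTTP fetches concurrently instead of one after
// the other; total latency approaches the slowest request rather
// than the sum of all of them.
$urls = ['http://example.com/a', 'http://example.com/b'];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // block until there is activity
} while ($running > 0);

$results = [];
foreach ($handles as $ch) {
    $results[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
```

It's clunkier than promises, but the "wait until everything resolves" shape is the same.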


index.php line 130: foreach($upcoming_events as $event_id = > $data) {

A space in the middle of "=>"...


A startup with spaghetti code of this stature will NEVER be successful.


I really hope they cleaned that mess up or I feel very sorry for all the developers at Facebook.

>ini_set('memory_limit', '100M'); // to be safe we are increasing the memory limit for search

>tpl_set('simple_orientation_first_login', $get_o); // unused right now

>// We special case the network not recognized error here, because affil_retval_msg is retarded.

>all those undocumented(?) random error codes

>mix between ternary operators and regular if/else statements with no logical choice between one or the other

>no auto loading whatsoever

Seeing as this is from 2007, there is hope.


What's wrong with increasing the memory limit?


100MB per request is going to seriously limit scalability (or at least inflate hardware costs)


It would seem that Facebook has been able to scale. Perhaps this line of code worked for them in 2007, and they changed it only when it started to be a problem.


It's probably still there and has probably never caused any problems. In other words I think iand is exaggerating.


It is certainly not still there given that facebook doesn't run PHP any more. They wrote a compiler to translate a subset of PHP to C++, which they then compiled into a massive executable. The compiler is open source and called hiphop, and it does not implement the PHP memory limits.


By that logic, no one runs any programming languages, since they all get translated to x64 or other instruction sets.


That makes no sense at all. The function in question sets the memory limit for the PHP interpreter. A specific piece of software. Facebook does not use that specific piece of software.


Your comment was ambiguous then. It sounded like your logic was "they don't run PHP because they wrote their own implementation"... which is still running PHP. I think you meant to say they're not running the standard PHP interpreter, so that particular function call might not be implemented (but it could be, still...).


Facebook still uses PHP bro...


They still write code in the language PHP. They do not run that code using PHP the software you download from php.net. The memory limits of the PHP interpreter from php.net are not relevant at all to facebook's current operations.


They don't do that anymore. Ask your Facebook friends, hphp has changed since then.


I know, I used past tense. But the initial hiphop work was when php memory limits stopped being relevant for them.


It doesn't say it's 100MB per request, just that it's raised for safety in the rare case where it's actually needed. Would you prefer serving a 500 to that user instead?


I can't think of a single reason a PHP script like this would need that much memory to serve a single user. Everything big would be handled by the database, what could 500MB possibly be used for?


They would be applying some search logic in memory in PHP. Depending on what you are doing it does make sense to load in a heap of data from the database then narrow it down in PHP.


That sounds grossly inefficient.

Reminds me of some production PHP I saw once, returned the entire database (`SELECT * from 'db'`) and manually walked through it with a for() loop. It wasn't very efficient at scale.


Yeah, you're right... It's way better to do expensive joins on the database where horizontal scalability is just a fantasy than in the shared-nothing www frontends that can be duplicated in clusters ad infinitum.

If you're building SomeSimpleBasicSite.com and you do this, you're probably a bad programmer. If you're truly facing scalability problems, it's table stakes.

Based on your comments in this thread you're out of your depth, man.


I still can't think of an example where anybody would want to return 500MB of results to PHP for processing.


I think Kiro meant 500, as in the HTTP status code, due to PHP running out of memory, not 500MB.


Guess what happens when a lot of people search.


Yes


That's an abnormally high memory limit for what should be a simple request.



