MySQL and Sphinx at Craigslist (presentation by Jeremy Zawodny) (percona.com)
33 points by amix on April 27, 2009 | 12 comments


I love sphinx (we use a custom version of it to power http://bug.gd).

Some choice quotes:

"25 MySQL Boxes to 10 Sphinx"

"50M queries per day w/steady growth"

"1,000+ qps during peak w/room to grow"

It's great to see it standing up well, though it looks like they made (or sought) patches for issues they ran into.
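
For a sense of scale, the quoted figures can be turned into an average rate with a bit of arithmetic. This is only a rough sketch: it assumes the 50M daily queries are spread over a full 24-hour day, and the variable names are made up for illustration.

```python
# Rough arithmetic on the quoted figures (assumption: a full 24-hour day).
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(queries_per_day: float) -> float:
    """Average queries per second for a given daily query count."""
    return queries_per_day / SECONDS_PER_DAY


daily_queries = 50_000_000  # "50M queries per day"
peak_qps = 1_000            # "1,000+ qps during peak" (lower bound)

avg = average_qps(daily_queries)
peak_to_avg = peak_qps / avg

print(f"average: {avg:.0f} qps, peak/average: {peak_to_avg:.2f}")
```

So the quoted peak is a bit under twice the daily average, assuming the slide figures are comparable.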


How hard or easy was it to customize? Would you mind sharing the nature of your customizations?


Well, the customizations are to the C++ code and it doesn't feel particularly difficult to adjust, but then we have years of senior-level C++ coding behind us.

One of the first changes we made to sphinx was just to increase the maximum query length so that queries could be extremely long (error messages are sometimes quite long). This meant changing a define and tweaking some other areas internally.

We also made some changes recommended by others in the sphinx forums and have been adjusting the weighting algorithms for our own needs.


Did/Will you make your patches open source?


No plan, really. The changes would need to be #ifdef'd or otherwise conditioned out because almost no one would want them. We optimized it specifically for our own error message searching, and that's not really useful for general use.

For adjusting the maximum query length, you can find tips on that in the sphinx forums.


The most interesting thing about their usage is how they managed to create almost realtime search on Craigslist by using Sphinx's delta indexes. And they are doing this on lots of data, which is a good sign.

Sphinx 1.x+ should feature realtime index updates, which will make the Sphinx deal a lot more impressive.

That said, support for realtime index updates in the current search engines is a joke, and one must do lots of hacks to support them properly with lots of data and updates.
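
To make the delta index idea concrete, here is a toy sketch in Python. It is not Sphinx's implementation or API, and every class and method name in it is made up for illustration. It only shows the general main+delta pattern: a large index that is rebuilt rarely, a small index of recent documents that is rebuilt often, and searches that consult both.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory inverted index: word -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, word):
        # Return a copy so callers cannot mutate the postings.
        return set(self.postings.get(word.lower(), set()))


class MainPlusDelta:
    """Hypothetical main+delta scheme (illustration only)."""

    def __init__(self):
        self.main = ToyIndex()   # large, rebuilt or merged rarely
        self.delta = ToyIndex()  # small, rebuilt often
        self.pending = {}        # documents newer than the main index

    def add_document(self, doc_id, text):
        # New documents are not searchable until the next delta rebuild.
        self.pending[doc_id] = text

    def rebuild_delta(self):
        # Cheap and frequent: index only the documents added since the
        # last merge into main.
        self.delta = ToyIndex()
        for doc_id, text in self.pending.items():
            self.delta.add(doc_id, text)

    def merge(self):
        # Expensive and infrequent: fold recent documents into main,
        # then start over with an empty delta.
        for doc_id, text in self.pending.items():
            self.main.add(doc_id, text)
        self.pending.clear()
        self.delta = ToyIndex()

    def search(self, word):
        # Queries hit both indexes and combine the results.
        return self.main.search(word) | self.delta.search(word)


idx = MainPlusDelta()
idx.add_document(1, "red bicycle for sale")
idx.rebuild_delta()
idx.merge()

idx.add_document(2, "blue bicycle wanted")
before = idx.search("bicycle")  # only document 1 is visible so far
idx.rebuild_delta()
after = idx.search("bicycle")   # document 2 is visible after the rebuild
```

The search lag is bounded by how often the small delta is rebuilt, which is why the approach is "almost" realtime rather than truly realtime.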


At Hubpages.Com, we switched to sphinx in December. We found for PHP that the migration was pretty painless. We had previously used the MySQL full text search.

Check out the tutorial here for a nice overview of what's involved in setting up sphinx: http://www.ibm.com/developerworks/library/os-php-sphinxsearc...



I never understood Craigslist. The most baffling thing is that they took on eBay and won, which I put down to luck.

But the initial concept of a listing for a small town (whether it be San Francisco or not) is laughable.


What!? Classifieds have been around forever and were the bread and butter of the print newspapers for a long time. If craigslist "took on" anything, it's classifieds.

The "second-hand random stuff" market that eBay had previously was just made easier (and cheaper) by craigslist, but only for local items.


Sorry, that was a bit flamey of me.

I have read Craig's interview in Founders at Work, though.

The thing I'm pointing out is that, given all the online newspapers and the startups probably trying to do it, I'm surprised Craigslist could take off.

My second point about it starting as SF is: In 2005 or so, when I read about CL, the site was only SF, and was pretty small.

So I reasoned: It would be meaningless to people outside SF (e.g., no listings), so how could it eventually become a multi-million dollar business?

Web business is really weird...


"The thing I'm pointing out is that, given all the online newspapers and the startups probably trying to do it, I'm surprised Craigslist could take off."

You make it sound like craigslist is some recent newcomer to the internet classifieds scene. whois craigslist.com shows "Record created on 24-Sep-1997". Craigslist was most likely the first to do and popularize on-line classifieds, and it figured out all the technical and social changes needed to do it. Online newspapers have never concentrated on on-line classifieds and are only now doing so in order to have a decent on-line offering that keeps people coming back. There are even whitebox classifieds engines that companies like radio stations can stick on their websites to encourage traffic. None of them will beat craigslist though.

"My second point about it starting as SF is: In 2005 or so, when I read about CL, the site was only SF, and was pretty small."

I'm sure craigslist has had city listings for places other than San Francisco since before 2005 (maybe not too long before). But San Francisco is a good place to build up a userbase of on-line users because the culture is welcoming to on-line interaction and everyone is so wired (at least compared to other cities).



