Well, the customizations are to the C++ code and it doesn't feel particularly difficult to adjust, but then we have years of senior-level C++ coding behind us.
One of the first changes we made to sphinx was just to increase the maximum query length so that queries could be extremely long (error messages are sometimes quite long). This meant changing a define and tweaking some other areas internally.
We also made some changes recommended by others in the sphinx forums and have been adjusting the weighting algorithms for our own needs.
No plan, really. The changes would need to be #ifdef'd or otherwise conditioned-out because almost no one would want them. We optimized it specifically for our own error message searching, and that's not really useful for general-purpose search.
For adjusting the maximum query length, you can find tips on that in the sphinx forums.
The most interesting thing about their usage is how they managed to create almost-realtime search on Craigslist by using Sphinx's delta indexes. And they are doing this on lots of data, which is a good sign.
Sphinx 1.x+ should feature realtime index updates, which will make the Sphinx deal a lot more impressive.
All that said, the support for realtime index updates in the current search engines is a joke, and one must do lots of hacks in order to support them properly on lots of data/updates.
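To make the delta-index idea mentioned above concrete, here is a minimal sketch of a main+delta scheme, assuming a toy in-memory inverted index. It is not Sphinx's API and not Craigslist's actual setup; `build_index` and `MainDeltaSearch` are hypothetical names made up for illustration. The point is only that new postings go into a small index that is cheap to rebuild, queries consult both indexes, and the delta is folded into the main index now and then.

```python
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class MainDeltaSearch:
    """Toy main+delta scheme (hypothetical, not Sphinx's API)."""

    def __init__(self, docs):
        self.main_docs = dict(docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)    # large, rebuilt rarely
        self.delta = build_index(self.delta_docs)  # small, rebuilt often

    def add(self, doc_id, text):
        # New postings only touch the small delta, so the rebuild is cheap.
        self.delta_docs[doc_id] = text
        self.delta = build_index(self.delta_docs)

    def search(self, term):
        # A query consults both indexes and unions the matching ids.
        term = term.lower()
        return sorted(self.main.get(term, set()) | self.delta.get(term, set()))

    def merge(self):
        # Periodically fold the delta into the main index and reset it.
        self.main_docs.update(self.delta_docs)
        self.delta_docs = {}
        self.main = build_index(self.main_docs)
        self.delta = build_index(self.delta_docs)
```

This sketch ignores updates and deletions of documents already in the main index, which is where most of the real-world hacks come in.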
At Hubpages.Com, we switched to sphinx in December. We found for PHP that the migration was pretty painless. We had previously used the MySQL full text search.
What!? Classifieds have been around forever and were the bread and butter of the print newspapers for a long time. If craigslist "took on" anything, it's classifieds.
The "second-hand random stuff" market that eBay had previously was just made easier (and cheaper) by craigslist, but only for local items.
The thing I'm pointing out is that, given all the online newspapers and the startups that were probably trying to do it, I'm surprised Craigslist could take off.
You make it sound like craigslist is some recent newcomer to the internet classified scene. whois craigslist.com shows "Record created on 24-Sep-1997". Craigslist was most likely the first to do and popularize on-line classifieds, and figured out all the tech and social changes needed to do it. Online newspapers have never concentrated on on-line classifieds and are just now doing that in order to have a decent on-line offering that keeps people coming back. There are even whitebox classified engines that companies like radio stations can stick on their websites to encourage traffic. None of them will beat craigslist though.
My second point, about it starting in SF, is this: in 2005 or so, when I read about CL, the site covered only SF and was pretty small.
I'm sure craigslist has had city listings for places other than San Francisco since before 2005 (maybe not too long before). But San Francisco is a good place to build up a userbase of on-line users because the culture is welcoming to on-line interaction and everyone is so wired (at least compared to other cities).
Some choice quotes:
"25 MySQL Boxes to 10 Sphinx"
"50M queries per day w/steady growth"
"1,000+ qps during peak w/room to grow"
It's great to see it standing up well, though it looks like they made (or sought) patches for issues they ran into.