Given the rise of NoSQL solutions and the amazing lack of query runtime optimiza...

mcantor · on Dec 13, 2010

There's a yawning chasm of difference between "a query that gets the right data in the right way, but does it somewhat inefficiently," and "a query that gets the right data in a stupendously roundabout way." For the former, query optimization is a reasonable salve. For the latter, no amount of automated optimization can save you from crippling performance problems.

Look at it this way: If someone comes to you and says, "I want to get from New York to California in 2 hours, so I'm going to take a 747." Off the top of your head, you could tell them, "I don't think you can do that without a Concord." A computer could easily optimize that plan.

But if someone says, "I want to get from New York to California in 2 hours, so I'm going to stick grapes in my nostrils and soak my shoes in canola oil," there's nothing you can do to help this person, because they clearly have no idea where to even start. A computer could do no better.

Using or instead of in() is taking a 747 instead of a Concord. But many of the points in that article, like using distinct to mask a lack of understanding about how you're asking for data, are very much the grapes & canola-oil-Nike's of SQL.

EDIT: Whoops. Sorry; since you had been downvoted, I misread the gray-on-light-gray text as "Given the amazing amount of run-time query optimization performed by most relational DBs." Guess I should have run my comment through an HN Post Optimizer.

NoSQL solutions are not a drop-in replacement for relational databases, either, though. They're different tools for different jobs.

No matter what, you're going to need to look at the same data in different ways. If that isn't true, then the decision is easy. But in pretty much every non-trivial application I've ever worked with, the source information and the requirements for the resulting data format have never been the same for every feature.

When you decide to go with NoSQL, you are implicitly accepting some of the burden previously handled by the RDBMS: The onus is now on you to do your own "indexing," your own "views," your own "normalization." I only have experience with CouchDB, which provides an excellent array of tools for handling these responsibilities, but for certain types of data, there is still no substitute for an RDBMS.

arethuza · on Dec 13, 2010

The more I use CouchDB the more I see it as something that complements "traditional" relational database features.

I'm actually rather hoping that someone produces a hybrid NoSQL-SQL system that gives the best of both worlds.

Nitramp · on Dec 13, 2010

"amazing lack of runtime optimization"? Do you have anything to back that up?

Most "old" RBDMSes (i.e., not MySQL) are actually ok to quite good at optimizing queries. Sure, you will always find things that are not optimized properly, but having worked on a query optimizer myself, this is actually much harder than you might think it is. Not only coming up with optimizations, but also making sure they are legal, and don't degrade certain queries. There's nothing worse than optimizing a legit query to pieces...