A Tale of Query Optimization (parallelthoughts.xyz)
213 points by manish_gill on May 19, 2019 | 68 comments


I can definitely relate to the struggle of debugging slow SQL queries. I've been optimizing Postgres for the last three years for my day job. There's nothing like spending hours trying to determine why a query is slow, only to discover some bizarre incantation makes the problem go away.

I had one case where adding OFFSET 0 to a query, which did not change the semantics of the query, made the query ~10x faster. It turned out that due to a reference to a large column being inlined in a dozen different places, Postgres was detoasting[0] that column a dozen times more than it needed to. Adding the OFFSET 0 prevented the subquery from being inlined and made it so Postgres only detoasted the column once.
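For the curious, the trick looks roughly like this (a sketch with a hypothetical table `t` and toasted column `big_col`; the real query was obviously more involved):

```sql
-- Inlined: each reference to big_col can trigger a separate detoast.
SELECT length(big_col), upper(big_col), md5(big_col)
FROM (SELECT big_col FROM t WHERE id = 42) s;

-- OFFSET 0 prevents the subquery from being pulled up into the outer
-- query, so the value is detoasted once and reused.
SELECT length(big_col), upper(big_col), md5(big_col)
FROM (SELECT big_col FROM t WHERE id = 42 OFFSET 0) s;
```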

In general, I think optimizing a database is way harder than it needs to be. A lot of it is black magic and it requires a ton of expertise that many teams don't have. Even though I've been doing it for years, I still struggle with figuring out what indexes I should create to make a query fast.

I recently started a startup in an attempt to make optimizing databases easier. I built a Postgres extension that records information about queries as they run. It's basically the same information you would get for EXPLAIN ANALYZE, but you get it for every query at query time. Then, using this information, I'm able to make specific recommendations about how you should optimize your database. I can give advice on how you should structure your schema and what indexes you should create in order to maximize the performance of your queries. This way, even teams that don't have much in-house Postgres experience won't have to spend hours or even days trying to figure out how to make their queries faster.

If this sounds interesting to you, I would be happy to chat. You can reach out to me at michael@perfalytics.com. It's my goal to use the expertise I've built up over the last couple of years to make scaling Postgres a truly painless experience.

[0] https://malisper.me/postgres-toast/


Isn't there some irony in the fact that SQL was meant to abstract away the "how" of getting records? If you need to optimize through implementation-specific quirks, then wouldn't a more direct API be preferable?


I have a number of thoughts on this. A common claim I've heard is that query optimizers are better at producing efficient query plans than all but the best programmers. I can't find any concrete studies on this, but it's plausible. Databases are great at handling slight variations of a query and a data set. I've had times where I thought Postgres was going to execute a query one way, but it wound up executing it in a different way that happened to be more efficient.

Given that query optimizers are usually better than humans, it makes sense to use them for the general case. The question then becomes what happens for the exceptional cases. Cases where the database chooses a bad plan that a human could tell is obviously going to be slow. For these cases I think it's up to the database implementors to decide what to do.

Some databases have support for query hints, a way to force your DB to do what you want. The Postgres team has avoided implementing query hints for two reasons:

- In the long term, queries that make use of query hints will be suboptimal. (Personally, I don't buy this. I've run across cases in Postgres where the planner will only give you a very inefficient plan. I would rather use query hints to get the query to work for the time being than never have a working query at all.)

- Implementing query hints would overly complicate the planner. I find this to be pretty reasonable.

As usual, having a query optimizer is a tradeoff. It seems to me like in most cases it's better to have one than to not.


I totally agree with your assessments. There's just one more flavor, which I'd like to add to the mix:

That's when the underlying data changes. Optimizers (which, I agree, are usually much better than any developer / DBA at finding the optimal path to execute a query) rely on statistics, however, and those are not always up to date.

This is also reasonable since it would be much too expensive to run real time updates on statistics, while the database is in use. A heavily transaction oriented system (which are still a lot of relational database implementations) would be killed by such a requirement.

If those statistics are stale, however, this can totally kill the optimizer's ability to determine a good plan.

I've seen very, very few examples where an optimized plan sucks when it's based on correct statistics. And I agree that hints are a good way to fix this. I'd add the condition that they always need to be reviewed with a new version of the underlying database engine.


Even if you assume that the human programmer is better than the query optimizer, which I think you'll agree is assuming rather a lot, the query optimizer still has important advantages. It can react to changing conditions.

The optimal query plan for a given query can change as data is added or deleted. Moreover, there can be multiple optimal query plans at the same time for "the same query" (or at least the same parameterized SQL statement).

I'm not suggesting that optimizers don't produce bad plans that have to be worked around. My point is that comparing the optimizer to an expert human only makes sense as a thought experiment. Most bad query plans are caused by bad selectivity estimates/stats, not bad cost models.


This is starting to sound a lot like the discussions around optimizing C compilers. Only in that case you can drop into assembly if necessary.


All abstractions can be a bit leaky. That doesn't make them useless.


I would disagree with this, simply because writing durable data code, along with all of the other core ACID machinery, is too error prone to do yourself. In my view, SQL was never really there to be faster; it was there to solve those other problems. If it were truly about speed, people would have stopped using things like ISAM, but they do not, since the characteristics are better. There will be overhead and quirks with SQL, but an equivalent more direct system would have different quirks, perhaps in a different area.


Optimizing SQL appears to be a reasonable task for reinforcement learning, given that it is sequential with clearly defined actions, states, and rewards.

There has been work already done, for example https://blog.acolyer.org/2019/01/18/towards-a-hands-free-que.... But it feels like the early days.


Thoughts on Dexter?


I like what it's trying to do, but I disagree with the approach. I have a few issues with the implementation:

#1 Dexter relies solely on the Postgres cost model.

Dexter works by using HypoPG[0] to create a hypothetical version of every possible index with two or fewer columns. It runs EXPLAIN against every query and keeps track of which indexes are used. This determines all the indexes that are part of a plan that Postgres _thinks_ will be the cheapest.

The problem with this is that Postgres is often wrong about which plan is best. If there is any disparity between the stats Postgres has and the actual data, or if there is an issue with the Postgres cost model, the recommendations HypoPG makes may not match reality.

I think any good optimization advice has to be based on how the queries actually perform.
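For reference, HypoPG's hypothetical indexes look like this (a sketch, assuming a made-up `orders` table):

```sql
CREATE EXTENSION hypopg;

-- Creates an in-memory, session-local index definition; no disk I/O.
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- Plain EXPLAIN (not EXPLAIN ANALYZE) will now consider the hypothetical index.
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```

The catch, as above: EXPLAIN only reports what the cost model predicts, not how the query actually runs.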

#2 Dexter will only check a small set of potential optimizations.

Dexter only looks at creating one and two column b-tree indexes. This is a really small set of all possible optimizations. Dexter will not look for things like:

  - Config Changes
  - Gin Indexes
  - Partial Indexes
  - Table Partitioning
  - Query Rewriting (Dexter wouldn't have caught the issue mentioned in the OP)

I think Dexter may be an ok starting point if you have never touched an index before. I don't think Dexter would have caught any of the major optimizations I've implemented.

#3 Dexter is a fully automated solution.

I don't think fully automated systems will be better than humans anytime within the next few decades. I think you can get some basic recommendations from an automated solution, but any good advice is going to have to come from a human.

[0] https://github.com/HypoPG/hypopg


Having hit this sort of problem too many times, I now recommend storing a DNS-style reversed FQDN in all my click-stream data.

The domain stored as ".com.google.mail" etc.

This turns a lot of these sorts of queries into a simple '.com.google%' prefix filter, which gets evaluated internally even on min-max zones.

Gets turned into ">= '.com.google.' and <= '.com.google.\u0ff'" in byte comparisons.
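A minimal sketch of the reversal (a hypothetical helper, not from the parent's actual system):

```python
def reverse_fqdn(domain: str) -> str:
    """Store 'mail.google.com' as '.com.google.mail' so every host
    under a registrable domain shares a common string prefix."""
    return "." + ".".join(reversed(domain.split(".")))

# Every google.com host now falls into one contiguous, index-friendly range:
hosts = sorted(reverse_fqdn(d) for d in
               ["mail.google.com", "www.google.com", "news.ycombinator.com"])
google_hosts = [h for h in hosts if h.startswith(".com.google.")]
```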

Even then it doesn't really help with the array syntax - because the condition doesn't get pushed there.


Arguably, domain names written the way they usually are have been wrong since the beginning, for obvious reasons.


It isn't clear to me why so few of these optimization articles talk about temporary tables. They are my go-to solution for many situations.

Temporary tables are great because they break the problem down into smaller pieces. They make the query much more understandable. They are way easier to debug. They are way less sensitive to failing because the query optimizer decides to do things differently because table statistics have changed. They will often eliminate the need for exotic syntax.

You also have the opportunity to add secondary indexes to these tables.

When you create a temporary table, you are essentially doing the work of the optimizer yourself. That's perfectly ok if you know things about the data that the optimizer doesn't or hasn't successfully figured out on its own.

They're not for every situation; they only work well when the intermediate tables are small and fit in memory. But that is very frequently the case.
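As a sketch of the pattern (hypothetical schema):

```sql
-- Materialize the intermediate result as its own step.
CREATE TEMP TABLE recent_orders AS
SELECT id, customer_id, total
FROM orders
WHERE created_at > now() - interval '7 days';

-- Secondary index plus fresh statistics for the next step.
CREATE INDEX ON recent_orders (customer_id);
ANALYZE recent_orders;

SELECT c.name, sum(o.total)
FROM recent_orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY c.name;
```

The ANALYZE matters: autovacuum can't touch temp tables, so without it the planner is guessing.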


>They are way less sensitive to failing because the query optimizer decides to do things differently because table statistics have changed.

The optimizer doesn't have statistics on temp tables. It has to either gather them during query execution (which costs performance) or use heuristic estimates (which AFAIK PostgreSQL does). These estimates can be wrong, and the query plan becomes unstable.

That's why temp tables aren't used often. Another reason is complexity on app side (several query executions instead of one, transaction management, etc.)


One reason you may not see CTEs or temp tables included in these articles is that their specifics are tightly coupled to the schema of your DB. It's hard to write a generic example on Stack Overflow that would work for someone else.

But like you said: if you know the DB schema then temp tables make SQL easier to read, write, and reason about.


How are temp tables or CTEs more tightly tied to a schema than the original query? They both reference the same set of tables and columns, and both depend on indexes being in place. A schema change will break both, in equal measure.


I'm not sure what you mean by this - it is possible to write any subquery as a CTE


Might refer to MySQL. It didn’t support any CTEs until version 8.0, from April, 2018.


I'm not sure if I like the idea of using CTEs as an optimization fence. It means using implementation quirks that can be changed in future releases.


Optimizing a large database (billions of rows) has been the bane of my existence over the last couple months. But, when you finally get that 10x / 100x performance improvement, it’s all worth it... until the next thing.


I've recently started working on a service with a MySQL database of a few billion rows. I'm used to developing on lower-level KV stores like RocksDB or something similar.

Any broad, general advice? Cool tricks you've learned? Bulk deletes and a relatively high continuous write rate are killing me right now.

I kinda hate how long manually batching deletes and then having to verify them takes.


Deleting lots of rows is perhaps the biggest weakness you are running into. What I do for some cases is "LIMIT 4000", sleep for 1 second, and repeat until no rows affected (tweak the numbers for your situation obviously) (usually this is for a periodic expiry/cleanup task).
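A sketch of that loop (using SQLite only to keep the example self-contained; in MySQL the inner statement would just be `DELETE FROM events WHERE expired = 1 LIMIT 4000`, and the table/column names here are made up):

```python
import sqlite3
import time

def batched_delete(conn, batch=4000, pause=1.0):
    """Delete expired rows in small batches, sleeping between batches
    so replication and concurrent transactions can keep up."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE rowid IN "
            "(SELECT rowid FROM events WHERE expired = 1 LIMIT ?)",
            (batch,))
        conn.commit()
        if cur.rowcount == 0:  # nothing left to delete
            return total
        total += cur.rowcount
        time.sleep(pause)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, expired INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, i % 2) for i in range(10000)])
deleted = batched_delete(conn, batch=1000, pause=0)
```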


It’s probably worth mentioning that at least in MySQL, this is dangerous if using statement based replication.

https://dev.mysql.com/doc/refman/8.0/en/replication-sbr-rbr.... https://dev.mysql.com/doc/refman/8.0/en/replication-features...


Great point.

I happen to use row-format mixed (default on RDS), so that can make it OK. Also, often I have some other related stuff to cleanup, so the pattern is more like: SELECT id, stuff FROM table WHERE expired LIMIT 1000; do cleanup; if all successful: DELETE FROM table WHERE id IN (id, id, ...); so the delete is deterministic for statement-based replication.


- avoid subqueries if possible (not sure if this still holds true with the latest versions of mysql),

- avoid using offset and limit for pagination, it's a killer on big datasets,

- on huge datasets sometimes breaking a complex query into 2-3 smaller ones can speed up things a lot, so always use explain and common sense when writing queries,

- even with a billion rows, if a query is really slow, it's probably missing a proper index 9 times out of 10
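On the offset/limit point, the usual fix is keyset (seek) pagination, sketched here with a hypothetical `posts` table:

```sql
-- OFFSET pagination: the server still scans and discards 100000 rows.
SELECT * FROM posts ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset pagination: remember the last id from the previous page
-- and seek directly to it via the index.
SELECT * FROM posts WHERE id > :last_seen_id ORDER BY id LIMIT 20;
```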


I always put an index on nearly every field that could potentially be used in a SELECT filter or an ORDER BY. I'm not sure I've ever run into a serious problem doing this. Usually, the opposite is true - not having an index can cause serious problems.

In a way, I look at indexes like I used to look at these "turbo" buttons (1) on older PCs: no reason not to press it.

(1) https://en.wikipedia.org/wiki/Turbo_button


Indexes mostly have an impact on write performance and space used on disk, so if both are not a concern for your use case, creating lots of indexes is fine.


Oh, yes, turbo button, the only thing more useless than caps lock :)

Usually compound indexes are the best idea, but you need first to see what queries are running against DB to know what to add.
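Roughly (hypothetical table):

```sql
-- One compound index covers queries filtering on its leading column(s):
CREATE INDEX ON orders (customer_id, created_at);

-- Served by the index:
--   WHERE customer_id = ?
--   WHERE customer_id = ? AND created_at > ?
-- Not served (leading column missing):
--   WHERE created_at > ?
```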


Fewer, but larger queries/statements. Think about the query to add any indexes that may help.

Less concurrency may help. I bet you have deadlock problems with high continuous writes and deletes... maybe you can queue or otherwise delay your writes and then insert with a single thread...

Use a lower transaction isolation level if possible.


Is the data deleted because it is too old? Partition your tables by time, and truncate/drop the old tables instead of delete.
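In Postgres terms, a sketch with declarative partitioning (hypothetical schema; MySQL has an analogous PARTITION BY RANGE):

```sql
CREATE TABLE events (
    id bigint,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2019_05 PARTITION OF events
    FOR VALUES FROM ('2019-05-01') TO ('2019-06-01');

-- Expiring a month of data becomes a cheap metadata operation
-- instead of millions of individual row deletes:
DROP TABLE events_2019_04;
```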


We'd considered it, but partitioning hurts other things, like turning a single-row query into a lookup across every partition.


Query optimisation is such an underrated skill; imo it should be prioritised as a core skill for any backend developer.


It seems there are a lot of new "lead" developers out there that don't even understand RDBMSes, much less how to optimize them. A lot these days seem to come into the field thinking that SQL is old hat and NoSQL is the future.

You should see this newly rebuilt e-commerce system I saw recently all built on mongo. The hoops jumped through and special functions created and work-arounds were just astounding. They basically had to build their own rdbms layer in node to interface with their mongo datastore.

All this because the new hotshot developer convinced management that this was the only way to solve the DB bottlenecks they were having.

...because "MySQL will just never be able to query the product table in less than a few seconds since it has 'millions' of rows".

An entire system rewrite to solve a problem that probably could have been solved with a few indexes. And now they are going to have to implement some things in MySQL again, because all the node.js code is getting overly complicated.


> "lead" developers out their that don't even understand rdbms

Just lead? I am interviewing "full stack rockstars" who think SQL is unfit for every purpose. Frustrating, but a good filter.


This article could be improved by including the Postgres EXPLAIN output so the reader could follow along the deductive process.


I'm confused - did they not EXPLAIN their query? That's the first step in ANY slow-query debugging!

I kept waiting to see EXPLAIN output so I could say "oh, there's your table scan" and nothing ever showed up.


> Then, for a small subset of data, I manually verified that all results were correct.

This could have been automated with a full join. with A as queryA, B as queryB select A full join B on all applicable fields where A is null or B is null. Alternatively, you could also do: A except B union all B except A. Either way you get the rows in A that aren't in B and rows in B that aren't in A -- no manual check needed, and you could have run that on the entire set instead of or in addition to counting rows.
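A runnable sketch of the EXCEPT / UNION ALL variant (SQLite here for self-containment; the made-up tables stand in for the two query results):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result_a (id INTEGER, val TEXT);
    CREATE TABLE result_b (id INTEGER, val TEXT);
    INSERT INTO result_a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO result_b VALUES (1, 'x'), (2, 'y'), (4, 'w');
""")

# Symmetric difference: rows only in A, plus rows only in B.
# An empty result proves the two queries returned identical rows.
diff = conn.execute("""
    SELECT * FROM (SELECT * FROM result_a EXCEPT SELECT * FROM result_b)
    UNION ALL
    SELECT * FROM (SELECT * FROM result_b EXCEPT SELECT * FROM result_a)
""").fetchall()
```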


I'm a bit confused why they didn't use EXPLAIN ANALYZE at all in this case. Debugging performance issues in Postgres without it is unnecessarily hard. I'm not entirely sure without being able to try it myself, but I suspect it would have immediately identified, in sufficient detail, where most of the time in that query was spent.


The frustrating thing about such optimizations is how badly they fit with the rest of the software development and testing workflow.

Ideally, you want to write a test that prevents a performance regression, but then you need to set up a test database with the right data characteristics (but not real data, for privacy concerns), need the hardware to handle that, need to maintain that database through future schema changes and so on.

It would also be nice to have some kind of mechanism that can tell you that the optimization is no longer necessary when you switch to a new DB version with an improved optimizer.

Most organizations don't have the engineering capabilities or capacities to provide this kind of tooling and infrastructure, so the benefits of CI/CD often don't apply to such changes.


    Then, for a small subset of data, I manually verified that all results were correct.
Where’s your test suite man? :)


I wonder if a CTE with both sets and `intersect` between them would be more efficient than &&.


> https://www.postgresql.org/docs/11/queries-with.html The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL.

Am I incorrect in thinking that "standard" CTEs don't have an impact on performance? When I saw this in the article, and that the CTE brought no performance improvement, it reaffirmed my assumption.


PostgreSQL materializes all CTEs so it has a big impact on performance. It's a hack that works around lack of optimizer hints in PostgreSQL to force the execution plan you want. Only in PostgreSQL 12 did they change this: https://www.postgresql.org/docs/devel/release-12.html#id-1.1...
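The PG 12 change also added explicit control, so the old fence behavior is still available on request (table and column names hypothetical):

```sql
-- PG 12+: inlined by default, so the outer predicate can be pushed down.
WITH recent AS (
    SELECT * FROM events WHERE created_at > now() - interval '1 day'
)
SELECT * FROM recent WHERE user_id = 42;

-- Opt back into the PG <= 11 behavior (an optimization fence):
WITH recent AS MATERIALIZED (
    SELECT * FROM events WHERE created_at > now() - interval '1 day'
)
SELECT * FROM recent WHERE user_id = 42;
```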


> https://dba.stackexchange.com/a/13117 Also, a CTE should never be used for performance. You will almost never speed things up by using a CTE, because, again, it's just a disposable view.


Information about SQL in general should not be confused with information about specific RDBMS.

That SE answer is about MS SQL Server.

What I wrote was specific to PostgreSQL which does a specific weird thing using CTEs where they are materialized.

In SQL Server, CTEs are not materialized and are just used to organize the query, to make it more readable.

If I wrote about PostgreSQL heap organized tables, would you object and say I'm wrong because MS SQL Server tables are index-organized, clustered on the primary key?


Good answer, but fwiw SQL Server has heaps too :p


They have an almost-negligible performance decrease in my experience.

I worked with a client who was transitioning from their MySQL DB to a (mysql-flavored) AWS Aurora DB, as they required high concurrency. In order to handle the throughput their requirement was for the queries to run in under 1 second. After getting down to ~1.1-1.3s on average, I had to figure out how to squeeze the last bits of optimization I could get. One thing I did was rewrite all the CTEs to subqueries, and if I recall correctly, it shaved something like a few percent off the query time (~.01-.05s).

I'd call it negligible as the speed increase certainly isn't worth the decrease in readability and maintainability of the query.


Yes, but I think it's understandable given that nothing in the spec suggests CTEs should affect performance.

Practically speaking, though, CTEs can be used to organize code, and query plan generators/optimizers may take that into account. Or, as someone else added, PostgreSQL will materialize the CTE as if it were a temporary table being used as an input.

Almost all RDBMS advice is implementation specific, sadly.


This is something I didn't consider. Will definitely try it out. Thanks!


Unbelievable that this is even getting upvotes. Anyone with a basic understanding of relational databases could see that nested O(n) ILIKE filter was the problem. I kept reading in the hopes that maybe something actually interesting was going on, besides ineptitude. Disappointing.

When you have a huge list like that in a filter, the answer is to treat it like a table and perform a join. Then the query planner can optimize the ordering of the loops. But this is just mind-boggling ignorance dressed up as a "debugging horror story".
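For what it's worth, the rewrite being described looks something like this (hypothetical names):

```sql
-- Instead of a giant list of predicates...
SELECT * FROM requests
WHERE url ILIKE '%foo%' OR url ILIKE '%bar%' /* ...hundreds more... */;

-- ...treat the list as a relation and join, letting the planner
-- choose the loop order (DISTINCT guards against double matches):
SELECT DISTINCT r.*
FROM requests r
JOIN (VALUES ('%foo%'), ('%bar%')) AS patterns(p)
  ON r.url ILIKE patterns.p;
```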


Get over yourself.

This is a well-written article in which the author lays out their thought process for optimizing a slow SQL query. The article certainly provides more value to someone who may not be an RDBMS expert than your comment does.


You know about such trick, others may not. I loved this article for the thought process, not for the nested loop error.


I think the root of your issue is the %text% pattern: a left-sided % is horribly slow.


Addressed in the article:

> The pattern matching query itself is taking only 5 seconds independently. Matching millions of unique URLs is clearly not a problem.


Assuming it only loops once in the query, yes, that's 5 seconds. But (and I'm guessing here) having it directly in the WHERE leads to something like an n+1 issue: a right- or non-anchored (I)LIKE quickly leads to full table scans without the proper index.

What I'm meaning to say is that the issue is not solved; it is simply worked around by narrowing down the set queried. Not really Postgres magic, but using knowledge of your own data.

I believe a trigram index would solve the issue at its root.
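Sketch (hypothetical table; requires the pg_trgm extension):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- A GIN trigram index lets Postgres use an index scan even for
-- non-left-anchored patterns:
CREATE INDEX requests_url_trgm_idx ON requests USING gin (url gin_trgm_ops);

SELECT count(*) FROM requests WHERE url ILIKE '%google.com%';
```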


Thanks, I'll take a look at this approach.


As thomaswang says, that assumes that the query planner recognizes that it can hoist that query part out of the loop (https://en.wikipedia.org/wiki/Loop-invariant_code_motion)

If it doesn’t, and runs it again and again, time will add up (possibly by less than 5 seconds for each time it gets run because the data could stay in memory)

That theory was only disproved in the "Move the sub-query into a CTE" section.

Also, I’m not familiar with Postgres, so I don’t know whether it could significantly affect timing, but the query benchmarked at 5231.765 ms isn’t identical to that in the larger query. The latter has ::text[] added.

I also would try and replace

    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)
by

   AND r_time BETWEEN to_timestamp(1547585600+1) AND timestamp(1549177599-1)
because the query optimizer might not detect that in this query.


BETWEEN is something that is on the "Don't do this" page of Postgres. :)

https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use...

The ::text[] addition had no impact on perf. I actually constructed this post from my notes at the time, so I might have missed a few things here and there.

The bit about query planner recognizing things is exactly what the challenge is to be honest. Like I said at the end, the good mental model of SQL helps a lot, and mine is decent-ish I hope but can still learn a lot. :)


That’s why I added the +1 and -1 fragments :-) I thought those computed the smallest larger/largest smaller time stamp.

I didn’t realize that time stamps weren’t integers in PostgreSQL, though, so that is a bug.

Are you sure your query, which uses an interval that is open at both ends, is correct?


> As thomaswang says, that assumes that [...]

In a subsequent comment, yes; but in the original comment I responded to, there was nothing about the surrounding context, only (basically) "left % is slow".


Sorry to say, but I stand my ground. I believe it is the left % that is the root issue, as said without further explanation in my initial comment. (It's the same issue on almost every database; I have never seen it not be the main culprit, and the solution that never seems to fail is an index.)


With Oracle 9i, the query “name like ?” was never optimized to use an index. Even when the parameter was “pref%”, the query took minutes. But with the query “name like ‘pref%’” it used the index and was lightning fast. So I had to rewrite my code from using prepared statements to manually escaping and inlining values. Felt weird!


I find this a little bit funny,

> I was extremely suspicious of the EXISTS optimization, since it changes the logic to exit early.

Does this mean that if one has found an answer, it is somehow sensible not to return the answer right away?


With a database, that depends on whether you want "all the answers" or "any answer". EXISTS is the latter.


Thanks - good investigation. We're replatforming a legacy app from Oracle to Postgres in the near future and I'm sure we'll see stuff just like this. Definitely saving so I can refer back.


This is pretty great. I've been meaning to find a good story and helpful guidelines for making queries faster as I taught students about SQL.


A funny bit:

>I started googling for a way to do set-intersection in Postgres without using the && and it wasn’t going anywhere.

That's what SQL was made for! It's called 'join'. Perhaps it wasn't the best idea to store a session's URLs in an array and not in a table. (It's called normalization.)



