Here's a neat extension that tries to use genetic algorithms to learn better plans for one's queries; it includes slides which cite this paper and have TPC benchmark numbers.
I can't comment on your question, but thank you for finding the year this paper was written. Having read many older papers, I find that figuring out what year a paper was written is sometimes like solving a murder mystery. The year a paper was written is vital for understanding the social context of the research being presented, in addition to any context cited in the paper.
Agreed, especially in fast-changing fields or after recent breakthroughs. One trick I use is to look at the references and find the approximate latest year cited. It's generally the same as, or pretty close to, the year of the paper.
Most of that depends on explicit CREATE STATISTICS commands being run in order to work around column correlations and the like. The general assumption of independence among columns/attributes is pretty universal (as the paper actually says).
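For anyone who hasn't used it, this is roughly what that looks like in PostgreSQL 10+ (table and column names here are made up for illustration):

```sql
-- Extended statistics: tell the planner that two columns are correlated,
-- so it stops multiplying their selectivities as if they were independent.
CREATE STATISTICS city_zip_stats (dependencies)
    ON city, zip FROM addresses;

-- The statistics object is only populated on the next analyze.
ANALYZE addresses;
```

Without this, a predicate like `WHERE city = 'X' AND zip = 'Y'` gets its row estimate computed as the product of the two individual selectivities, which can be off by orders of magnitude when the columns are functionally dependent.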
One of the most useful areas for future improvement is making plans more robust against misestimations during execution, for example by using techniques like role-reversal during hash joins, or Hellerstein's "Eddies".
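To make the role-reversal idea concrete, here is a toy sketch in Python. It's my own simplification (real engines do this inside the operator pipeline rather than by materializing both inputs): the planner's estimate picks a build side, but the operator checks actual sizes at execution time and swaps roles if the estimate was wrong.

```python
def hash_join(left, right, left_key, right_key):
    """Equi-join two lists of dicts, building the hash table on the
    side that is actually smaller at execution time ("role reversal"),
    regardless of which side the planner estimated to be smaller."""
    if len(left) <= len(right):
        build, probe = left, right
        build_key, probe_key = left_key, right_key
        swapped = False
    else:
        build, probe = right, left
        build_key, probe_key = right_key, left_key
        swapped = True

    # Build phase: hash the (actually) smaller input.
    table = {}
    for row in build:
        table.setdefault(row[build_key], []).append(row)

    # Probe phase: stream the larger input past the hash table.
    out = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            # Keep output column order as (left, right) even after a swap.
            l, r = (match, row) if not swapped else (row, match)
            out.append({**l, **r})
    return out
```

The point is that the swap decision uses observed cardinalities, not estimated ones, so a misestimate costs nothing beyond the check itself.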
> The general assumption of independence among columns/attributes is pretty universal (as the paper actually says).
So, the paper definitely talks about how independent column statistics are a problem with big tables in the default stats configuration.
...But the option of creating correlated, non-independent column statistics did not exist in PostgreSQL until after this paper was written. Which was my point.
In my experience, flat-out increasing statistics sample rates fixes 80%+ of the problems in this paper, with basically no downsides. (You can push that computation to downtime when no one cares.)
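For reference, that can be done per column (the table and column names below are just examples, and 1000 is an arbitrary bump over the default target of 100):

```sql
-- Sample more rows for this column's histogram/MCV list, then re-analyze.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;  -- schedule this for off-peak hours if the table is large
```

There's also the global `default_statistics_target` setting if you'd rather raise it everywhere at once.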
Can anyone comment on how relevant this is given the extended statistics types in Postgres 10, 11, and 12?