Here's a neat extension that tries to use genetic algorithms to learn better plans for one's queries; it includes slides which cite this paper and have TPC benchmark numbers.
I can't comment on your question, but thank you for finding the year this paper was written. Having read many older papers, I find that figuring out what year a paper was written is sometimes like solving a murder mystery. The year a paper was written is vital for understanding the social context of the research being presented, in addition to any context cited in the paper.
Agreed, especially in fast-changing fields or after recent breakthroughs. One trick I use is to look at the references and find the approximate latest year cited. It's generally the same as, or pretty close to, the year of the paper.
Most of that depends on explicit CREATE STATISTICS commands being run in order to work around column correlations and the like. The general assumption of independence among columns/attributes is pretty universal (as the paper actually says).
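For anyone who hasn't used it, this is roughly what that looks like in PostgreSQL 10+ (table and column names here are made up for illustration):

```sql
-- Extended statistics: tell the planner that two columns are correlated,
-- so it stops multiplying their selectivities as if they were independent.
CREATE STATISTICS city_zip_stats (dependencies)
    ON city, zip FROM addresses;

-- The statistics object is only populated on the next analyze.
ANALYZE addresses;
```

Without this, a predicate like `WHERE city = 'X' AND zip = 'Y'` gets its row estimate computed as the product of the two individual selectivities, which can be off by orders of magnitude when the columns are functionally dependent.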
One of the most useful areas for future improvement is making plans more robust against misestimations during execution, for example by using techniques like role-reversal during hash joins, or Hellerstein's "Eddies".
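To make the role-reversal idea concrete, here is a toy sketch in Python. It's my own simplification (real engines do this inside the operator pipeline rather than by materializing both inputs): the planner's estimate picks a build side, but the operator checks actual sizes at execution time and swaps roles if the estimate was wrong.

```python
def hash_join(left, right, left_key, right_key):
    """Equi-join two lists of dicts, building the hash table on the
    side that is actually smaller at execution time ("role reversal"),
    regardless of which side the planner estimated to be smaller."""
    if len(left) <= len(right):
        build, probe = left, right
        build_key, probe_key = left_key, right_key
        swapped = False
    else:
        build, probe = right, left
        build_key, probe_key = right_key, left_key
        swapped = True

    # Build phase: hash the (actually) smaller input.
    table = {}
    for row in build:
        table.setdefault(row[build_key], []).append(row)

    # Probe phase: stream the larger input past the hash table.
    out = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            # Keep output column order as (left, right) even after a swap.
            l, r = (match, row) if not swapped else (row, match)
            out.append({**l, **r})
    return out
```

The point is that the swap decision uses observed cardinalities, not estimated ones, so a misestimate costs nothing beyond the check itself.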
> The general assumption of independence among columns/attributes is pretty universal (as the paper actually says).
So, the paper definitely talks about how independent column statistics are a problem with big tables in the default stats configuration.
...But the option of creating correlated, non-independent column statistics did not exist in PostgreSQL until after this paper was written. Which was my point.
In my experience, flat-out increasing statistics sample rates fixes 80%+ of the problems in this paper, with basically no downsides. (You can push that computation to downtime when no one cares.)
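For reference, that can be done per column (the table and column names below are just examples, and 1000 is an arbitrary bump over the default target of 100):

```sql
-- Sample more rows for this column's histogram/MCV list, then re-analyze.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;  -- schedule this for off-peak hours if the table is large
```

There's also the global `default_statistics_target` setting if you'd rather raise it everywhere at once.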
Can anyone comment on how relevant this is given the extended statistics types in Postgres 10, 11, and 12?