Your observations on cstore_fdw are accurate. I'm adding a few points about our learnings from cstore deployments in production.
From a use-case standpoint, we find that most cstore deployments use it for data archival and to a lesser extent data warehousing workloads on Postgres. Some of these deployments are fairly sizable (Fortune 1000) and they primarily benefit from compression that comes with cstore_fdw (3-10x).
Citus + cstore_fdw deployments exist, but they are fewer in number. This isn't because of an architectural issue related to cstore or Postgres FDW/extension APIs. Rather, making cstore work for real-time workloads require notable implementation work. In particular:
(1) Real-time ingest & update / delete capabilities: Columnar stores have a harder time providing these features. The most practical way to achieve this is by partitioning your data on time, using a row store for the most recent data, and then rotating that data into a columnar store.
Users who use cstore for real-time workloads, manually set up table rotation themselves. In fact, this type of integration between Citus (distributed db) and cstore_fdw (columnar storage) has been one of the most requested features: https://github.com/citusdata/citus/issues/183
(2) High-availability & disaster recovery: This requires that cstore integrates with PG's write-ahead logs. Fortunately, Postgres 9.6 makes this possible / pluggable and it's on cstore_fdw's immediate roadmap.
(3) Indexes for quick look-ups: cstore_fdw comes with built-in min/max indexes. This helps but doesn't provide quick look-ups across any field. We thought about integration with PG indexes -- one challenge there involves PG indexes' storage footprint and potential hit on compression ratios.
(4) Query planning and execution tuned for columnar store: If your workload is bottlenecked on CPU (rather than disk), these type of improvements can help a lot. For now, we're happy to see both recent and upcoming analytical query performance improvements that are coming up in Postgres 10 and beyond.
Also, putting cstore_fdw aside, I'm a huge fan of Timescale and the work they are doing! My favorite "feature" is that Timescale is a PostgreSQL extension. :)
Your observations on cstore_fdw are accurate. I'm adding a few points about our learnings from cstore deployments in production.
From a use-case standpoint, we find that most cstore deployments use it for data archival and to a lesser extent data warehousing workloads on Postgres. Some of these deployments are fairly sizable (Fortune 1000) and they primarily benefit from compression that comes with cstore_fdw (3-10x).
Citus + cstore_fdw deployments exist, but they are fewer in number. This isn't because of an architectural issue related to cstore or Postgres FDW/extension APIs. Rather, making cstore work for real-time workloads require notable implementation work. In particular:
(1) Real-time ingest & update / delete capabilities: Columnar stores have a harder time providing these features. The most practical way to achieve this is by partitioning your data on time, using a row store for the most recent data, and then rotating that data into a columnar store.
Users who use cstore for real-time workloads, manually set up table rotation themselves. In fact, this type of integration between Citus (distributed db) and cstore_fdw (columnar storage) has been one of the most requested features: https://github.com/citusdata/citus/issues/183
(2) High-availability & disaster recovery: This requires that cstore integrates with PG's write-ahead logs. Fortunately, Postgres 9.6 makes this possible / pluggable and it's on cstore_fdw's immediate roadmap.
(3) Indexes for quick look-ups: cstore_fdw comes with built-in min/max indexes. This helps but doesn't provide quick look-ups across any field. We thought about integration with PG indexes -- one challenge there involves PG indexes' storage footprint and potential hit on compression ratios.
(4) Query planning and execution tuned for columnar store: If your workload is bottlenecked on CPU (rather than disk), these type of improvements can help a lot. For now, we're happy to see both recent and upcoming analytical query performance improvements that are coming up in Postgres 10 and beyond.
Also, putting cstore_fdw aside, I'm a huge fan of Timescale and the work they are doing! My favorite "feature" is that Timescale is a PostgreSQL extension. :)