Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

(Ozgun from Citus Data)

Your observations on cstore_fdw are accurate. I'm adding a few points about our learnings from cstore deployments in production.

From a use-case standpoint, we find that most cstore deployments use it for data archival and to a lesser extent data warehousing workloads on Postgres. Some of these deployments are fairly sizable (Fortune 1000) and they primarily benefit from compression that comes with cstore_fdw (3-10x).

Citus + cstore_fdw deployments exist, but they are fewer in number. This isn't because of an architectural issue related to cstore or Postgres FDW/extension APIs. Rather, making cstore work for real-time workloads require notable implementation work. In particular:

(1) Real-time ingest & update / delete capabilities: Columnar stores have a harder time providing these features. The most practical way to achieve this is by partitioning your data on time, using a row store for the most recent data, and then rotating that data into a columnar store.

Users who use cstore for real-time workloads, manually set up table rotation themselves. In fact, this type of integration between Citus (distributed db) and cstore_fdw (columnar storage) has been one of the most requested features: https://github.com/citusdata/citus/issues/183

(2) High-availability & disaster recovery: This requires that cstore integrates with PG's write-ahead logs. Fortunately, Postgres 9.6 makes this possible / pluggable and it's on cstore_fdw's immediate roadmap.

(3) Indexes for quick look-ups: cstore_fdw comes with built-in min/max indexes. This helps but doesn't provide quick look-ups across any field. We thought about integration with PG indexes -- one challenge there involves PG indexes' storage footprint and potential hit on compression ratios.

(4) Query planning and execution tuned for columnar store: If your workload is bottlenecked on CPU (rather than disk), these type of improvements can help a lot. For now, we're happy to see both recent and upcoming analytical query performance improvements that are coming up in Postgres 10 and beyond.

Also, putting cstore_fdw aside, I'm a huge fan of Timescale and the work they are doing! My favorite "feature" is that Timescale is a PostgreSQL extension. :)



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: