
Y’all don’t have CI/CD? Maybe it’s hyped but we do simple stuff. Snowflake schema DWH. Jinja2 templated SQL or dbt. Airflow with a monolith tool that does transforms and such. Is it perfect? No. Is it understandable? Very much so.

Testing is such an interesting concept in data engineering. One needs consistent test data. We aim to implement that with snapshots eventually, but for now we have sanity checks at each layer.
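To make "sanity checks at each layer" concrete, here is a minimal sketch in Python. It is not the setup described above: the function name `sanity_check`, the row format (a list of dicts), and the particular checks (non-empty layer, no nulls in required columns, unique key) are all assumptions chosen for illustration.

```python
def sanity_check(rows, key, required=()):
    """Return a list of human-readable problems found in one layer's rows.

    Hypothetical helper: `rows` is a list of dicts, `key` is the column
    expected to be unique, `required` lists columns that must not be null.
    """
    if not rows:
        return ["layer is empty"]

    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # The key column and every required column must be present and non-null.
        for col in (key, *required):
            if row.get(col) is None:
                problems.append(f"row {i}: {col} is null")
        # The key column must be unique across the layer.
        k = row.get(key)
        if k is not None:
            if k in seen:
                problems.append(f"row {i}: duplicate {key}={k!r}")
            seen.add(k)
    return problems


# Made-up staging rows: one null amount, one duplicated id.
staging = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 2, "amount": 3.0},
]
issues = sanity_check(staging, key="id", required=("amount",))
```

A pipeline step could fail when `issues` is non-empty, which keeps bad rows from flowing into the next layer.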

> lacking industry guidance and generic tooling to support: CI/CD, versioned deployment artifacts, zero downtime deployments, unit testing, observability, monitoring and alerting.

CI/CD is doable and easy: changes are deployed via GitHub Actions; the CI part is a bit missing, I guess, without tests. Versioned artifacts also: add a tag to the Airflow job to know which tagged version of the code is running. Zero-downtime deployments we do all day: using views, we can a/b deploy changes to the underlying tables and then do a simple schema change to the view, and nobody knows the difference. Unit testing is still yet to be done. For observability and alerting we use internal dashboards and Sentry.
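The view-swap deployment can be sketched roughly as follows. This uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; the table names (`orders_v1`, `orders_v2`), the view name `orders`, and the helper `repoint_view` are assumptions for illustration. In an actual warehouse the swap would more likely be a single `CREATE OR REPLACE VIEW` statement, which SQLite lacks, hence the drop-and-create inside one transaction here.

```python
import sqlite3


def repoint_view(conn, view, target_table):
    """Repoint `view` at `target_table` inside a single transaction.

    Hypothetical helper. Identifiers cannot be bound as SQL parameters,
    so they are restricted to simple names before being interpolated.
    """
    for name in (view, target_table):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP VIEW IF EXISTS {view}")
        conn.execute(f"CREATE VIEW {view} AS SELECT * FROM {target_table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT above controls the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Current version of the table, exposed to consumers only through the view.
conn.execute("CREATE TABLE orders_v1 (id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders_v1 VALUES (1, 10.0)")
repoint_view(conn, "orders", "orders_v1")
before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Build the new version alongside the old one, then swap the view.
conn.execute("CREATE TABLE orders_v2 (id INTEGER, amount REAL, currency TEXT)")
conn.execute("INSERT INTO orders_v2 VALUES (1, 10.0, 'USD'), (2, 5.5, 'EUR')")
repoint_view(conn, "orders", "orders_v2")
after = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Consumers keep querying `orders` throughout; rolling back is the same call pointed at the previous table.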

What am I missing?



FWIW, we are building a CI/CD solution for Snowflake: https://www.bytebase.com/docs/tutorials/database-change-mana...


Testing and monitoring, in every aspect, are the only things that are greatly lacking; you always need to glue something together to get nice tests or some kind of data contract monitoring.


Relevance and actual value, I'd assume.



