
Some ETLs these days are just simple scripts - e.g. Spark or Dataflow jobs - plus their config files.


Being able to define an ETL workload within a simple script is not the same as your ETL system itself being a simple script.

While I love both Spark and Dataflow, both of them are incredibly complex distributed systems with very high operational costs. Someone, somewhere is paying a lot of money to have an operational resource maintain that complexity. Whether you have an internal devops resource doing so or you're using a managed service, you're paying for that complexity somehow. And, for a lot of workloads, you aren't actually getting any more value than you would from standing up a ~$50/month standard Debian/Ubuntu server and a set of simple scripts on it.


You don't have to have a cluster to run Spark scripts; setting the master to `local` (and running it on one machine) is often enough for small amounts of data.


They don't have to have high operational costs - you can run them as a script on your local machine. You're making them out to be more complex than they really are.



