100% agree this is a perfect use case for Cadence or Temporal.

nerdponx · on Jan 31, 2024

Coming from the "data world", how does a tool like Airflow/Dagster/Prefect differ from these?

jtmarmon · on Feb 1, 2024

I recently evaluated Dagster, Prefect, and Flyte for a data pipeliney workflow and ended up going with Temporal.

The shared feature between Temporal and those three is the workflow orchestration piece. All 3 can manage a dependency graph of jobs, handle retries, start from checkpoints, etc.

At a high level the big reason they’re different is Temporal is entirely focused on the orchestration piece, and the others are much more focused on the data piece, which comes out in a lot of the different features. Temporal has SDKs in most languages, and has a queuing system that allows you to run different workflows or even activities (tasks within a workflow) in different workers, manage concurrency, etc. You can write a parent workflow that orchestrates sub-workflows that could live in 5 other services. It’s just really composable and fits much more nicely into the critical path of your app.

Prefect is probably the closest of your list to temporal, in that it’s less opinionated than others about the workflows being “data oriented”, but it’s still only in python, and it deosn't have queueing. In short this means that your workflows are kinda supposed to run in one box running python somewhere. Temporal will let you define a 10 part workflow where two parts run on a python service running with a GPU, and the remaining parts are running in the same node.js process as your main server.

Dagster’s feature set is even more focused on data-workflows, as your workflows are meant to produce data “assets” which can be materialized/cached, etc.

They’re pretty much all designed for a data engineering team to manage many individual pipelines that are external from your application code, whereas temporal is designed to be a system that manages workflow complexity for code that (more often) runs in your application.

lorendsr · on Feb 2, 2024

I wrote up this comparison: https://community.temporal.io/t/what-are-the-pros-and-cons-o...

jaydeegee · on Feb 1, 2024

They definitely are similar and can be used for similar functions but Cadence/Temporal are focused on code orchestration side rather than data orchestration.

nerdponx · on Feb 1, 2024

I find that comparison interesting, because I don't think of Airflow as particularly data-oriented in terms of its features and core functionality. I tend to think of Airflow as "Cron + Make", with any "data-oriented" features being nice to have, but not essential.

I'm substantially less familiar with Dagster and Prefect so can't comment as much on those.

Maybe the most data-oriented thing about Airflow is its concept of a data interval, where each DAG run is associated with some "logical date" and an interval of time that starts from the logical date (inclusive) and ends at the next logical date in the schedule (exclusive). The idea is that if you have a daily task that runs at 1 AM, then the task is expected to operate on data starting from "yesterday at 1 AM" until "today at 1 AM". But it's entirely up to the user/developer what you actually do with those logical date ranges, and you're free to ignore them entirely if you don't need them.

wharvle · on Feb 2, 2024

I’ve only recently encountered airflow for the first time, and have been surprised at how half-baked it is for being damn near the industry standard (as far as open source, anyway). And it was a lot worse until recently!

Dynamic task dispatch being a relatively recent feature. The fundamental design imposing lots of structure (well, kind of—you can skip lots of it, but it takes time to figure that out) to practically no benefit (and god, is the terminology dumb, made all the more so because half the stuff it names is nearly useless). “Oh yeah the scheduler just crashes or locks up while still health-checking all the time, standard practice to so restart it frequently” posted on a hundred different issues dating from yesterday to years ago (many fixed! And yet…). It’s pretty bad at passing data between tasks (see again: lots of structure, little benefit)

mianos · on Feb 1, 2024

It doesn't. They didn't look for it. This is exactly what DAG based workflow systems to, in a modern professional manner. Crons don't do a DAG.