I was a fan of dbt for a while, but the shine wore off when I saw one of my smartest coworkers try to use it. His development speed was about an order of magnitude slower than what I expect from data scientists using dataframe packages like R's dplyr, Python's polars, or Spark DataFrames. In my own experience, dbt is significantly better than writing raw SQL, but still nowhere near a normal software development experience. It needs an IDE badly, but the current IDE is cloud-only.
I am currently building my analytics pipelines on top of R's dbplyr package, which allows me to run dplyr operations on database tables just as if they were local data frames. Since it's R, it's not without warts, but at least I get to use a real programming language and the free and fantastic RStudio IDE.
> allows me to run dplyr operations on database tables
Which requires that the data take a round trip between the R process and the database. It's not uncommon for these jobs to spend more time in the read/write step than doing meaningful work, which is why I prefer dbt when possible: it keeps transformations as close to the data as possible.
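To make the round-trip cost concrete, here's a minimal sketch using Python's stdlib sqlite3 (table and column names are invented for illustration): the first version pulls every row into the client process and aggregates there; the second pushes the aggregation down so only the summarized result leaves the database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])

# Round trip: every row crosses into the Python process first.
rows = con.execute("SELECT user_id, amount FROM events").fetchall()
totals = {}
for user_id, amount in rows:
    totals[user_id] = totals.get(user_id, 0.0) + amount

# Pushed down: the database does the work, only the aggregate moves.
pushed = dict(con.execute(
    "SELECT user_id, SUM(amount) FROM events GROUP BY user_id"))

assert totals == pushed  # same answer, far less data moved
```

With three rows the difference is invisible; with billions, the read/write step in the first version dominates the job.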
The second point the grandparent made was that they were using dbplyr, which allows you to avoid having the data take a round trip between the R process and the database.
The dbplyr package translates your R code into SQL and executes it in your database, returning a remote dataframe. This is quite useful because no data is moved back and forth.
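The mechanism is roughly: dataframe-style verbs accumulate into a query, and nothing executes until you ask for results. Here's a toy Python sketch of that idea (the `RemoteTable` class and its methods are invented for illustration; they mimic dbplyr's `filter()`, `show_query()`, and `collect()`):

```python
import sqlite3

class RemoteTable:
    """Toy lazy 'remote dataframe': verbs build SQL, collect() runs it."""
    def __init__(self, con, table):
        self.con, self.table, self.where = con, table, []

    def filter(self, cond):        # analogous to dplyr::filter()
        self.where.append(cond)
        return self

    def show_query(self):          # inspect the generated SQL
        sql = f"SELECT * FROM {self.table}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        return sql

    def collect(self):             # only now do rows cross the wire
        return self.con.execute(self.show_query()).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

q = RemoteTable(con, "t").filter("x > 1")
print(q.show_query())   # SELECT * FROM t WHERE x > 1
print(q.collect())      # [(2,), (3,)]
```

The real dbplyr does far more (joins, window functions, backend-specific SQL dialects), but the lazy-translation shape is the same.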
Hmm, maybe I had network issues last time I checked. Anyway, that's the only place you can get it — it's a completely free desktop app, but it's not open source.
The difference is that raw data can come in many shapes (e.g. impossibly nested JSON) and unexpected quality (changing field names...). I cannot easily work with this in SQL, and definitely cannot write tests to cover the ever-increasing complexity of dealing with messy data.
After that step the data is a lot more uniform, so then it's easy to use SQL.
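For instance, the pre-SQL cleanup step might look something like this minimal sketch using Python's stdlib `json` (the field names are invented for illustration): nested structures get flattened into one level so they load cleanly into a table, after which SQL is the right tool.

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

raw = json.loads('{"user": {"id": 1, "address": {"city": "Oslo"}}, "ok": true}')
print(flatten(raw))
# {'user.id': 1, 'user.address.city': 'Oslo', 'ok': True}
```

And unlike ad-hoc SQL string manipulation, a function like this is trivial to unit-test against each new shape of messy input as it shows up.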
Dbt isn't a substitute for pandas/R to a data scientist. They are complements. Dbt is for the data transformation pipeline that prepares the data so the DS can write simple queries using their favourite tools on curated data.
The core divide in this space is SQL vs. $OTHER_LANGUAGE. If you're fluent in SQL you'll be great at dbt. If you're only fluent in not-SQL you won't.
A lot of these discussions are basically "can we please just use the language/tooling we know", and sometimes the answer's yes! A lot of people breathe a big sigh of relief when they discover you can use like, Spark through Python and what-not. Regardless of whether or not it's a good idea or good use of resources, this space is advancing because the market demands it. But like, I don't think you're ever gonna use dbt with not-SQL; it's all in on SQL. Feels like maybe it was the wrong fit for your team.
I should say I'm not sure if your coworker was an SQL person so dunno if this directly applies. Just saying what my experience has been.