Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I was a fan of dbt for a while, but the shine wore off when I saw one of my smartest coworkers try to use it. His development speed was about an order of magnitude slower than what I expect from data scientists using dataframe packages like R's dplyr, Python's polars, or Spark DataFrames. In my own experience, dbt is significantly better than writing raw SQL, but still nowhere near a normal software development experience. It needs an IDE badly, but the current IDE is cloud-only.

I am currently building my analytics pipelines on top of R's dbplyr package, which allows me to run dplyr operations on database tables just as if they were local data frames. Since it's R, it's not without warts, but at least I get to use a real programming language and the free and fantastic RStudio IDE.



> allows me to run dplyr operations on database tables

Which require that data take a round trip between the R process and the database. It's not uncommon for these jobs to spend more time in the read/write step than doing meaningful work, and why I prefer dbt when possible to keep transformations as close to the data as possible.


The second point the grandparent made was that they were using dbplyr, which allows you to avoid having the data take a round trip between the R process and the database.


The dbplyr package takes your R code and executes it in your database, returning a remote dataframe. This is quite useful because there is no mvoving data back and forth.


We built the perfect IDE for dbt at Deep Chanel, with autocomplete automatic real time type checking for the whole project.

Sadly, the company closed this June, and even the website is already down.


You need to find a way to release this product. I would pay to use it.


It's already been released and publicly available for free for almost a year.


So where is it, if "even the website is already down"?

ETA: Seems it isn't. https://www.deepchannel.com/ That what you meant?


Hmm, may be I've had network issues last time I checked. Anyway, that's the only place you can get it — it's a completely free desktop app, but it's not open source.


This is extremely interesting. You might be onto something.


Imo dbt sits in the analysts space. SQL is the language they know and dbt is the best solution to scale that.

In that sense it should only be used for last mile data transformations. Not for wrangling raw data into neat tables.


I am curious, what do you perceive as the difference between data transformations and "wrangling raw data into tables"?

Edit: I am also curious why you consider SQL to be inferior for the latter?


The difference is that raw data can come in many shapes (e.g. impossibly nested jsons) and unexpected quality (changing field names...). I cannot easily work with this in SQL and definitely not write tests to cover the ever increasing complexity of dealing with messy data.

After that step the data is a lot more uniform, so then it's easy to use SQL.


Dbt isn't a substitute for pandas/R to a data scientist. They are complements. Dbt is for the data transformation pipeline that prepares the data so the DS can write simple queries using their favourite tools on curated data.


The core divide in this space is SQL vs. $OTHER_LANGUAGE. If you're fluent in SQL you'll be great at dbt. If you're only fluent in not-SQL you won't.

A lot of these discussions are basically "can we please just use the language/tooling we know", and sometimes the answer's yes! A lot of people breathe a big sigh of relief when they discover you can use like, Spark through Python and what-not. Regardless of whether or not it's a good idea or good use of resources, this space is advancing because the market demands it. But like, I don't think you're ever gonna use dbt with not-SQL; it's all in on SQL. Feels like maybe it was the wrong fit for your team.

I should say I'm not sure if your coworker was an SQL person so dunno if this directly applies. Just saying what my experience has been.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: