Launch HN: Hightouch (YC S19) – Sync data from data warehouses to SaaS tools
132 points by kashishg on Nov 11, 2021 | 47 comments
Hey HN! Kashish, Tejas, and Josh here. We’re building Hightouch (https://www.hightouch.io/), a reverse ETL platform: software that gets your data back out of your data warehouse and into the SaaS tools that people at your company are familiar with (like Salesforce). We enable you to Bring Your Own (BYO) database so that all your SaaS tools run off of the same dataset. You specify what data you want and where, and we take care of the rest.

We were exposed to the data integration space as early engineers at Segment. Segment and other CDPs (Customer Data Platforms) were built on an older model that hits a wall once you reach a certain level of complexity. You don’t have access to your own data, there isn’t a great way to express business logic, and you don’t have flexibility to transform data to your needs.

Cloud-based data warehouses like Snowflake and tools like dbt solved part of this problem. Where Segment/CDPs require you to store data in their format, warehouses let you store your data in any format. They let you store it in your own cloud for privacy and security. And where Segment/CDPs only have event data, warehouses have all your data—things like a full replica of Salesforce data and a full replica of Postgres data.

The problem is, all this data tends to get stuck in the warehouse and only get used for reports and dashboards. In our experience, business teams don’t want another BI dashboard. They want their data in their primary tools—the SaaS applications where they spend their days—so they can use it to actually operate their business.

Because of this mismatch, a lot of engineers are doing busywork writing scripts to get data from warehouses into CRMs like Salesforce, Hubspot, Customer.io, and so on. Such scripts are brittle—they need changing when users request more columns, when an API changes, etc. And that’s only if the business people are lucky enough to get engineers’ time in the first place. There are also a lot of teams downloading CSVs and manually uploading them to various platforms because they can’t get their work in front of engineers who are busy with a hundred other priorities.

We decided to build something that would appeal to both sides: business people who know what they want from their data and just need access to it, and data engineers who want to help but can’t build and maintain every integration as the marketing team buys an endless number of SaaS tools. That’s how we came up with Hightouch.

Hightouch is a platform that makes it easy to take models and views from your warehouse and sync them into your SaaS apps, using only SQL to express your logic. Mapping between columns and application fields is done through a declarative UI. You write a SQL query to pull the data you need, map columns from that query to fields in your SaaS tool, and set how often you want data to sync. We handle the rest. See a demo here: https://www.youtube.com/watch?v=kDhHWG9hwj0.
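
To make the three steps above concrete (write a query, map columns to fields, pick a schedule), here is a minimal Python sketch of what a declarative sync definition could look like. This is not Hightouch's actual configuration format: `SyncConfig`, `to_destination_record`, the table `analytics.users`, and the `Plan__c` field are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncConfig:
    """Hypothetical description of one sync: what to read, where to write, how often."""
    model_sql: str               # SQL query that selects the rows to sync
    destination: str             # e.g. "salesforce"
    destination_object: str      # e.g. "Contact"
    primary_key: str             # column used to match rows to destination records
    field_mapping: dict          # query column -> destination field
    schedule_minutes: int = 60   # how often the sync should run


def to_destination_record(row: dict, config: SyncConfig) -> dict:
    """Rename the mapped columns of one query row to destination field names."""
    missing = [col for col in config.field_mapping if col not in row]
    if missing:
        raise KeyError(f"query result is missing mapped columns: {missing}")
    # Unmapped columns (such as "id" below) are simply not sent.
    return {dest: row[col] for col, dest in config.field_mapping.items()}


# Illustrative values only; the query is never executed in this sketch.
config = SyncConfig(
    model_sql="SELECT id, email, plan FROM analytics.users",
    destination="salesforce",
    destination_object="Contact",
    primary_key="email",
    field_mapping={"email": "Email", "plan": "Plan__c"},
)

row = {"id": 7, "email": "ada@example.com", "plan": "pro"}
record = to_destination_record(row, config)
```

The point of the sketch is that a "quick change" from the sales team becomes an edit to `field_mapping` rather than a code change.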

No more hard-coding database columns to a Salesforce field in Python or Javascript, only to have a sales team ask for a ‘quick change’. We handle all the annoying complexities of moving data around: type-casting, error handling, authentication, retries, debugging, observability, notifications/alerts, and changing APIs—freeing up your engineering time to work on problems specific to your business rather than syncing data into a CRM.

We’ve also built integrations we think data engineers will love. We integrate directly with dbt and dbt Cloud, we offer git sync for version control of your models and syncs, we have an Airflow operator, as well as a public API, and we’d love your ideas on what you think is missing.

Hightouch doesn’t store anything. We connect directly to your existing warehouse/database and SaaS tools. As data changes in the warehouse, it changes in the SaaS tool. You get full control and your data is always owned by you.

Our customers use Hightouch to do things like: sending a feed of new leads and customers to Slack; syncing product usage data into CRMs like Hubspot and Salesforce; and syncing user cohorts into marketing systems, such as all users who abandoned their shopping cart, or users with a high “churn risk” score.

We’ve grown from 4 people to almost 30 now, and work with amazing customers like CircleCI, Plaid, Retool, Ramp, Lucid Chart, Nando’s, Grafana, Kong, Autotrader, Blend, and Imperfect Foods. We’re also hiring—we have over 15 positions open at https://hightouch.io/careers/, and we would love to meet you and have you join the team.

We’d love to hear your thoughts, feedback and experiences on data warehouses, building integrations, ETL, workflow orchestration, and anything data related!



Congratulations on the launch.

We were looking for this very solution. Writing queries and mapping them downstream is pretty handy.

I also noticed that you support Rudderstack (yet another great tool), and we can send events via their HTTP connector.

Looking forward to using this tool.

Do you plan on adding Clickhouse as a source anytime soon?


Yes, Clickhouse is a top priority source for us to build next to enable real-time analytics (we already support Rockset as a source used by customers like Seesaw). Would love to learn more about your use case for Clickhouse: feel free to reach out to hello@hightouch.io!


Interesting tool indeed. You made a good point about RudderStack and synergies there. I'm curious to see how Hightouch is going to differentiate itself from something like RudderStack, which has a boatload of reverse ETL functionality of its own. I mean, event stream data and moving data from your warehouse back out to tools is pretty much their mantra.


Then you could say the same thing about Hightouch and Segment. Segment == Rudderstack for all intents and purposes.

The way it's differentiated is that Rudderstack does event collection + forwarding + some flavor of reverse ETL. In order to be successful with them you'd need to replace your whole stack and do event collection with them. A lot of companies already have an existing stack and they want to buy a best-in-class Reverse ETL player (something that takes 5 min to set up) and that's where we come in!


RudderStack founder here.

We tend to believe we have a pretty good reverse-ETL product too, which can be used standalone without event collection. Best in class is a moving target anyway and up to customers to judge :)

However, I do agree on the point around focus. Our positioning, go-to-market, pricing, use cases etc are centered around building the end to end customer data infrastructure. There are folks who already have pieces of the stack or don't need a full CDI (e.g. non PLG B2B companies) and only need reverse-ETL.

The market is enormous so I believe we both will do great. Congrats on the launch.


Agree RE: focus. Rudderstack is a bundle and there is a place for that.

Hightouch is 100% focused on activating data from the warehouse. Everything we build is Reverse ETL or built on top of Reverse ETL - that means we spend every waking minute thinking about progressing the Reverse ETL space, just like Fivetran are laser-focused on SaaS data ingest or Snowplow on behavioral/event data ingest (other parts of the Rudderstack bundle)

We're a big fan of RudderStack's drive. As an ex-Segmenter, I can say that competing with that team is not easy. Best of luck! We will continue watching from the sidelines :)


Thanks Tejas for your kind words :) We too have nothing short of enormous respect for what you guys have achieved. Looking forward to you and the team pushing innovation in this space forward.


I've been a Hightouch customer for nearly a year now, and I have to say the team and product are both great.

That being said, isn't it a bit late for a Launch HN post? :P


Well...

Launch HN: Rainforest QA (YC S12) – No-Code UI Test Automation - https://news.ycombinator.com/item?id=28947689 - Oct 2021 (88 comments)

Launch HN: RescueTime (YC W08) – Redesigned for wellness, balance, remote work - https://news.ycombinator.com/item?id=28683597 - Sept 2021 (141 comments)

The basic rule is that each YC startup gets one. We've made a couple exceptions in cases of complete reinventions.


Thanks so much for your support from the very beginning! We've only been in market publicly for a little over a year now actually. We figured better late than never (and it still feels early for us!) :)


Really, really love that it's just SQL. How is data mapped to the target API from the SQL projection? Are the columns themselves the actual API contract?


Cofounder here! Not quite. In Hightouch, you define your model (SQL) and create a sync (point-and-click or JSON/YAML).

The syncs are declarative, not imperative. They don't map 1:1 to API calls by design. You tell us what you want the destination to look like, and we figure out how :). Kinda like how a database creates the best plan for your SQL query before executing it.

Here's an example - https://i.imgur.com/05T5iKK.png. This sync maps your users table to Salesforce "Contacts" and the mapping interface also encodes the foreign key relationship between Contact:Account in Salesforce. Under the hood, we do all the lookups, caching, batch API calls using the bulk API, automatically handle rate limits, and only send changes from your database.
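
As a rough illustration of the declarative idea described above, the sketch below compares the rows you want in the destination with the records already there, derives the creates and updates needed, and groups work into batches. It is a toy under stated assumptions, not how Hightouch is implemented: `plan_sync` and `batches` are hypothetical helpers, records are plain dicts, and real concerns such as foreign-key lookups, caching, and rate limits are left out.

```python
from itertools import islice


def plan_sync(desired_rows, existing_records, key):
    """Compare desired rows with what the destination already holds.

    Returns (creates, updates). Rows that already match produce no work.
    """
    existing_by_key = {rec[key]: rec for rec in existing_records}
    creates, updates = [], []
    for row in desired_rows:
        current = existing_by_key.get(row[key])
        if current is None:
            creates.append(row)
            continue
        # Keep only the fields whose values differ from the destination.
        changed = {f: v for f, v in row.items() if current.get(f) != v}
        if changed:
            updates.append({key: row[key], **changed})
    return creates, updates


def batches(items, size):
    """Yield lists of at most `size` items, for bulk-style API calls."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


desired = [
    {"Email": "a@x.com", "Plan__c": "pro"},
    {"Email": "b@x.com", "Plan__c": "free"},
    {"Email": "c@x.com", "Plan__c": "free"},
]
existing = [
    {"Email": "a@x.com", "Plan__c": "free"},
    {"Email": "b@x.com", "Plan__c": "free"},
]

creates, updates = plan_sync(desired, existing, key="Email")
```

The caller only states the desired end state. Which records get created, updated, or skipped falls out of the comparison, which is the sense in which the sync is declarative rather than a 1:1 list of API calls.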

This is one of our key design differences compared to iPaaS tools like Tray, Zapier, Workato, Mulesoft, etc., which tend to just map actions to API calls 1:1. Data integration being declarative is something I'm really passionate about personally... wrote a blog with more examples at https://hightouch.io/blog/the-future-of-data-integration-wha...


> You tell us what you want the destination to look like

Implicit mapping between SQL to target is great, but how does the SQL author know what SQL to write in the first place?

I've done no shortage of integrations like this, and there is no avoiding reading the target SaaS documentation to know what their schema looks like so I can shape data accordingly. Without that step, I can't even start writing SQL.


Not implicit but - declarative! Our goal is to provide enough context in our docs, app (e.g. autocomplete, automatic schema discovery, etc.), and resources to guide users through this and then recipes on top for common workflows!

We do a lot of validation upfront (at both the schema & data layer), and I think it's still early days there... this is a big opportunity IMO. Great callout.

We find people start with a simple SQL model + sync and then bounce back and forth and edit their queries as they explore our columns.


What if the state of IsHireable changes in the system of record (SFDC)? Will Hightouch overwrite OLTP data with stale warehouse data?


Short answer is yes, but here's why:

By syncing data to a particular field in Salesforce, you're effectively saying that the source of truth for that field is the warehouse, and not Salesforce. If you expect a human to update a field, then Salesforce is the source of truth for that field, and Hightouch shouldn't write to it!

What we typically see is that Salesforce contains data that's expected to be updated and maintained in the tool, and then other "read-only" fields coming from Hightouch.
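
A minimal sketch of that ownership split, assuming each sync declares the set of fields the warehouse owns. `apply_sync` and the `Usage_Score__c` field are hypothetical; `IsHireable` is borrowed from the question above as an example of a field humans edit in the tool.

```python
def apply_sync(record: dict, warehouse_row: dict, owned_fields: set) -> dict:
    """Return the record after a sync that writes only warehouse-owned fields."""
    updated = dict(record)  # copy so the input record is not mutated
    for field_name in owned_fields:
        if field_name in warehouse_row:
            updated[field_name] = warehouse_row[field_name]
    return updated


# The record as it currently exists in the tool; a human set IsHireable.
crm_record = {"Email": "ada@example.com", "IsHireable": True, "Usage_Score__c": 10}

# The warehouse has a stale IsHireable and a fresh usage score.
warehouse_row = {"Email": "ada@example.com", "IsHireable": False, "Usage_Score__c": 42}

synced = apply_sync(crm_record, warehouse_row, owned_fields={"Usage_Score__c"})
```

Because `IsHireable` is not in `owned_fields`, the stale warehouse value never reaches the record. Mapping that field in the sync would be the same as declaring the warehouse its source of truth.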


One small suggestion when connecting fields, auto-select a best guess column. A good deal of the time it will match (email=>email, first_name=>firstName), and it will cut in half the time to configure that part of the sync. Or an option to toggle this on/off.
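
One way such a best-guess default could work, sketched under the assumption that matching on a normalized name (lowercased, punctuation stripped) is good enough for a first suggestion. `normalize` and `suggest_mapping` are hypothetical helpers, not part of any product API.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def suggest_mapping(columns, destination_fields):
    """Suggest column -> field pairs whose normalized names are identical."""
    by_normalized = {normalize(f): f for f in destination_fields}
    suggestions = {}
    for column in columns:
        match = by_normalized.get(normalize(column))
        if match is not None:
            suggestions[column] = match
    return suggestions


suggested = suggest_mapping(
    ["email", "first_name", "plan"],
    ["email", "firstName", "lastName"],
)
```

Columns with no counterpart (`plan` here) are left unmapped for the user to fill in, which keeps the suggestion easy to override or switch off.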


This is on our roadmap! Slotted for release sometime in the next month.


" They want their data in their primary tools—the SaaS applications where they spend their days—so they can use it to actually operate their business."

I'm obviously missing something here, but thinking in terms of "operationalizing" data that comes from some kind of analytical environment, was not the data in their operational SaaS tools in the first place?


It starts there, but once you get into complex workflows that merge data across your product and CRMs, it all moves to the warehouse first. The typical flow is Fivetran or Stitch into the warehouse, lots of dbt models, then business models fit for consumption downstream.

Once in the warehouse, it needs to get back into those operational systems again, which is the tricky part.

I’ve done these one-off integrations from the warehouse into Salesforce (creating leads, converting them, moving stages all based off product usage), and into marketing tools (customer segmentation built using SQL in the warehouse, then sent to marketing automation tools).

Being able to feed tools directly off the warehouse instead of writing one-off integrations is the real value.


The Salesforce data is in Salesforce, the HubSpot data is in HubSpot, and the Mixpanel data is in Mixpanel, but those applications don't have each other's data (not to mention that they are missing any transformations on top). For example, as a sales rep, you can benefit from understanding product usage and marketing activity for a contact in Salesforce.


I am probably missing something here, but how is Hightouch differentiated from other similar tools like Census (https://www.getcensus.com/)?


Great question! We have a whole page about this here (https://hightouch.io/blog/hightouch-vs-census/).

But the TLDR is that Hightouch has more developer focused features (like a live debugger, alerting, version control with Git, and more here: https://hightouch.io/data-features/), a dedicated UI for business users to visually filter models (called Hightouch Audiences), more transparent pricing, as well as more integrations (70+) that are also deeper and customized for each tool.


Very cool and thanks for the response! I will definitely give Hightouch a try.


Interested to try this out one day!

My Data Engineering team has spent weeks building API integrations this year, so this would be valuable.


Interesting. How does Reverse ETL differ from ETL/ELT tools like Fivetran?


On a high level, ETL/ELT is about sending data from your SaaS tools into your data warehouse (you are reading from different tools). Reverse ETL is about getting data from your warehouse into tools (writing into different tools). Building ELT is a fundamentally different technical challenge than building Reverse ETL. Aspects like types, rate limits, and destination state (knowing whether data already exists in a destination) are unique to Reverse ETL. Visibility becomes challenging too as some destinations have unique quirks, like API contracts where you write to them but you don’t know if the write was successful or completed until later. Writing to tools also requires references between objects (foreign keys onto existing data, like mapping Companies and Opportunities in Salesforce) that aren’t necessary in the ELT world.
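
Rate limits are one of the write-side concerns listed above. A common way to cope with them is retrying with exponential backoff, sketched below. Everything here is an assumption for illustration: `RateLimited` stands in for whatever error a destination client raises on an HTTP 429-style response, and `send` is any callable that writes one batch.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised when the destination rejects a call as rate limited."""


def send_with_backoff(send, batch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(batch), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return send(batch)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay


# Demo with a fake destination that rejects the first two calls.
calls = []
delays = []


def flaky_send(batch):
    calls.append(batch)
    if len(calls) < 3:
        raise RateLimited()
    return len(batch)


# Passing delays.append as `sleep` records the waits instead of actually sleeping.
result = send_with_backoff(flaky_send, ["a", "b"], sleep=delays.append)
```

Injecting `sleep` keeps the retry logic testable without real waiting. A production version would likely also honor any retry hint the destination returns.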

From a product perspective, the UX is very different as well. Reverse ETL requires a lot more user input (ex: mapping which fields to update in a tool), whereas ELT typically mirrors data using a standard schema (without much user customization involved).

We are close partners with ELT tools like Fivetran, and you can see our partnership post here: https://fivetran.com/blog/fivetran-partners-with-hightouch-t...


Great partnership, though I have to wonder if Fivetran will ever build this functionality into their core product. (When I asked them to consider this a few years ago, I think they politely shoved that suggestion into the trash can, so I'm glad you have proven out this space.) I think you are quite right that this space is going to be huge. Wish I could invest in you!


Reverse ETL is so critical as tools proliferate. How do you deal with visualisation of data in the environments that they are consumed in? Or is that up to the respective environment?


If you don't mind me asking, how much time did it take to reach this maturity or quality as a product? Did you start in 2019 or before that?

Kiss your designer for me.


We started building this one in August 2020, but honestly we just had a lot of fun working on the design and UX! Conveying your feedback to him now!


Congrats Kash, Tejas, Josh, and your whole team! You guys are killing it.

You’ve made incredible progress over the last year. Your customer list is looking very strong, and it seems like you’ve honed in on a real and pressing problem.

Keep up the great work, but try to take a moment to celebrate how far you’ve come!


Thank you for the kind words :) We are just getting started!


Are posts by YC companies handled by an API that submits at around 7am PST?


No. I tell the startups who are doing a Launch HN to post whenever they're ready in the morning.


What about the latency of data warehouses? How do you get around that?


Good callout. Sometimes, I joke that warehouse ingestion latency is the bane of my existence, but it's improving...

Our average customer runs Hightouch syncs roughly every hour, but we can actually run syncs up to every minute! HT has a lot of optimizations like only sending changes to destinations instead of all data every run.
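
For readers wondering what "only sending changes" can look like, here is a small sketch that fingerprints each row and compares it with the fingerprint stored from the previous run. It swaps in a simple in-memory technique for illustration and is not a description of Hightouch's in-warehouse diffing; `fingerprint` and `changed_rows` are hypothetical helpers.

```python
import hashlib
import json


def fingerprint(row: dict) -> str:
    """Stable hash of a row; keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_rows(current_rows, previous_fingerprints, key):
    """Return (rows that are new or changed, fingerprints to store for the next run)."""
    changed = []
    new_fingerprints = {}
    for row in current_rows:
        fp = fingerprint(row)
        new_fingerprints[row[key]] = fp
        if previous_fingerprints.get(row[key]) != fp:
            changed.append(row)
    return changed, new_fingerprints


# First run: nothing has been seen before, so every row is sent.
run1 = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
sent1, state = changed_rows(run1, {}, key="id")

# Second run: only the row whose plan changed is sent.
run2 = [{"id": 1, "plan": "pro"}, {"id": 2, "plan": "pro"}]
sent2, state = changed_rows(run2, state, key="id")
```

Skipping unchanged rows is what makes frequent schedules affordable, since the number of API calls tracks the amount of change rather than the size of the table.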

On the warehouse side, we're seeing a lot of improvements. BigQuery has streaming insert APIs [0] implemented with a parallel database on the backend that's joined at read time. Combined with timestamp partitioned tables (sortable) and our in-warehouse diff'ing, you can actually create a streaming pipeline in Hightouch. Some companies like JetBlue are doing cool stuff with lambda views on top of Snowflake [1]. Our power users at Hightouch are running syncs as fast as every minute.

For wider context, we find 90%+ of business use cases to be just fine in batch. It's amazing to see how many people are still replacing... manual CSV workflows... with Hightouch :)

That said, there are some use cases for truly real-time workflows (e.g. a post-checkout email), and for that, customers either implement outside of Hightouch or lately, we've been fiddling around with letting customers plug directly into streams like Kafka, Kinesis, PubSub - though they lose the power of SQL aggregations _for now_.

Streaming SQL databases like Materialize [2] will fix this fundamentally, and Hightouch can connect to them. Email hello@hightouch.io if you want to try any of the new stuff!

[0]: https://cloud.google.com/bigquery/docs/write-api

[1]: https://discourse.getdbt.com/t/how-to-create-near-real-time-...

[2]: https://materialize.com/


This looks super cool. Love the git integration.


Thanks! Here's some more info on the git integration if you're curious: https://hightouch.io/blog/announcing-git-sync-back-your-sync...


Congrats on the launch! Hightouch looks great and this need is real. Things seem to be going well, so I don't think I'm taking too much away by mentioning that we have been working on Grouparoo, an open source alternative that solves similar pain points.

A few differences: git developer workflow focused (branches, CI, PRs, etc), ability to self host, segmentation in destinations (tagging people in mailchimp based on rules, for example)

https://www.grouparoo.com


Hightouch user here. HT actually has a lot of that - git integration [0], visual segmentation [1]. Not sure about self-hosting though. Open-source is cool, will check it out.

[0]: https://hightouch.io/docs/integrations/git-sync/

[1]: https://hightouch.io/docs/hightouch-audiences/overview/


Haha thanks. Love some friendly competition :). In all seriousness, though we're focusing elsewhere, the OSS angle is cool.

If you're interested in self-hosted though, just reach out at hello@hightouch.io.

That said, IMO one of the coolest parts of our tech is our "hybrid architecture". Out of the box, no data is stored in Hightouch - it's all in your cloud (warehouse, S3 bucket). This is how fintech (Plaid, Blend, Betterment, + some banks now!) and healthcare brands like Headway use us. We've also done a ton of compliance work and have certificates for SOC2 Type II and whatnot.


There are probably some nuances one level down. Things our users have told us they can do in these areas that, to my knowledge, Hightouch doesn't do:

* Combine data from different sources to define a model. We've seen users treat Postgres as the source of truth and supplement it with Snowflake data, for example.

* Add tags to contacts in mailchimp, zendesk or make lists of them in customer.io, Pardot, etc based on segmentation. I believe Hightouch Audiences is more like a filter.

* Full workflow with branches, PRs, test suite in a repo. I saw Hightouch added git syncing to a known branch yesterday and it looks cool, but it's not the full workflow yet.

I'm certainly trying to keep it in the friendly-competition area, especially on this thread :-)


This probably isn't the best place for an extended comparison, but since it's our launch post, I'll try to close the thread with a couple corrections for factuality. If anyone is interested in a deep-dive, email hello@hightouch.io, and I'm happy to set one up personally. And, I'm sure the team at Grouparoo would be willing to do the same ("contact us" at bottom of their website).

> * Add tags to contacts in mailchimp, zendesk or make lists of them in customer.io, Pardot, etc based on segmentation. I believe Hightouch Audiences is more like a filter.

With static mappings, audiences can be synced to destinations as tags :). The magic is in the abstractions, not features!

> * Full workflow with branches, PRs, test suite in a repo. I saw Hightouch added git syncing to a known branch yesterday and it looks cool, but it's not the full workflow yet.

Lots more coming soon here. Our git integration is bidirectional, so you can totally do that stuff in git, but UI support is on the way. We've found the UI is a much better experience than code for _most_ Reverse ETL workflows... so I see the value in this - I'll check it out.

If I have to be honest, the biggest thing that customers love about our product is that it works and accomplishes their use cases. Platform features are cool, but from time to time, I have to remind myself that Fivetran has proven that integrations and actually working comes first, and it is volume but not _just_ volume... our philosophy (destinations as a product), design, and progress there is quite differentiated from the space. You can read more in our Series A announcement from a few months ago at https://hightouch.io/blog/series-a

PS: I haven't tried Grouparoo in a while. I do love the concepts, will give it a swing!


It's hard to leave the comparisons dangling, for sure. But I'll defer for now. Congrats on the launch :-)


Congrats Tejas and team on the launch. Great to see your progress and broad innovation in this space (Census/Hightouch/Grouparoo/us@RudderStack).

It's a huge market and we can all help push each other.


Thanks, Soumya! I agree. Congrats on all your continued success at RudderStack as well.



