From Seoul to New Orleans (not a food trip) — Coalesce 2022

Published in

Kaldea

8 min read · Nov 15, 2022

Written by: Jordan Bae (guest writer and data engineer @ Chai Corp)

From Seoul to New Orleans

Did I go on a food trip? Yes, kind of, but not really. Last month, I had the privilege of joining the world’s analytics engineering community in New Orleans at Coalesce 2022, The Analytics Engineering Conference, from October 14 to October 21.

The dbt conference, Coalesce, is worth it

For those of you who are not yet familiar, Coalesce 2022 is an Analytics Engineering Conference hosted by dbt Labs.

Coalesce 2022 had three main parts.

1) Coalesce New Orleans

2) Coalesce Online + London, Sydney

3) Coalesce Online

Coalesce New Orleans was the main event, where you could attend all offline sessions and networking events over 5 days. The London and Sydney events were held offline for 2 days alongside online sessions. I took the plunge and flew out to Coalesce New Orleans because I wanted to feel the actual vibe on site. Was it worth the trip? Yes, certainly. The networking alone was worth it, but there was much more. Coalesce this year had over 100 sessions and workshops, and more than 10,000 people participated online.

Session recaps from Coalesce 2022 New Orleans

Keynote: The End of the Road for The Modern Data Stack You Know

On the second day of the event, 2022.10.18, Tristan Handy, Founder & CEO of dbt Labs, gave his keynote. It began with the story of how the dbt community started and gained momentum: what started as a small meetup in New York now has 40,000 Slack members, and people from 96 different countries participated in Coalesce 2022.

What makes dbt so special? Tristan Handy answers with this slide.

In short, dbt has “resolved the governance issue with the data knowledge in an incredibly easy way.” That is how I felt when I was using dbt myself. It wasn’t just a transformation tool, but a service that made it very easy to manage the SQL knowledge behind each previously unmanaged metric, along with what the transformation layer needs (e.g. tests, a development environment, version management). For example, even for the same metric, the queries used by each analyst and data engineer often differ, and people are often unsure which one is right. dbt solves these problems easily.
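To make the "same metric, different query" problem concrete, here is a minimal sketch in plain Python with the standard `sqlite3` module. It is not dbt itself; the `orders` table, its statuses, and the `revenue_metric` view are hypothetical names made up for illustration. The view plays the role a shared, version-controlled dbt model would play.

```python
import sqlite3

# Hypothetical warehouse table, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 100.0, 'paid'),
        (2,  50.0, 'refunded'),
        (3,  30.0, 'paid');
""")

# Two ad-hoc versions of "revenue" that silently disagree.
analyst_sql = "SELECT SUM(amount) FROM orders"
engineer_sql = "SELECT SUM(amount) FROM orders WHERE status = 'paid'"

# One agreed definition that everyone selects from instead
# (an assumption standing in for a dbt model).
conn.execute("""
    CREATE VIEW revenue_metric AS
    SELECT SUM(amount) AS revenue FROM orders WHERE status = 'paid'
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


analyst_revenue = scalar(analyst_sql)
engineer_revenue = scalar(engineer_sql)
governed_revenue = scalar("SELECT revenue FROM revenue_metric")

print(analyst_revenue, engineer_revenue, governed_revenue)
```

The two ad-hoc queries return different numbers for what is supposedly the same metric. Once the definition lives in one reviewed place, there is a single answer to point to.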

The next topic was dbt’s focus on ecosystems from the very beginning, which led into his talk on the Modern Data Stack.

Showing the list of modern data stack ecosystems (above), he explained the main topic of the Keynote — The End of the Road for The Modern Data Stack You Know.

Previously, velocity and governance pulled in opposite directions. You had to choose between slow and governed, or fast and ungoverned. Today, the role of the Modern Data Stack is to turn that either-or choice into both, and better: fast (even faster) and governed. As a data engineer, I couldn’t agree more; this is exactly what I want and expect from continued developments in the data industry. It is a dilemma most of us data folks have faced over the past few years, and by and large still face today. Once you choose the fast and ungoverned strategy, the organizational inertia built around it does not let you easily go back and pick up the governance piece, and vice versa. I think that is why dbt took off.

Why is dbt special? Community!

If you ask why dbt is so special, I’d say it’s because of the community. In fact, alternative data services that solve both speed and governance already exist. However, few services have a community as strong and as large as dbt’s. The size of the dbt community is enormous. I was amazed that its Slack community has over 40,000 people. This is a lot more than Airflow’s (about 27,000 people as of 10.29.2022). At Coalesce New Orleans, I was surprised once again by how many engineers were enthusiastic about dbt. How was it possible to create such a community? Tristan pointed out four keys to the dbt community’s success.

dbt as a product: dbt filled in the missing piece of governance (documentation and testing) without sacrificing speed. With it, dbt gave data engineers a scalable and sustainable way to build systems.
Career expansion for data analysts: In a world where an engineering background was essential (e.g. Spark), dbt enabled analysts to use an MPP data warehouse for transformation with just SQL, and made analysts the champions.
Synergy from a larger ecosystem: Multiple, readily available integrations with many modern data stack services made dbt a no-brainer. When you create a model in dbt, it is transferable to most other modern data stacks.
Open source and pricing: dbt garners participation from the larger development community and offers both a forever-free version and a (quite reasonably) priced version. What is there not to love?

dbt now has enough influence over the data community to mint new positions like analytics engineer. Again, the dbt community felt very special. Its members openly share information, publish content, and actively participate in the conference. This felt like a bit more than just technology.

I was privileged to make some new friends in New Orleans, and I am still in touch with them over the community Slack channel.

The new keywords in the Modern Data Stack

There is an overwhelming number of choices in the Modern Data Stack. Have a look at a16z’s; it’s even more complex.

I believe this is partly because data teams get set up in the later stages of a company, and under very different circumstances. Data-focused teams are recruited only after the organization reaches a certain size or maturity, which means companies keep applying band-aids in many situations until they no longer can.

A few topics stood out to me at this conference.

Reverse ETL

Reverse ETL means that data ingested from multiple sources into the data warehouse is loaded back out to multiple tools, in the reverse direction. In the B2B SaaS world, this is a common request from your revenue, marketing, and sales ops teams: pump product data back into Salesforce, Gainsight, etc. In these cases, it is extremely easy to fail on the governance end, because things get complex very quickly. But today, there are lots of good solutions addressing reverse ETL. The illustration above shows how simple reverse ETL becomes with a solution like Hightouch. I have also been in the position of having to understand the different API specifications of multiple tools just to transfer a small amount of data, and I wish I had had such services back then. Below is the list of companies that stood out to me during the conference.
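For readers new to the idea, the reverse ETL flow described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeCrm`, `upsert_account`, and the `account_usage` table are hypothetical names, not the API of Salesforce, Gainsight, Hightouch, or any other real product, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


class FakeCrm:
    """Hypothetical stand-in for a SaaS tool's API (not a real client)."""

    def __init__(self):
        self.accounts = {}

    def upsert_account(self, external_id, fields):
        # Create or update keyed on external_id, so repeated syncs
        # do not create duplicate records.
        self.accounts.setdefault(external_id, {}).update(fields)


def reverse_etl(conn, crm):
    """Push one warehouse table back into an operational tool."""
    rows = conn.execute(
        "SELECT account_id, weekly_active_users FROM account_usage"
    ).fetchall()
    for account_id, wau in rows:
        crm.upsert_account(account_id, {"weekly_active_users": wau})
    return len(rows)


# Hypothetical warehouse table holding product usage per account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_usage (account_id TEXT, weekly_active_users INTEGER);
    INSERT INTO account_usage VALUES ('acme', 42), ('globex', 7);
""")

crm = FakeCrm()
synced = reverse_etl(conn, crm)
reverse_etl(conn, crm)  # running the sync again leaves the CRM unchanged

print(synced, crm.accounts)
```

The real difficulty is everything this sketch leaves out: authentication, rate limits, field mappings, and a different API for each destination. That is the part dedicated reverse ETL services take off your hands.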

Data Quality beyond Data Catalog

It seems many teams now commonly build or adopt services and open source tools for data cataloging. However, when it comes to data quality, the majority seem to have chosen to build a pretty simple internal tool that does minimum checks. So hearing about a lot of new tools coming to this space was music to my ears, as they would save me lots of time in providing quality, reliable data to my counterparts.
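As a rough idea of what such a "pretty simple internal tool" tends to look like, here is a minimal sketch of two common checks (no nulls, no duplicates) in Python with `sqlite3`. The `users` table and the helper names are hypothetical, and the table and column names are interpolated directly into SQL, so this should only be used with trusted identifiers.

```python
import sqlite3


def count_nulls(conn, table, column):
    """Number of rows where the column is NULL."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


def count_duplicates(conn, table, column):
    """Number of distinct non-NULL values that appear more than once."""
    sql = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(sql).fetchone()[0]


# Hypothetical table with one missing email and one repeated id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, email TEXT);
    INSERT INTO users VALUES (1, 'a@x.com'), (2, NULL), (2, 'b@x.com');
""")

results = {
    "users.email not null": count_nulls(conn, "users", "email"),
    "users.id unique": count_duplicates(conn, "users", "id"),
}
failed = [name for name, bad in results.items() if bad > 0]

print(results, failed)
```

Checks like these catch the obvious problems, but they say nothing about freshness, volume anomalies, or lineage, which is where dedicated data quality tooling earns its keep.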

Something I wanted to see more: improved time to insight and analytics systems

I was super encouraged by the explosion of tools in some of my current pain areas, such as reverse ETL and data observability. However, from a company-wide view, we still struggle a lot to deliver data on time for each ad-hoc request. There are a million reasons why this is the case, but time to insight, which is a key part of the flywheel for creating a data culture inside companies, was not discussed.

Speaking to many tech leaders across the board, I felt a strong pull toward the internally built analytics systems at places like Uber, Airbnb, and LinkedIn. However, those systems are extremely difficult to replicate elsewhere because they require a high level of engineering investment to manage.

That is why I am excited about companies like Kaldea, which are taking an ambitious and bold approach to providing a unified analytics platform where everything from modeling, discovery, and governance to analysis and visualization is connected. Indeed, it is not an approach most startups can take, as it has a steep development curve due to the large area it covers, but I am hopeful about the impact unified analytics systems can bring to companies: relieving the pressure on data-producing teams and helping companies serve data in a more timely fashion for any kind of ad-hoc request, where self-service dashboards aren’t the only answer (we know where that road ends). If you have not yet, check out what unified analytics platforms can do for you and your company!