Connected Apps | Missing Layer in the Modern Data Stack

Arunim Samat
Sep 30, 2022

Modern Data Stack | Breakdown

The modern data stack is a buzzword that almost everyone is familiar with by now, but what does this stack really entail? At broccoli, we see the modern data stack as having 4 components.

Data Transportation

This layer consists of ETL/ELT tools like Fivetran, CDPs like Segment and RudderStack, and reverse ETL tools like Hightouch and Census. The role of these tools is to transport data from source(s) to destination(s). From the modern data stack's perspective, the sources are usually data applications like Salesforce, Marketo, Zendesk, product usage events, etc., and the destination is usually a data warehouse, which is rapidly becoming a customer's centralized data platform and thus the single source of truth for the customer's data.

Data Storage

As mentioned above, cloud data warehouses like Snowflake, BigQuery, and Redshift are rapidly becoming the preferred place to bring all the siloed data together. Data warehouses are getting faster, cheaper, and more powerful, and some of them now support near real-time SQL, real-time ingestion, and so on. In other words, the warehouse is the core data engine for all your data applications.

Data Transformation

To make sense of the data, one needs to transform it. What does transformation mean? It simply means cleaning and joining data so that the end consumer can start using it. This is where dbt is slowly emerging as the winner, and SQL is emerging as the language of choice for interacting with your data. So pretty much any data application built on top of a data warehouse interacts with the data via SQL.
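To make "clean and join" concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse, and the table and column names (raw_crm_accounts, raw_support_tickets, and so on) are hypothetical. In a real stack, a query like this would typically live in a transformation tool such as dbt and run inside the warehouse.

```python
import sqlite3

# In-memory SQLite stands in for a cloud data warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_crm_accounts (account_id TEXT, email TEXT);
    CREATE TABLE raw_support_tickets (ticket_id INTEGER, email TEXT, status TEXT);

    INSERT INTO raw_crm_accounts VALUES
        ('A1', ' Alice@Example.com '),
        ('A2', 'bob@example.com');
    INSERT INTO raw_support_tickets VALUES
        (1, 'alice@example.com', 'open'),
        (2, 'alice@example.com', 'closed'),
        (3, 'BOB@example.com', 'open');
""")

# Clean: normalize emails (trim whitespace, lowercase) in both sources.
# Join: combine CRM accounts with support tickets on the cleaned email.
TRANSFORM_SQL = """
    WITH accounts AS (
        SELECT account_id, LOWER(TRIM(email)) AS email
        FROM raw_crm_accounts
    ),
    tickets AS (
        SELECT ticket_id, LOWER(TRIM(email)) AS email, status
        FROM raw_support_tickets
    )
    SELECT a.account_id,
           a.email,
           COUNT(t.ticket_id) AS total_tickets,
           SUM(CASE WHEN t.status = 'open' THEN 1 ELSE 0 END) AS open_tickets
    FROM accounts a
    LEFT JOIN tickets t ON t.email = a.email
    GROUP BY a.account_id, a.email
    ORDER BY a.account_id
"""

rows = conn.execute(TRANSFORM_SQL).fetchall()
for row in rows:
    print(row)
```

The two raw tables come from different source applications and disagree on email formatting, so the join only works after the cleaning step. That is the kind of work the transformation layer exists to do.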

Data Application

This is the most underdeveloped area in the modern data stack. Connected data applications are almost nonexistent; the only tools built on top of data warehouses today are BI tools, which, while useful, grossly underutilize the power of data warehouses. This is the area where we expect to see the most new startups and companies emerge, trying to take on their third-party cloud SaaS counterparts.

Modern data stack is more than just a stack | It is your Data OS

We can also think of the modern data stack as your company's data operating system, since it has pretty much all the components an operating system needs. The core intelligence and processing power is your data warehouse: think of it as a CPU with built-in storage. SQL is the programming language for giving instructions to the machine. And just as one can build applications for Windows and Mac, one could in theory build applications for your data warehouse using SQL.
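As a rough illustration of the analogy, the sketch below shows what a tiny "application" on top of a warehouse could look like: the application logic is expressed as a SQL instruction, the warehouse does the processing, and the app only receives the answer. This is a sketch under stated assumptions, not how any particular product works. In-memory SQLite plays the role of the warehouse, and the support_tickets table and the find_at_risk_accounts function are hypothetical names.

```python
import sqlite3


def find_at_risk_accounts(conn: sqlite3.Connection, min_open_tickets: int) -> list[str]:
    """Hypothetical 'connected app' feature.

    The logic is a SQL instruction executed by the warehouse; the app
    does not copy the underlying data, it only reads the final result.
    """
    query = """
        SELECT account_id
        FROM support_tickets
        WHERE status = 'open'
        GROUP BY account_id
        HAVING COUNT(*) >= ?
        ORDER BY account_id
    """
    return [row[0] for row in conn.execute(query, (min_open_tickets,))]


# In-memory SQLite plays the role of the warehouse (assumption for illustration).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE support_tickets (account_id TEXT, status TEXT)")
warehouse.executemany(
    "INSERT INTO support_tickets VALUES (?, ?)",
    [("A1", "open"), ("A1", "open"), ("A2", "open"), ("A3", "closed")],
)

at_risk = find_at_risk_accounts(warehouse, min_open_tickets=2)
print(at_risk)
```

The design point is that the data never leaves the warehouse: swapping SQLite for a real warehouse connection would change the connection setup, not the shape of the application.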

If this is so obvious, why do Salesforce, Marketo, and Zendesk even exist? Why is there no application built on top of a data warehouse that does what any of these applications do, given that a data-warehouse-first application gets to be cheaper, offers better data governance, and has far more flexibility thanks to a 360 view of the data? We believe there will be a day when all these third-party SaaS data apps are rewritten as connected apps on the data warehouse. Today, however, this migration is far from trivial; more on that in the next section.

Connected Data Applications | Biggest Challenges

The 2 biggest challenges in building connected applications today are:

  • Acceptable Data Latency
  • Cost of Migration

Acceptable Data Latency

This depends on how real-time the end use case is. Do you want to perform actions as soon as an event happens, e.g. a real-time recommendation based on a user action, or a marketing orchestration use case like sending users a notification as soon as they visit the website? These real-time use cases require the data to be available in the data warehouse in real time and, subsequently, the ability to run real-time SQL on it. Although some data warehouses are getting better and can offer near real-time SQL, true real-time SQL is not a reality yet. However, we expect it to become one in the near future.
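One way to reason about "acceptable" latency is to compare how long an event takes to become queryable in the warehouse against a budget for each use case. The sketch below is illustrative only: the use case names, the budget values, and the is_fresh_enough helper are all assumptions, not measurements from any real system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical latency budgets per use case (illustrative numbers only).
LATENCY_BUDGETS = {
    "weekly_report": timedelta(hours=24),
    "website_visit_notification": timedelta(seconds=5),
}


def is_fresh_enough(event_time: datetime, loaded_at: datetime, use_case: str) -> bool:
    """Return True if the event became queryable within the use case's budget."""
    return (loaded_at - event_time) <= LATENCY_BUDGETS[use_case]


# An event that happened at noon and landed in the warehouse 15 minutes later,
# e.g. because a batch sync runs on a schedule.
event_time = datetime(2022, 9, 30, 12, 0, 0, tzinfo=timezone.utc)
loaded_at = event_time + timedelta(minutes=15)

for use_case in LATENCY_BUDGETS:
    print(use_case, is_fresh_enough(event_time, loaded_at, use_case))
```

With these assumed numbers, a 15-minute delay is fine for a report but far too slow for a notification triggered by a website visit, which is why the real-time use cases are the hard ones.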

Cost of Migration

Moving away from a mature third-party SaaS app is not easy. Think about a Salesforce instance that has been sitting at a company for a decade or more: the pain of migration is so great that it's not worth it as of now.

Connected Data Applications | Future

As data warehouse technology evolves and starts to support OLTP use cases, we are going to see a massive shift in how companies leverage data warehouses. In parallel, there will be innovation and infrastructure development on top of data warehouses to operationalize the massive amount of data in place. Reverse ETL solutions have shown the power of combined data, but they serve only as a stop-gap solution in today's market. The true power of data warehouses like Snowflake will be unleashed when native connected applications start becoming intelligent, and that's what we are building at broccoli labs. Check out this article on Customer Intelligence Platform if you are curious.
