What is a Modern Data Stack (MDS)?
Broadly speaking, the MDS is the evolution of the old, brittle tools and broken processes of archaic systems (earlier grouped under the label Big Data), which required constant maintenance and QA, into a modern data system that automates, simplifies, and speeds up companies' ability to get to their data and make sound business decisions.
The Modern Data Stack is a collection of components covering ingestion, transformation, data storage, and BI platforms. Each component is a complete product in itself and solves a specific problem in data processing, so the total scope of the MDS is quite wide. For instance, ingestion tools extract information from various sources, and transformation tools convert that data into a proper format. The collected data is then stored in data warehouses and data lakes and displayed in BI tools that present analytics results to internal and external users.
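The flow described above (ingest, transform, store, present) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the function names, and the `orders` table are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records, standing in for data pulled from source systems.
RAW_ORDERS = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": "80.00"},
    {"order_id": "3", "region": "north", "amount": "99.50"},
]


def ingest():
    """Ingestion: extract records from a source (here, an in-memory list)."""
    return list(RAW_ORDERS)


def transform(records):
    """Transformation: cast types and normalize values into one format."""
    return [
        (int(r["order_id"]), r["region"].strip().lower(), float(r["amount"]))
        for r in records
    ]


def store(conn, rows):
    """Storage: load the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def report(conn):
    """BI layer: aggregate the stored data for presentation."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


def run_pipeline():
    # SQLite in memory is an assumption made to keep the sketch self-contained.
    conn = sqlite3.connect(":memory:")
    store(conn, transform(ingest()))
    return report(conn)


if __name__ == "__main__":
    for region, total in run_pipeline():
        print(region, total)
```

In a real stack each function would be a separate product, which is why the components can be swapped independently.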
The modern data stack brings together components that give insight into performance and consolidates data into a form users can work with. The genesis of the modern data stack is associated with the launch of Amazon Redshift, which made massively parallel processing (MPP) of data available at a very low cost. Since then, adoption of data tools by much smaller teams has skyrocketed, leading to a series of new products that fostered the growth of the overall ecosystem.
The overall potential for modern data stack companies is large, since data technologies are being adopted by a broad set of companies, ranging from medium-sized firms to the very largest multinationals. As MDS technology gains traction, smaller companies are also recognizing its importance for their organizations. However, large enterprises are moving away from legacy Big Data infrastructure slowly, and it will take some time for that shift to mature and benefit MDS companies. COVID-19 has had a positive impact, accelerating the adoption of data infrastructure even among large enterprises.
The current market size estimate for MDS companies is 65.7 billion USD as of 2020, growing at 19.4% annually.
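To get a feel for what that growth rate implies, the figures can be plugged into a compound growth formula, size = base × (1 + rate)^years. The sketch below assumes the 19.4% rate compounds once a year and stays constant, which the estimate itself does not guarantee; the function name and the five-year horizon are chosen only for illustration.

```python
def project_market_size(base, annual_growth, years):
    """Compound a base value at a constant annual growth rate."""
    return base * (1 + annual_growth) ** years


BASE_2020_USD_BN = 65.7   # market size estimate quoted above, in billion USD
ANNUAL_GROWTH = 0.194     # 19.4% per year, assumed constant

if __name__ == "__main__":
    # Hypothetical horizon: 2020 through 2025.
    for year in range(2020, 2026):
        size = project_market_size(BASE_2020_USD_BN, ANNUAL_GROWTH, year - 2020)
        print(f"{year}: {size:.1f} billion USD")
```

Under these assumptions the market roughly doubles in about four years, since 1.194 raised to the fourth power is just over 2.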
Subcategories for MDS
Several tools collectively form the Modern Data Stack. Each tool is a branch of work on data, and together they make up the complete system. The branches of the MDS are:
- Data Warehouse
- Business Analytics
- Data Lakes
- Data Quality
The first five tools listed above are the minimum required to work with the MDS. The others are relatively new tools that help professionals track lineage and assure the quality of the data being fed into the system.
The top five sub-categories in the list above make up 90% of the market. The remaining categories, introduced in recent years, account for only 10% of the market and have a correspondingly smaller market size.
To simplify this thesis and keep it to four pages, I've concentrated on the top four tools listed above when discussing the business model.
MDS’s Current State, Challenges and Opportunities
The MDS has come a long way from its early days and is currently in its second phase of development. This second phase consists of:
- Cloud Services
- Data Governance and Quality tools
- Simplified User Interface
Still, a few friction points need to be solved:
- Lack of feedback to operational tools
- Lack of a horizontal interface for unified data interaction
- Limited data streaming capabilities
- Immature Governance
These limitations of the current state of the MDS form the groundwork for opportunities that are yet to be tapped in this space. On top of that, areas such as data quality, the shift from ETL to ELT, and tools that converge data warehouses and data lakes into one application are being addressed by some young startups but still hold immense potential for others to explore.
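For readers unfamiliar with the ETL to ELT shift mentioned above: in ELT the raw data is loaded into the warehouse first and transformed there with SQL, rather than being cleaned before loading. A minimal sketch, assuming an in-memory SQLite database as the warehouse and hypothetical table names `raw_orders` and `orders`:

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive (all text, untidy).
RAW_ROWS = [
    ("1", " north ", "120.50"),
    ("2", "South", "80.00"),
]


def load_raw(conn, rows):
    """Load step: land the untouched source data in the warehouse."""
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform_in_warehouse(conn):
    """Transform step: build a clean table from the raw one using SQL."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(region))       AS region,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )


def run_elt():
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ROWS)
    transform_in_warehouse(conn)
    return conn


if __name__ == "__main__":
    conn = run_elt()
    query = "SELECT order_id, region, amount FROM orders ORDER BY order_id"
    for row in conn.execute(query):
        print(row)
```

Because the raw table stays in the warehouse, the transformation can be rewritten and rerun later without extracting the data again.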
The following startups are killing it in the MDS space worldwide:
- dbt (Transformation)
- Tray.io (Data Ingestion/Automation)
- Hevo Data [India] (Ingestion and Transformation)
- Materialize (Ingestion/Data Streaming)
- Census (BI/Analytics)
- Infoworks.io (Data Warehouse)
- Firebolt (Data Warehouse)
- Alation (Data Governance)
- Atlan [India] (Data Governance, Ingestion, Data Quality)