Scaling up the data infrastructure, Venrock’s investment into Meltano
Excited to share the backstory of our seed investment into Meltano
Data is transforming the Enterprise
The amount of data being generated and stored has exploded in scale and volume. Every aspect of the enterprise is being instrumented for data with the goal of unlocking new capabilities and building an edge. This data is beginning to rapidly transform every industry.
This transformation started with the realization that data can operationally drive and power every aspect of an enterprise, rather than just analytics. Data is allowing enterprises to respond faster to changes in their business unlike ever before, and the volume of that data is increasing exponentially. Even as we navigate uncharted inflation, COVID uncertainty, a strained supply chain, quantitative tightening, and global unrest, data allows enterprises to drive down costs, unlock new capabilities, and maintain competitiveness.
As we saw major dislocations come as a result of new mediums or computing platforms, from mainframes to servers, to virtualization to the cloud. We are now entering the cloud data era with the infrastructure at the heart of it all.
The Cloud Data Infrastructure Era
The last two years of remote work have been a tailwind for digital transformation across every sector. Every CIO and CTO is being asked about unlocking data as a competitive advantage to increase decision velocity and quality. As a result, modernizing the data platform has become a top priority. In layman’s terms, this means building a set of capabilities to move data from data generation (logs, orders, events, etc) through processing to applications or services as quickly as possible.
Companies like Astronomer* with Apache Airflow and Decodable* with Apache Flink are key services in the new data stack, with a number more services emerging to handle data ingestion, transformation, data quality, data processing, cataloging data and tables, managing schemas, and testing — all required to build a stable and performant data stack.
The way you currently build and manage a data infrastructure looks nothing like building and managing a software application. There are no isolated environments for development and testing, no code reviews, unit testing, or version control. Making updates to your data infrastructure is akin to making edits directly in production, and often leads to outages, data quality issues, and constant firefighting.
That’s until we met Meltano
Meltano spun out of GitLab’s vision for a better way to build software. It was their realization that data teams needed to operate much more like software or application teams. Using the SDLC principles to enforce high quality, performant, and secure data services and infrastructure.
Started as an open-source project, and grew rapidly in popularity with its approach to solving the data ingestion and loading challenges most developers faced. This was the start of introducing software principles for building data services. They grew to over 2500+ data professionals in their Slack community, with 1000+ organizations, and enterprises, running hundreds of thousands of active pipelines every month.
Introducing Meltano 2.0
Today, Meltano announces Meltano 2.0, a new way to build and scale up our data infrastructure, by introducing an end-to-end control plane for your data services, making it the infrastructure and devops for your modern data stack.
- Introduces a much easier way to discover, install, configure, deploy and connect various data services that make up your data stack, in an opinionated approach
- Applies SDLC principles to the entire data stack
- Provides set up in minutes, enabling you to try out new tools, easily replace old ones, and experiment locally
Now with Meltano, you can spin up a service or tool (Airflow, dbt, Great Expectations, Superset, etc), easily configure, deploy, and manage it through a single control plane. Meltano provisions the services, turns the configurations into code in Git, and version controls all changes, while enabling end-to-end testing and isolated environments for new changes. Installation and configuration now become standardized, and all the complexity of managing the various tools and services gets abstracted into Meltano.
A data team can easily spin up a new data flow with Airflow for orchestration, dbt for transformation, Singer taps for ingestion, Great Expectations for testing, and Superset for analysis, all managed by a single platform. No longer do you need to manually install, deploy, configure, manage, and stitch the services together.
As a real-world example that many of us have faced many times over; let’s say you update a schema (change the data representation) for a data pipeline, and that pipeline writes to a table that now needs to store in that new schema. Any dashboard or service that relies on the table will now break. With Meltano, data teams can deploy these changes into isolated development environments and run end-to-end testing of those changes to safely make and verify changes locally before pushing into production. If something breaks into production, you can just roll it back.
Partnering with Meltano
We have long said the way we build data services and infrastructure needed to mirror how we built applications and software. From a single pane of glass, with package managers, consistent workflows, isolated environments, end-to-end testing, and the flexibility to use the best tools for the job. From small and new companies setting up a data stack for the first time or larger enterprises moving off a legacy data stack, the future of the data stack will look a lot more like software than it does today, and we know Meltano will be at the center of it.
We’re excited to announce our partnering with Meltano in their Seed along with GV and GitLab in this exciting future.