The modern data platform is too big to fit on one slide
Mary Meeker just released her 2016 Trends Report. As usual, it’s required reading.
The last section of the document was about data. This slide summarized her outlook on industry trends:
There’s plenty to like in here, but I want to poke at something I don’t agree with. Mary seems to be forecasting consolidation within the software layer of the stack: what was previously composed of multiple interlocking layers is now collapsing into organization-wide analytics platforms and departmental applications. As someone watching the modern data platform evolve, I’m seeing the opposite.
What I’m seeing today is increasing composability and focus within a more diverse array of data products. A modern, mature analytics stack involves many components, most of which are separate but interconnected technologies:
- ETL (RJMetrics Pipeline, Fivetran, Segment, Airflow, Luigi)
- Data storage (S3, HDFS, GCS, Redshift)
- Data processing (BigQuery, Presto, Redshift)
- Interactive analytics (everything from Mode & Looker to Python, R, and Jupyter)
- Collaboration (GitHub)
- Documentation (GitBook)
- Testing (a hot topic at conferences, with presentations from Netflix and Microsoft, but no products that I know of yet)
- Modeling (dbt)
- Algorithms (scikit-learn, TensorFlow)
- Visualization and interactivity (d3, matplotlib, ggplot)
- Domain-specific data applications (InsideSales.com, Gainsight)
These are just the biggest categories. There’s other novel stuff going on around the edges: data enrichment is more accessible, embedded analytics is getting interesting, and we’re sharing huge datasets with each other and compiling libraries of public facts. There will be entirely new layers that we haven’t imagined yet.
Department-specific applications and organization-wide analytics platforms are real, and they’re a part of the story, but they’re just one part of a much bigger picture. The data stack is branching out like Darwin’s tree of life: each layer diversifying to fill its own ecological niche. This trend is empowering for users and the businesses they work for.