Why is data hard?
One type of project that Strategy gets pulled into is the difficult one: the very cross-functional, complex, hairy project. Data, and getting to metrics that matter, is one of those projects.
Wait, data is hard?!
When most organizations think about data, they think about metrics and using these metrics to surface insights, make data-driven decisions, and continue to monitor the health of the business. This sounds like we should be able to hire smart and capable analysts, create some visual dashboards, and be off to the races!
“Every second of every day, our senses bring in way [more] data than we can possibly process in our brains.”- Peter Diamandis, Founder of the X-Prize
Having lots of data doesn’t make it immediately valuable. And when you’re scaling as fast as Slack, using data and metrics well is not only critical to scaling effectively but also that much harder, because you are “building the plane as it is flying”.
The data stack: getting to metrics that matter
The data stack can roughly be grouped into four main levels (Insights, Exploration and Tools, Metrics and Dimensions, and Infrastructure), and each level depends heavily on the success of the others.
Insights
Most business owners and company executives think about data at this level. Insights are the stories we tell that make data matter: what moves the business, and what new opportunities exist that can drive a ton of growth.
In an ideal world, there is a shared, continuously evolving data narrative around the performance of the business. This narrative is circulated across the organization to build a shared understanding of the business.
Exploration and Tools
To actually achieve insights, we need to empower many people across the company to explore data on a regular basis. Curation and stories only happen when someone is looking at the data!
Optimal data exploration that scales well with a fast-growing business involves a few key things:
- Broad data access. To really build a shared understanding and intuition of what’s happening and what is important, we need everyone to feel some ownership over looking at and exploring the data. The reality is that if exploration is hard, only the power users (analysts) are able to do this work. You can either hire more analysts to deep dive on building insights, or you can find ways to simplify data access and make teams increasingly self-serve. Slack lives somewhere in between: we continue to look for ways to increase self-serve data across the organization, but we also make sure that we have great analysts partnering with each of our core functions.
- Regularity of use. Like all good habit-building, looking at data and metrics consistently is the only way to build intuition around what is expected, what is unexpected, and which questions are the right ones to ask of the data. Analysts can help dig into trends, but while some trends are worth digging into, many just aren’t. If business owners are looking at data regularly, it’s much more likely that your analysts will spend their deep dives on the trends that matter.
Example: You see a 4% increase in active users this week. Is that good? Is that bad? A slowdown from normally expected growth? Did we launch something new this week, so we should actually expect more than the normal week-over-week rate?
An analyst can, and should, help dig into various comparisons so business owners can contextualize the rate. Analysts can look at how this rate compares year over year and deep dive into the composition of the new teams and where they came from. But maybe 4% is in line with what you, the business owner, expected: it is lower than usual, but we haven’t launched anything new and we’re in a slower holiday period. That’s the intuition you want business owners and analysts alike to build. You don’t want to burn cycles digging into something that won’t necessarily move the business forward or change what we decide to do.
- Discoverability and data exploration. Data exploration is different from accessing dashboards, and I want to call that out here. Dashboards are built to a concrete set of requirements, usually reporting some view of a metric at specific levels of granularity on a regular basis. Data exploration is the ability to investigate metrics across a variety of cohorts and/or correlated metrics to identify trends or opportunities that are not immediately obvious in the fixed dashboard view. Think of it as being able to pivot and filter the data, to ask questions of the dataset beyond high-level performance monitoring. See a spike in active users? Great! Maybe we need to explore whether it’s consistent across all countries or isolated to the UK. Did we launch a UK-specific campaign that week? Did the Sales team close a killer deal that week?
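The +4% example and the UK question can be made concrete with a minimal Python sketch. It assumes a hypothetical table of weekly active users by country (`ACTIVE_USERS`, with made-up numbers, not Slack data) and hypothetical helper names: `total_growth` is the single headline rate a dashboard might show, and `growth_by_country` is the pivot an explorer would reach for next.

```python
from collections import defaultdict

# Hypothetical weekly active users keyed by (week, country).
# Illustrative numbers only, not real Slack data.
ACTIVE_USERS = {
    ("2024-W01", "US"): 70_000,
    ("2024-W01", "UK"): 20_000,
    ("2024-W01", "DE"): 10_000,
    ("2024-W02", "US"): 70_700,
    ("2024-W02", "UK"): 23_200,
    ("2024-W02", "DE"): 10_100,
}


def wow_growth(prev: float, curr: float) -> float:
    """Week-over-week growth as a fraction (0.04 means +4%)."""
    return (curr - prev) / prev


def total_growth(data, prev_week, curr_week):
    """The headline number a dashboard would show: growth across all countries."""
    prev = sum(users for (week, _), users in data.items() if week == prev_week)
    curr = sum(users for (week, _), users in data.items() if week == curr_week)
    return wow_growth(prev, curr)


def growth_by_country(data, prev_week, curr_week):
    """Exploration: pivot the same data by country and compute each country's growth."""
    by_country = defaultdict(dict)
    for (week, country), users in data.items():
        by_country[country][week] = users
    return {
        country: wow_growth(weeks[prev_week], weeks[curr_week])
        for country, weeks in by_country.items()
        if prev_week in weeks and curr_week in weeks
    }


if __name__ == "__main__":
    print(f"All countries: {total_growth(ACTIVE_USERS, '2024-W01', '2024-W02'):+.1%}")
    for country, g in sorted(growth_by_country(ACTIVE_USERS, "2024-W01", "2024-W02").items()):
        print(f"{country}: {g:+.1%}")
```

With these made-up numbers, the headline +4% comes mostly from the UK (+16%), while the US and DE each grow about 1%. The fixed view only shows the first line; the breakdown is what prompts the "did we launch a UK-specific campaign?" question.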
The closer business owners are to the data, and the better able they are to explore it on their own, the more quickly and efficiently key insights surface. That is because they can combine their understanding of what we are doing in the business with visibility into how it is likely showing up in the data. The opposite is also true! Analysts who get a lot of business context from their business partners can reach the right insights much faster, without having to boil the hypothesis ocean. For a fast-growing organization, you probably want both so that everyone feels ownership over understanding what our biggest opportunities and gaps are.
Metrics and Dimensions
There are a lot of data-driven decisions that can happen at the Exploration and Tools level. But what we’ve found at Slack is that if data is not easy to understand, clear, or trusted, you won’t get widespread data exploration even if the tooling is available. This is where consistent, well-understood, and clearly defined metrics and dimensions are crucial.
Data may not be trusted for a variety of reasons. Maybe it isn’t always available (missed SLAs). Maybe the data is wrong (data loss or tracking issues). And maybe it just isn’t clear what the data represents and how to use it. In that last situation, the problem is less about the integrity of the data itself and more about how much time data consumers have to spend checking and QA-ing their work before they can even start drawing insights.
Example: At Slack, we have two different ways of thinking about Geography as a dimension. Geography is important if you want to look at user data and understand how our international campaigns and product launches are performing, or where there is more opportunity for us to double down.
We define Geography in two ways: 1) by a team’s and user’s IP address (where they are using Slack), and 2) by a team’s billing address (usually company HQ).
We have good reasons for having both definitions. When we think about product features and launches, what we really care about is where the product is actually being used. IP-based location is absolutely what we want to be looking at. Conversely, for financial reporting and measuring where our revenue comes from, we really want to understand the data by billing address.
However, a new user of our data might (1) get confused about which version of geography to use, or (2) use one version but then see reports and numbers built on the other version that don’t match up. In either case, this new user is likely to lose confidence in their ability to explore the data, or lose trust in the data itself.
There are different ways to solve this. Often, the easiest and most impactful solution is not technical. For us, it was being clearer with our labeling and documentation for each metric and dimension definition. We also work to educate the company on which definition to use and under what circumstances, a key function that a centralized analytics team can support when partnering with stakeholders and business partners.
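Labeling and documentation can live anywhere (a wiki, a BI tool's field descriptions). As one illustration, here is a minimal Python sketch of keeping each Geography definition next to its documentation, assuming a hypothetical in-code registry: the keys `geo_ip` and `geo_billing`, the `DimensionDefinition` fields, and the `describe` helper are illustrative, not Slack's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionDefinition:
    """Documentation that travels with a dimension (hypothetical schema)."""
    label: str      # human-readable name shown in tools and reports
    source: str     # where the value comes from
    use_for: str    # the questions this definition is meant to answer
    avoid_for: str  # where using it will produce numbers that don't match


# Hypothetical registry keys and wording, for illustration only.
GEOGRAPHY = {
    "geo_ip": DimensionDefinition(
        label="Geography (usage location, by IP address)",
        source="the team's and user's IP address, i.e. where Slack is being used",
        use_for="product features, launches, and international adoption",
        avoid_for="financial reporting and revenue by region",
    ),
    "geo_billing": DimensionDefinition(
        label="Geography (billing address)",
        source="the team's billing address, usually company HQ",
        use_for="financial reporting and where revenue comes from",
        avoid_for="measuring where the product is actually used",
    ),
}


def describe(key: str) -> str:
    """Render one definition as a single line of documentation."""
    d = GEOGRAPHY[key]
    return f"{d.label}: based on {d.source}. Use for {d.use_for}. Avoid for {d.avoid_for}."


if __name__ == "__main__":
    for key in GEOGRAPHY:
        print(f"{key} -> {describe(key)}")
```

The design choice is simply that the label a data consumer sees always says which Geography it is, and the "use for / avoid for" guidance sits next to the definition rather than in someone's head.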
Across the entire company, it is important to be extremely clear about what the key metrics and dimensions are for the business and how those metrics and dimensions are defined. Are these obvious to the average data consumer? Can we make it easier to understand via documentation, training, and labeling? Are we using and talking about metrics and dimensions consistently across our teams?
We’re still working through a lot of this now, because there are always ways for an organization to iterate and make this better. Even more so when you continue to grow, add new people to the team, and launch new products that you need to measure!
Infrastructure
Underneath all of this is the data infrastructure that allows everything above to be stable, reliable, and accessible even at 100x scale. Data infrastructure is the plumbing that connects everywhere across the product, piping out the raw information that we need to understand what’s happening across the organization. Once you have the metrics, the dimensions, and the definitions that you need as a business to monitor, explore, and drive insights from, you need to ensure that you are actually gathering the raw data to support these from everywhere across the product.
The infrastructure underneath a seemingly simple metric usually involves:
- Instrumentation: Raw data collection from the product. This data collection is typically accomplished via a partnership between product teams and data engineering.
- Aggregation: The raw data needs to be aggregated into a clean, consistent, reliable view of the world. This is often implemented by data engineering with some contributions from analytics.
- Metric Logic: Finally, metric logic is applied on top of aggregated data to generate business metrics. This is again a collaborative effort between data engineering and analytics, often driven primarily by analytics.
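The three layers above can be sketched end to end for one seemingly simple metric, daily active users. This is a minimal Python sketch under stated assumptions: the `RawEvent` schema, the event names, and the rule for what counts as "active" (`ACTIVE_EVENT_TYPES`) are all hypothetical, not how Slack actually defines or computes the metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


# 1) Instrumentation: the raw event the product emits (hypothetical schema).
@dataclass(frozen=True)
class RawEvent:
    user_id: str
    event_type: str
    day: date


# 2) Aggregation: collapse noisy raw events into one clean row per (day, user),
#    keeping the set of event types that user produced that day.
def aggregate_user_days(events):
    user_days = defaultdict(set)
    for e in events:
        user_days[(e.day, e.user_id)].add(e.event_type)
    return user_days


# 3) Metric logic: apply a business definition on top of the aggregate.
#    Which events count as "active" is a hypothetical choice here.
ACTIVE_EVENT_TYPES = {"message_sent", "message_read"}


def daily_active_users(user_days):
    dau = defaultdict(int)
    for (day, _user_id), event_types in user_days.items():
        if event_types & ACTIVE_EVENT_TYPES:  # at least one qualifying event
            dau[day] += 1
    return dict(sorted(dau.items()))


if __name__ == "__main__":
    events = [
        RawEvent("u1", "message_sent", date(2024, 1, 1)),
        RawEvent("u1", "message_read", date(2024, 1, 1)),  # same user and day: counted once
        RawEvent("u2", "app_opened", date(2024, 1, 1)),    # not "active" under this definition
        RawEvent("u2", "message_read", date(2024, 1, 2)),
    ]
    print(daily_active_users(aggregate_user_days(events)))
```

Measuring adoption of a new feature would touch all three layers in this sketch: a new event type in instrumentation, possibly a new grouping in aggregation, and a new definition in the metric logic. If the first layer is missing, the other two have nothing to work with.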
Without the infrastructure, there is no data to look at. In some cases, you have to add new instrumentation as data requirements evolve. What if you launch a new product feature and want to measure adoption of that feature? Is there data collection in place that can capture what you’d like to know about adoption? Are we able to aggregate that data at a level that you can use for those measurements and insights? Infrastructure is the lifeblood that makes it all happen.
Data Stack Feedback Loops
Across all four of these levels, successful data and insights come from constant feedback and a very close relationship between the engineers on the backend, the end users of data, and everyone in between. There always needs to be continuous iteration and feedback up and down these levels. This is even more important as the company changes, grows quickly, and continues to find new ways to drive the business forward.
Fast growth: where the product outpaces the data
In practice, data is hard because it involves very cross-functional teams, and a lot of “under the hood” work that most people in the company don’t realize needs to happen. Insights are just the tip of the iceberg.
Every fast-growing company can run past its data stack. What does this look like?
- An increase in the complexity of go-to-market initiatives. Launch a sales team. New marketing campaigns through new vendors. Systems are brought in to support these teams without necessarily having a clear tie-in back to the data stack.
- An increase in product features, or just product lines. Launch a new Enterprise product. Internationalization.
If the infrastructure isn’t in place to collect all the relevant data, or if there are too many systems that don’t talk to each other and create their own siloed datasets, you get to a place where data is tech debt. This is not unusual when teams are moving fast and focusing on delivering their objectives.
So, how do you start to pay down that tech debt? Or, why is Strategy involved?
Remember the data stack. There are more dependencies and foundational work than the average business owner understands. Paying down tech debt means building capacity to do that foundational work. What that means for data projects in general is making company education and buy-in an important piece of the work. That looks like:
- Deeply understanding the situation, and communicating to the organization the state of the world and why we are where we are.
- Crafting a narrative for what the data strategy is by building a vision for what good looks like in 12–18 months, and laying out a plan to get there, including the resources and time needed.
- Getting all the data teams aligned on the above, and creating a process and forum for continuing to surface these very cross-functional projects and priorities so that global prioritization can happen (what we like to call #data-XFN).
Easier said than done! But, we knew that. Data is hard. But at scale, data done right will empower more teams to understand the business and make decisions.