Making Data Science Work
Starting a tech company and building a good product has become easier thanks to improved connectivity, the declining costs of cloud storage and computing, and the accessibility of distribution platforms that can reach large target audiences. These low barriers to entry have caused tech companies to proliferate. This growth is producing huge amounts of data, which has led to the development of multiple new data disciplines focused on how to use data.
More than ever, companies with strong data-informed cultures that leverage their data to drive outcomes have a competitive advantage. Such a culture begins with company leadership emphasizing the value of data and hiring and developing top data scientists.
In our recent series on data science, we took a deep dive into the importance of data-informed cultures for building a successful company. We discussed the importance of a high-performing data science team and how this team evolves as your product and company mature. We also discussed how to set up your company to generate the best return from your analytics teams, which requires building world-class teams. Defining the role of a data scientist properly, hiring the right people, providing the right opportunities to progress and managing data scientists well are all crucial to developing the greatest competitive advantage from your data team.
In this last post, we’ll take a look at how all of these factors come together to build a successful company.
Setting up a company for success is paramount. Companies with the strongest data-informed cultures exhibit the following six characteristics:
- Focus on impact: Impact is the currency for all analysis. Without defining, measuring and executing toward impact, companies are akin to ships without a rudder. Impact helps prioritize and sets direction.
- Agile infrastructure: A good infrastructure is agile and adaptable for future demands. Companies whose infrastructures are only optimized for short-term needs generally fail to scale. Without thoughtfully constructed infrastructure, teams will, at best, be delayed in understanding phenomena impacting their business or, at worst, be unable to understand them at all.
- Culture of experimentation: While intuition is incredibly valuable for building products, it does not scale. Data-informed companies need to establish a strong “test and learn” culture, codifying intuition into hypotheses and experimental design. The underpinning of this test and learn methodology is that small improvements every week compound in a way that results in much greater impact than large gains that occur infrequently.
- Expansive view of the analytics team: To build a truly great data-informed company, analytics must be involved at every stage of product development and be embedded within the product team. At the outset, the analytics team should help craft the relevant metrics for a product’s success, measure progress continuously and help identify risks and growth areas for the business. It is important to use the analytics team for the highest leverage problems and have them produce the greatest impact — ensuring that they can help drive goals, roadmap and strategy for the product.
- Rigorous roadmap process: Fast product development, testing and iteration require an efficient roadmap process. The process must include the involvement of analytics at all stages of product development, including setting the roadmap. Without a sound roadmap process for fast product development, execution will be limited.
- Hire well and empower: Product analytics is a nascent function that is still evolving. Both data science and data engineering are continuing to find their feet and refine their vision, even in the most advanced companies. As a result, building a world-class team is harder in analytics than in other functions because there are fewer experienced leaders. Three key dimensions are most valuable to consider when building a world-class organization: people, culture and process. Driving impact by hiring A+ players, empowering them to do great work, and mentoring them as future leaders is essential. To do this, companies and teams need to have each person operate with purpose; cultivate a bottom-up culture that empowers people to high levels of excellence; build a strong, transparent organization that has accountability, ownership and trust as its core values; and create a culture that focuses on company first, business unit second, team third and individual last.
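The compounding claim in the "Culture of experimentation" point above can be illustrated with a little arithmetic. The sketch below is only an illustration: the 1% weekly gain and the 5% quarterly gain are hypothetical figures chosen for the example, not numbers from this series, and whether frequent small wins beat infrequent large ones in practice depends on the actual rates a team achieves.

```python
def compound_growth(rate_per_step: float, steps: int) -> float:
    """Return the cumulative multiplier after `steps` successive gains of `rate_per_step`."""
    return (1.0 + rate_per_step) ** steps


# Hypothetical scenario A: a 1% improvement shipped every week for a year.
weekly = compound_growth(0.01, 52)

# Hypothetical scenario B: a 5% improvement shipped once per quarter for a year.
quarterly = compound_growth(0.05, 4)

print(f"Weekly 1% gains:    x{weekly:.2f}")     # roughly x1.68
print(f"Quarterly 5% gains: x{quarterly:.2f}")  # roughly x1.22
```

Under these assumed rates, the weekly cadence ends the year well ahead even though each individual gain is a fifth the size of a quarterly one.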
Primary Outcomes of Data
For today’s companies, the ability to compete is measured by how successfully analytics are applied to vast, unstructured data sets across disparate sources to drive product innovation. Therefore, data scientists are in high demand, and a team of smart data scientists can make or break a product. This increasing interest in mining data for insights has led product teams to use data to focus on four specific outcomes.
- Evaluate the health of the business: One of the key outcomes of product analysis is to evaluate the health of a product or a business. Once product success has been defined by means of a goal and a metric, the next step is to monitor the metric to ensure that the goal will be hit. Tactically, analysts work on identifying outliers, understanding the drivers of changes in metrics, and building dashboards, reports and visualizations.
- Ship the right products and features: Another important role of analytics is to ensure that the right products and features get built. Many companies run numerous experiments and ship products after evaluating the results of these experiments. Typically, data scientists help design experiments, identify data-informed hypotheses on phenomena, and guide product teams on constant optimization of the product through data insights.
- Forecast outcomes and power production systems: Another role of data scientists is to build prototypes and models and to power production systems using AI and machine learning. These data scientists train machine learning models on a phenomenon in order to forecast future expectations and trends.
- Set roadmap and strategy for the product: Deeper exploration and analysis of the user journey and phenomena generate actionable insights that ultimately result in setting roadmap and strategy for the product. A data-driven roadmap and strategy is one of the most important outputs of a product analytics team.
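As a small illustration of the "identifying outliers" work mentioned under evaluating the health of the business, the sketch below flags unusual days in a metric. It assumes a simple z-score rule with a threshold of 2, which is one common choice and not a method prescribed in this series; the function name `flag_outliers` and the daily active user figures are made up for the example.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return the indices of values more than `threshold` sample standard
    deviations away from the mean (a basic z-score rule)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily active user counts; the sixth day shows a sharp drop.
daily_active_users = [1000, 1020, 990, 1010, 1005, 400, 1015]
print(flag_outliers(daily_active_users))  # [5]
```

A flagged day is only a starting point: the analyst still has to work out what drove the change in the metric.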
These four primary data outcomes have generated new products and an entire industry that focuses on multiple parts of the data stack (see Figure 1 below) to maximize generation of insights and the building of amazing data-driven products.
As the data stack has standardized, multiple data-related professions have emerged, including data analysts, data engineers, data infrastructure engineers, data architects and data scientists. Companies are at different stages of development with respect to their maturity (see Table 1). Some are still just counting numbers, making sure, for instance, that they accurately tally the number of active users that visit their site. Others build dashboards to evaluate the health of their business. Companies that are more data-driven use experimentation to ship the right products, and the most advanced also drive their goals, roadmap and strategy for their products using data.
The four outcomes of data have also led to two different types of data scientists in the industry — product analysts and algorithm developers. Product analysts deliver data-informed stories that advocate for a change in product or strategy while algorithm developers incorporate data-driven features into products, such as optimizing recommendations or search results.
Data scientists are either new graduates at the bachelor's, master's, MBA, or PhD levels or experienced professionals in product, technology, or consulting. They have different skills, and it is valuable to have a good mix of them to solve a variety of problems. Generally speaking, there are three types of skills that data scientists provide: scientific rigor, consulting mindset and programming strengths, all of which are applied to the five problem-solving and analytical dimensions below.
- Problem formulation: Data scientists must be able to formulate and structure problems. This generally requires a consulting mindset as well as a scientific approach to problem solving.
- Technical ability: Programming and scientific skills are both required to extract data.
- Analytical ability: Analytical skills are required to extract and manipulate data sets, and to extract value from the data in the form of tables, charts, etc. A consulting mindset and scientific approach to problem solving are necessary to make sense of the data.
- Synthesis: Data scientists need to interpret, simplify and synthesize the results of their analyses. A consulting mindset is valuable here.
- Influence: Influencing product decisions by using data to drive storytelling is important for creating impact and requires a consulting mindset.
Because data scientists solve a diverse set of problems, multiple archetypes of data scientists have emerged. These include:
- Product generalists, who are generic problem solvers working across a wide range of product issues.
- Early product analysts, who determine product market fit for nascent products.
- Growth analysts, whose job is to move metrics.
- Core marketplace analysts, who ensure healthy liquidity on a platform.
- Ecosystem analysts, who identify competitive threats and strategic opportunities.
- Machine-learning analysts, who ensure healthy operation of the algorithms that power a product.
A one-size-fits-all approach to hiring a data scientist does not work. It is important to consider the size and maturity of the data organization, the needs of the product team and the related problems data scientists will solve. A good hiring process can be built around evaluating the skills needed for each of the data archetypes.
Hiring a generalist: An interview loop for a generalist should have two analytical decomposition cases, one programming test, one applied analysis and a scientific and quantitative interview.
- Analytical decomposition cases: This is the most important interview. If someone fails it, they should not be hired. The importance of these skills merits two interviews that test the following: problem formulation, communication and clarity, raw analytical ability, product mindset, product success and health.
- Programming test: This tests candidates’ coding skills. The programming test should include simple data acquisition (writing simple programs to acquire data) and complex data acquisition (joining disparate data sets to acquire complex data sets) to ensure that the candidate’s programming skills are not a liability.
- Applied analysis: This is useful to understand whether a candidate can solve a real problem from end to end. It tests whether the candidate can formulate the problem, acquire and manipulate data and perform synthesis.
- Problem formulation: Can a candidate understand how to solve business questions by formulating the problem? This includes writing simple queries to acquire data, manipulating data based on the business problem, and synthesizing the results to communicate them clearly.
- Scientific and quantitative ability: This interview assesses whether a candidate has the necessary quantitative ability, especially with respect to math; whether they can make sound decisions based on statistical analysis; and whether they have the scientific skills needed to analyze complex data.
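To make the two levels of the programming test above concrete, here is a minimal sketch of what "simple" and "complex" data acquisition might look like. Everything in it is assumed for illustration: the `users` and `sessions` tables, their columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for whatever data store a company actually uses.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE sessions (user_id INTEGER, day TEXT);
        INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'IN');
        INSERT INTO sessions VALUES (1, '2024-01-01'), (1, '2024-01-02'),
                                    (3, '2024-01-01');
    """)
    return conn


def active_user_count(conn):
    """Simple acquisition: count distinct users with at least one session."""
    row = conn.execute("SELECT COUNT(DISTINCT user_id) FROM sessions").fetchone()
    return row[0]


def active_users_by_country(conn):
    """Complex acquisition: join two data sets to break the count down by country."""
    rows = conn.execute("""
        SELECT u.country, COUNT(DISTINCT s.user_id)
        FROM users AS u
        JOIN sessions AS s ON s.user_id = u.user_id
        GROUP BY u.country
        ORDER BY u.country
    """).fetchall()
    return dict(rows)


conn = build_demo_db()
print(active_user_count(conn))        # 2
print(active_users_by_country(conn))  # {'IN': 1, 'US': 1}
```

The first query touches a single table; the second requires the candidate to reason about join keys and about counting each user only once, which is where weak programming skills tend to show.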
Data scientists in earlier stages of their career have the greatest impact on specific projects. As they progress, their impact grows from a specific product to an entire domain. The most senior data scientists impact the entire company.
At different stages of their careers, data scientists have varying levels of proficiency within five core skills and abilities that we outline in the post on career progression of a data scientist: problem formulation, technical ability, analytical ability, synthesis and influence. As a result, more junior data scientists require support and assistance from senior data scientists (as well as their managers) to ensure excellence. Generally speaking, the more senior a data scientist, the greater their impact. A data scientist with company-wide impact (see Figure 2) contributes far more than one whose impact is limited to the project level.
Tables 2 and 3 describe the profiles of a junior and senior data scientist. The primary difference is the level of independence with which a senior data scientist is able to influence high-quality work across the company.
A strong data science leader is required to scale the impact of a data science organization. The two most important responsibilities that a data team manager has are driving impact and building a world-class team. A manager needs to build and retain a high-performing team that drives sustainable impact by producing high-quality work and building great products. Broadly, a data team manager focuses on five things:
- Driving impact: Is the manager able to drive the greatest impact on the team?
- People management: Does the team have the right people? Are they happy? Are they being mentored and developed?
- Driving excellence: Is each person on the team maximizing their impact?
- Scaling the team: How does the team need to evolve over the long term?
- Product leadership: As a manager, are they providing the best direction for building products?
Trends and the Future
Data science is still in its infancy and will undoubtedly evolve over time, shaped by many emergent trends. The behavior of consumers and enterprises is constantly shifting. For example, increased access to bandwidth around the world has resulted in a continuous increase of video consumption. On the enterprise side, the prevalence of cloud computing and the explosion of SaaS applications have led to the atomization of enterprise applications.
As with most fields, new products, likely driven by artificial intelligence and machine learning, will emerge that automate mundane tasks of data engineers and scientists, pushing the entire field up the stack towards more creative and higher-impact problems. The net result of this will be increased specialization among data professionals, from the infrastructure level (for example, expertise in time-series databases is different from expertise in real-time databases) to top-of-the-stack analysis where business context will be as important as mathematical ability (for example, analytics to manage risk versus growth). The current visualization and reporting will also evolve to become more storytelling and decision-oriented.
As a result, at the company level, being flexible is key. Thoughtfully building the data infrastructure stack with the future in mind, hiring the right data talent at different stages of the product life cycle, and constantly evolving the discipline will contribute to your company’s success.