Data-Driven
A 2015 KPMG report, Data-driven Business Transformation: Driving Performance, Strategy and Decision Making, argues that big data has changed the business world like never before. Organizations now have the opportunity to use data and analytics to become less process-centric and more data-centric, and therefore data-driven in their strategy and decision making. While most people focus on the technology, the best organizations recognize that people are at the center of this complexity. In any organization, the answers to questions such as who controls the data, who they report to, and how they choose what to work on are always more important than whether to use a data store such as PostgreSQL, Amazon Redshift, or HDFS.
What is a Data Scientist?
Culture starts with people and their roles and responsibilities, and central to a data culture is the role of the data scientist. Data scientists need expert-level skills in mathematics and in computing, including programming and infrastructure design, and they must be able to communicate. They should be able to integrate their results into a larger story and recognize that results that don’t lead to action are meaningless. Many organizations are now creating C-suite roles in the field, such as Chief Data Officer (CDO) or Chief Data Scientist (CDS). The CDO/CDS is responsible for ensuring that the organization is data-driven, and the role becomes more important as the organization grows.
Road to a Data-Driven Organization
A data-driven organization acquires, processes, and leverages data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.
Acquiring and Processing
The best data-driven organizations focus relentlessly on keeping their data clean. The data must be organized, well documented, consistently formatted, and error-free. Cleaning the data often accounts for roughly 80% of the work, and setting up a process to clean data at scale adds further complexity. Organizations often spend heavily on data processing only to create a vault of data that rarely gets used. The best organizations instead use their data to understand their customers and the nuances of their business, and design experiments that test hypotheses for improving the organization and its processes.
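As a rough illustration of what "organized, consistently formatted, and error-free" can mean in practice, the sketch below normalizes a few raw records and sets aside those that fail basic checks. The field names (`email`, `signup_date`), the accepted date formats, and the validation rules are assumptions made for this example, not something the book prescribes.

```python
from datetime import datetime

# Hypothetical date formats that might appear in raw exports.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(raw_records):
    """Split records into (clean, rejected) after normalizing each field."""
    clean, rejected, seen = [], [], set()
    for record in raw_records:
        email = (record.get("email") or "").strip().lower()
        signup = parse_date(record.get("signup_date") or "")
        if "@" not in email or signup is None:
            rejected.append(record)   # missing or malformed field
        elif email in seen:
            rejected.append(record)   # duplicate of an earlier record
        else:
            seen.add(email)
            clean.append({"email": email, "signup_date": signup})
    return clean, rejected


# Invented sample data for illustration only.
raw = [
    {"email": " Ana@Example.com ", "signup_date": "2015-03-01"},
    {"email": "ana@example.com", "signup_date": "01/03/2015"},  # duplicate
    {"email": "bob@example.com", "signup_date": "not a date"},  # bad date
    {"email": "cy@example.com", "signup_date": "25/12/2014"},
]
clean, rejected = clean_records(raw)
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejected records, rather than silently dropping them, makes it possible to review why they failed and to improve the rules over time.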
Democratizing Data
The democratization of data is one of the most powerful ideas to come out of data science. Everyone in an organization should have access to as much data as legally possible. One challenge of democratization is helping people find the right data sets and ensuring that the data is clean. To help employees make the best use of data, a new role has emerged: the data analyst. The analyst’s mandate is to ensure consistency and quality of the data by investing in tooling and processes that make the cost of working with data scale logarithmically while the data itself scales exponentially.
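To see what that scaling goal implies, suppose, purely as an illustration, that data volume grows exponentially over time and that good tooling keeps the cost of working with it logarithmic in volume. The symbols D_0, k, a, and b below are assumptions introduced for this example.

```latex
D(t) = D_0 e^{kt}, \qquad C(D) = a \ln D + b
\;\Longrightarrow\;
C\bigl(D(t)\bigr) = a \ln D_0 + a k t + b
```

Under these assumptions, cost grows only linearly in time even though the data itself grows exponentially.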
What Does a Data-Driven Organization Do Well?
One of the most important distinctions between organizations that are data-driven and those that are not is how they approach hypothesis formulation and problem-solving. Data-driven organizations all follow some variant of the scientific method, which we call the data scientific method:
- Start with data.
- Develop intuitions about the data and the questions it can answer.
- Formulate your question.
- Leverage your current data to better understand if it is the right question to ask. If not, iterate until you have a testable hypothesis.
- Create a framework where you can run tests/experiments.
- Analyze the results to draw insights into the question.
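To make the last two steps concrete, here is a minimal sketch of analyzing a simple two-variant experiment with a two-proportion z-test. The conversion counts are invented for illustration, and the choice of test and the 0.05 cutoff are assumptions; the book does not prescribe a particular statistical method.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up numbers: control (A) vs. variant (B).
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("The difference is unlikely to be chance alone; consider acting on it.")
else:
    print("Not enough evidence yet; refine the hypothesis or collect more data.")
```

Either outcome feeds back into the loop: a clear result supports a decision, while an inconclusive one suggests revisiting the question or the experiment design.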
Managing Research
Once we have a sense of the problems that we would like to tackle, we need to develop a robust process for managing research. Here’s a set of questions that can be asked about every data science problem:
- What is the question we’re asking?
- How do we know when we’ve won?
- Assuming we solve this problem perfectly, what will we build first?
- If everyone in the world uses this, what is the impact?
- What’s the most evil thing that can be done with this?
> This question is a bit different. By asking the team to imagine what their impact could be if we remove all constraints, we allow for a conversation that will help us identify opportunities that we would otherwise miss, and refine good ideas into great ones.
Designing the Organization
Should the data team be centralized or decentralized? Should it be part of Engineering, a product group, or Finance, or should it be a separate organization? These are all important questions, but don’t focus on them at first. Instead, focus on whether the team has the key ingredients that will allow it to be effective. Here are some of the questions we should ask:
- What are the short-term and long-term goals for data?
- Who are the supporters and who are the opponents?
- Where are conflicts likely to arise?
- What systems are needed to make the data scientists successful?
- What are the costs and time horizons required to implement those systems?
We constantly need to rethink and reevaluate our organizational structure to provide the best career growth and impact. However, we should always keep one central tenet in mind: to grow a massive company, every part of the organization must be data-driven. This means that data is fully democratized and everyone is sufficiently data-proficient. Naturally, we still need people with specialized skill sets, but data becomes an intrinsic skill and asset for every team.
Daily Dashboard
On the road to a data-driven organization, we should look at our data every morning. Starting every day with a review of the data isn’t just a priority, it’s a habit. The simplest way to review the data is by looking at dashboards that describe key metrics. A few key points when creating and managing dashboards:
- Data vomit: Cramming too much data onto a dashboard is bad and leads to frustration.
- Time dependency: Put data on the dashboard only if you know what you will do if something changes.
- Value: Review dashboards regularly and ask whether they are still giving you value.
- Visual: Make the data look nice.
- Fatigue: We like to create alerts when something changes, but too many alarms create alarm fatigue.
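One way to reduce alarm fatigue is to alert only when a metric stays outside its expected range for several consecutive readings, rather than on every blip. The sketch below illustrates that idea; the function name, the threshold, the `patience` parameter, and the sample readings are all hypothetical.

```python
def sustained_alerts(values, threshold, patience=3):
    """Return the indices at which `values` has exceeded `threshold` for
    `patience` consecutive readings. Fires once per sustained breach."""
    alerts, streak = [], 0
    for i, value in enumerate(values):
        streak = streak + 1 if value > threshold else 0
        if streak == patience:  # fire once, not on every later reading
            alerts.append(i)
    return alerts


# Hypothetical error-rate readings (percent), one per interval.
error_rate = [1.0, 5.2, 1.1, 5.5, 5.8, 6.1, 6.4, 1.2, 5.3]
print(sustained_alerts(error_rate, threshold=5.0))  # prints [5]
```

A naive rule that alerts on every reading above the threshold would fire six times on this series; the sustained rule fires once, at the point where the problem has clearly persisted.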
As a general rule of thumb, we like to ask four questions whenever data is displayed:
- What do you want users to take away?
- What do you want users/teams to take away?
- How do you want the viewer to feel?
- Finally, is the data display adding value regularly?
Metrics Meetings
One of the biggest challenges an organization faces isn’t creating the dashboard, it’s getting people to spend time studying it. Various models have been tried to get employees to look at dashboards and use them, but most of them failed. One model that worked well is SSR (Sustained Silent Reading): instead of assuming that people had looked at the data on their own, we spent the first part of the meeting looking at the data as a group. During this time, people could ask questions that would help them understand the data. Everyone would then write down notes, circle interesting results, or otherwise annotate the findings. At the end of the reading period, time was dedicated to a discussion of that data.
Data Failures: One word of caution: don’t follow the data blindly. Being data-driven doesn’t mean ignoring your gut instinct. Following the data without question is what we call “letting the data drive you off a cliff.” How can we prevent these kinds of catastrophic failures? Regularly ask “are we driving off a cliff?” By doing so, you create a culture that challenges the status quo. When a person uses that phrase, it signals that it’s safe to challenge the data. Everyone can step back and take into account the broader landscape.
Standup and domain-specific review meetings
Standup Meetings: These are short meetings (often defined by the time that a person is willing to stand) that are used to make sure everyone on the team is up-to-date on issues. Questions or issues become action items that are addressed outside of the standup meeting. At the next standup meeting, the action items are reviewed to see if they have been resolved, and if not, to determine when the resolution is expected.
Domain-Specific Review Meetings: It’s also important for the data team to hold product review, design review, architecture review, and code review meetings. All of these meetings are forums where domain-specific expertise can provide constructive criticism, governance, and help. The key to making these meetings work is to make sure participants feel safe to talk about their work. During these meetings, definitions of metrics, methodologies, and results should be presented before being deployed to the broader organization.
Tools, Tool Decisions, and Democratizing Data Access
A few attributes of tools are timeless and enable stronger teamwork:
- Powerful
- Easy to Use
- Support Teamwork
- Community
When we are aiming for data democratization as an organization, we have to choose tools that are easy to use: we don’t want to force the users who need data to go through channels, so we train them to get it themselves. Most data solutions are evaluated on speed, but when we are supporting large numbers of users, raw speed may not be the most relevant measure. Almost anything will be faster than submitting a request through the data analytics staff. Some important questions to ask when choosing a tool:
- How well will the solution scale with the number of concurrent users?
- How does it scale with the volume of data?
- How does the price change as the number of users or the volume of data grows?
- Does the system fail gracefully when something goes wrong?
- What happens when there is a catastrophic failure?
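The first of these questions can be explored with a small load test. The sketch below measures median query latency at a few concurrency levels; `run_query` is a hypothetical stand-in that only sleeps, and the user counts are arbitrary, so treat it as a starting point rather than a benchmark of any real system.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_query():
    """Hypothetical stand-in for a real query against the tool under test."""
    start = time.perf_counter()
    time.sleep(0.01)  # replace with an actual call to the system
    return time.perf_counter() - start


def median_latency(concurrent_users, queries_per_user=5):
    """Median per-query latency with `concurrent_users` worker threads."""
    total = concurrent_users * queries_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(lambda _: run_query(), range(total)))
    return statistics.median(latencies)


for users in (1, 4, 16):
    latency_ms = median_latency(users) * 1000
    print(f"{users:>2} users: median latency {latency_ms:.1f} ms")
```

With a real backend in place of the sleep, a median that climbs steeply as the user count rises is a sign that the tool may not hold up once data access is opened to the whole organization.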
Reference: https://www.amazon.in/Data-Driven-DJ-Patil-ebook/dp/B00SXHFTAS