Utilizing the Most Relevant Data to Drive Influence

David Huynh
9 min read · Apr 13, 2020


Upon joining Shopee, an ecommerce platform in Southeast Asia, I began to notice many inefficiencies. One of these was selecting daily items for promotion on the main page. At the time, these items were handpicked by hundreds of entry-level hires, often without the use of historical data. I wanted to address these inefficiencies as soon as possible, but I would need supporting evidence for why my proposed solution, an automated selection tool leveraging past data, would be better. Since there was no existing data on such a methodology, I would need to acquire relevant data aligned with the platform's goals by running tests on one product category, which would require manpower. To get the project approved, I needed a well-designed way for us to acquire the data and a way to frame the data clearly in line with our objectives. By providing these two things, I was able to convince management to provide resources for us to build, test, and eventually roll out the automated selection tool.

In today’s working world, decisions need to be backed by data to have a higher likelihood of success. After you have developed a logical initiative or suggestion, two questions remain:

  • What data is the most relevant to support our decision?
  • How can we frame the relevant data to drive influence?

Purpose

You want to drive “Healthy Conflict” and convince management to move forward with your better alternative solutions. How can we make this happen?

Data, if utilized properly, can serve as the force to change the course of a business. We will break down the types of relevant data available so that you can control this force to create action. After acquiring the most relevant data possible, we will need to frame the data with situational adjustments to drive influence.

Considerations

In order to discuss selecting relevant data, we must first specify a few parameters, namely: Goals, The Decision, and Data.

  • ‘Goals’ are our objectives, key results, and/or key performance metrics that we are looking to optimize. At any given time, there can be multiple goals, though likely with different levels of priority.
  • ‘The Decision’ is a choice that directly impacts the goals, typically made about a future initiative. If the decision does not impact the goal, revise the decision and/or its options. Note that there can be multiple decisions, and each decision will have its own relevant data.
  • ‘Data’ is the supporting evidence that has been derived from historical results. We should structure the data such that it can be compared against past actions or choices. Collected and analyzed data should point towards a decision; otherwise, it is not relevant.

Approach

Simply put, data is all about comparisons. Data’s purpose is to tell us which option is better, so we should always be looking at how one data point matches up against other data points. Through these comparisons, we can then use the data to make better predictions and decisions about future initiatives.

Before making a data-driven decision, we need to understand the possible options. After having a firm grasp of the scenario, the first step is collecting and processing data for the closest comparison to these options, because the best relevant data is the closest comparison available. Then, based on our current scenario, we may need to make some adjustments for the differences between the historical case and the future case. If these two steps are done correctly, they will drive the decision and ultimately the goal forward.

Most Relevant Comparison: What data is the most relevant to support our decision?

Now we will dive deeper into the first component: the most relevant comparison. There are four types of sources, listed from most relevant to least relevant: Beta Testing, Internal Results, Targeted Proxies, and General Proxies. For each decision, we should attempt to collect the most relevant type(s) available within our resource and time constraints.

See below for a further breakdown on the sources of comparisons, sorted by relevance:

  • Beta Testing: A smaller-scale version of the initiative to test and gather data on our relevant metrics, which is especially useful if no existing data is available. After gauging results, we can then decide whether or not to scale up. In software, beta tests could be releasing product features to a subset of total users and gauging their response. In hard goods, beta tests could be rolling out a product to a limited number of geographies. In services, beta tests could be trying different methodologies for a subset of clients. The most crucial part of Beta Testing is creating a well-designed testing scenario to give accurate results. For example, if only the most affluent users are selected for a beta test, the results may not scale in a predictable fashion. Poor experiment setup will create less useful or even potentially unusable data. Beta testing is often iterative and takes more set-up time. Although beta testing is the most relevant comparison, in some cases it is not viable due to financial, time, or other constraints. (A minimal sketch of summarizing a beta test follows this list.)
  • Internal Results: Historic internal data from similar past initiatives. To emphasize the difference between beta testing and internal results: beta testing gathers data from the EXACT initiative at a smaller scale, whereas internal results gather data from similar past initiatives conducted at full scale. For example, if we want to predict the open rate of a future email blast, we can look at open rate data from previous email blasts to the same list-serve. If we have an upcoming promotion and need to estimate sales, we can look at our past promotion data and choose the one(s) most closely related to our case. Since internal results come from historic data, additional tests do not need to be run; therefore, internal results take less time and fewer resources to collect.
  • Targeted Proxies: Competitor benchmarks (if available), targeted case studies, reviews (for service offerings), and other non-internal data directly related to our decision area. As a former retail strategy consultant, I know that most brands will benchmark competitor products (e.g., check reviews for improvement points, compare specifications) before they launch a new product line. Targeted proxies are good for triangulating projections derived from beta testing or internal results. However, when used for projections on their own, they are less reliable than internal results. And unlike internal results, the data is often external, meaning it will take more time to acquire.
  • General Proxies: External market reports, general expert interview commentary, consumer surveys, press releases, and other high-level overviews that are related to the overall goal, but not necessarily our specific decision. For example, if we are deciding whether to launch a new high-performance running shoe, a market size report on high-performance running shoes is helpful, but the decision should not be made on general proxy data alone. Assuming alternatives are available, general proxy data should only be utilized as a supplement on top of the other data sources.
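To make the beta testing idea concrete, here is a minimal sketch (in Python) of how the results of such a test might be summarized. It is an illustration only, not the actual tooling built at Shopee; the group names and metric fields are hypothetical.

```python
# A minimal sketch (hypothetical names, not actual internal tooling):
# summarizing a beta test that compares an automated selection tool
# against manual curation in one product category.
from dataclasses import dataclass

@dataclass
class GroupResult:
    name: str
    impressions: int   # times a promoted item was shown
    purchases: int     # purchases attributed to the promotion

def conversion_rate(group: GroupResult) -> float:
    """Purchases per impression for one test group."""
    return group.purchases / group.impressions if group.impressions else 0.0

def summarize_beta(control: GroupResult, variant: GroupResult) -> dict:
    """Compare the variant (automated selection) against the control (manual)."""
    control_cr = conversion_rate(control)
    variant_cr = conversion_rate(variant)
    uplift = (variant_cr - control_cr) / control_cr if control_cr else float("nan")
    return {
        "control_conversion": control_cr,
        "variant_conversion": variant_cr,
        "relative_uplift": uplift,  # e.g. 0.15 means +15% vs. manual selection
    }

if __name__ == "__main__":
    manual = GroupResult("manual_selection", impressions=50_000, purchases=1_200)
    automated = GroupResult("automated_selection", impressions=50_000, purchases=1_380)
    print(summarize_beta(manual, automated))
```

In practice, we would also want to confirm that the sample size is large enough for the observed uplift to be trustworthy before deciding whether to scale up.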

Remember, we want to get data from the closest comparison available. Afterwards, we should structure our data so that it can be tied to previous actions. Knowing ‘Pivot Tables’ or ‘Left Join’ is important, but understanding what to pivot or left join on is even more important. For example, the data could be sales per initiative, run-time per product feature, or returning users per email notification. The choice of comparison features should be as closely tied to the decision and goal as possible. To further our understanding of the sources of relevant data and the structured output format, refer to the table below for specific examples:

As we can see in the table, not all data types will be available in every situation, due either to lack of data or lack of time. We always want to support our arguments with the most relevant comparison available, but depending on the industry, there can be variations in need as well. For example, in software, beta testing will be utilized more, since iteration is more easily achievable and tests are more controlled. On the other hand, for hard goods, the company may want to get data from every relevant source available before making a decision, since investing in manufacturing is cost intensive and iteration is more difficult. Finally, notice that the structured output format should tie directly to the goal / optimization, such that the decision can be answered with the objective in mind.
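To illustrate what “knowing what to pivot or left join on” looks like in practice, here is a minimal sketch using pandas. The column names (initiative, category, sales, cost) are hypothetical and are not taken from the table above; the point is that the output is keyed to the decision (which initiative to run) and the goal (sales).

```python
# A minimal sketch of structuring raw records into a decision-ready comparison.
# All column names and figures are hypothetical illustrations.
import pandas as pd

orders = pd.DataFrame({
    "initiative": ["flash_sale", "flash_sale", "email_blast", "email_blast", "banner"],
    "category":   ["electronics", "fashion", "electronics", "fashion", "electronics"],
    "sales":      [12_000, 8_500, 4_200, 3_900, 2_100],
})

# Pivot on the feature that ties directly to the decision: sales per initiative,
# broken down by category, so options can be compared side by side.
comparison = orders.pivot_table(
    index="initiative", columns="category", values="sales", aggfunc="sum"
)
print(comparison)

# Left join to attach initiative metadata (e.g. cost), then compute sales per dollar,
# a structured output that speaks directly to the goal being optimized.
costs = pd.DataFrame({"initiative": ["flash_sale", "email_blast", "banner"],
                      "cost": [5_000, 800, 1_200]})
per_initiative = orders.groupby("initiative", as_index=False)["sales"].sum()
joined = per_initiative.merge(costs, on="initiative", how="left")
joined["sales_per_dollar"] = joined["sales"] / joined["cost"]
print(joined)
```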

Situational Adjustments: How can we frame the relevant data to drive influence?

Acquiring the most relevant data often requires more time, but making proper situational adjustments will frame your argument and help influence other stakeholders to proceed with your suggestion. We need situational adjustments whenever the product, the users, and/or the timing have changed relative to the corresponding parameters in our most relevant data source.

Providing situational adjustments will translate our historic data into our current decision context and show the resulting impact on our overall goal(s). See below for a further breakdown of the types of situational adjustments:

  • Product: Deviations in user journey, offerings, item types, features, packaging, flavors, sizes, colors, styles, etc. For example, if there has been a change in the user journey since the collection of our original data, how do we expect that change to affect our overall goal? If we deem it possible that conversion rates may change, what data do we have to justify by how much? If we change the packaging, will this get us closer to or push us further away from our end goal?
  • Users: Changes in the number of users, user segments, or marketing resources available. For example, suppose we are launching a new product to increase revenue and our internal data was collected in 2018, when we had 100 daily buyers, but in 2019 we have 500 daily buyers. We need to adjust our 2018 data, presumably with a 5x uplift if the relationship is linear, since more buyers will experience the new product change (see the sketch after this list). Or, if we are targeting a different user segment, is there a different number of available users, and how do we expect the conversion rate to change?
  • Timing: Differences due to seasonality, duration of the initiative, day of week, or time of day. For example, if the only data we can compare against is from a non-holiday period this year, and we are launching a new feature during the holiday season, how do we expect the holiday period to affect our numbers? Or, if our campaign will last twice as long as previous campaigns, how do we expect our numbers to change?
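As a concrete illustration of the 5x uplift example above, here is a minimal sketch of applying situational adjustments as multipliers on a historic baseline. The baseline and factors are hypothetical, and each would need to be justified with supporting data or sound logic, as discussed below.

```python
# A minimal sketch of applying situational adjustments to a historic baseline.
# The baseline and multipliers are hypothetical illustrations, not real figures.

def adjusted_estimate(baseline: float, adjustments: dict) -> float:
    """Multiply a historic baseline by one factor per situational adjustment."""
    estimate = baseline
    for reason, factor in adjustments.items():
        estimate *= factor
        print(f"after {reason:<32} x{factor:<5} -> {estimate:,.0f}")
    return estimate

if __name__ == "__main__":
    baseline_2018_sales = 40_000  # sales from the closest historic comparison
    adjustments = {
        "users: 100 -> 500 daily buyers": 5.0,   # assumes a linear relationship
        "timing: holiday season launch": 1.3,    # would be backed by prior holiday data
        "product: new packaging": 0.95,          # conservative judgment call
    }
    projection = adjusted_estimate(baseline_2018_sales, adjustments)
    print(f"projected sales: {projection:,.0f}")
```

Multiplicative factors are only one way to express an adjustment; an additive or segment-level adjustment may fit better depending on the data available.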

We need situational adjustments because our historic data never perfectly aligns with our current situation, even if it came from a beta test. The quantitative adjustment made for each change needs to be backed either with additional supporting data, ideally from the most relevant source available, or with sound logic. Note that not all situational adjustments will be needed in all cases, and unlike the sources of comparison, there is no ranking among adjustment types. Therefore, we need to select which situational adjustment(s) will impact our goal the most. If time permits and a higher level of accuracy is needed, more situational adjustments can be made.

Closing Remarks

To drive influence with the most relevant data, we should start by understanding the overall goal. Then, we can determine what decision needs to be made to drive the goal forward. After knowing this decision point and the possible options, we can collect data from the most relevant sources to support or negate those options. Once we acquire the data, we need to properly frame the findings with situational adjustments to influence our stakeholders and promote our decision.

Are there nuances that I missed? If you have any comments or questions, feel free to respond below or connect with me on LinkedIn. Please follow me on medium.com/@dhuynh2979 for more articles about leadership and management.


David Huynh

David is a people focused business professional — building team members to drive results. Starting 19 July, 2020: Posts will be at: davidhuynh.substack.com/