Impact of Machine Learning and GDPR on the value of your Data

Robert Sibo
Slalom Data & AI
Mar 26, 2020


Everyone knows data and powerful machine learning can be a great asset for a company, but companies still struggle to quantify the potential value once the associated risks, which can seriously reduce the true expected value, are factored in. In 2019 alone we saw numerous data-related fines, public relations nightmares and business-debilitating events stemming from risks that were neither fully understood nor properly mitigated. Understanding these risks is equally important when making decisions about investments, partnerships or internal spend. Going into 2020, executives need to modernize their understanding of data valuation by accounting for the following:

  1. As data becomes more democratized internally, analytics and machine learning use cases multiply; a company should expect substantial value but also increased vulnerability.
  2. Ecosystems and partner relationships are creating a lot of value and new business models, but they also challenge a company’s ability to govern data and to forecast both potential and risk.
  3. Regulations and public outcry show that data must be managed as something that can become toxic, or at least a massive liability, if mitigations such as privacy controls and transparency aren’t planned for.

This article outlines what a company should consider when either valuing itself (for a sale or acquisition) or prioritizing investments. Contrary to the traditional approach to calculating value or ROI, value from data and analytics doesn’t come without a cost. This article will help articulate what it means to think of data as a double-edged sword, where costs and risks can seriously offset any value created.

Part 1 — The siren’s call to collect data

For the longest time, technologists worked to convince the business to collect data and exploit the insights in it: “Data is an Asset,” we’d say. For the last few decades this was the norm, and it fueled the explosion in data management hardware and software (e.g. Big Data) and in the analytics that take advantage of this wealth (e.g. Machine Learning). As a result, companies are indeed finding plenty of opportunities as new services create value using AI and ML solutions (see Slalom’s paper on AI to ROI, link). Business models have evolved, and new markets have been created to leverage the benefits of analytics and rich data from customers and the competition.

But now, a year after the General Data Protection Regulation (GDPR) took effect and on the cusp of the California Consumer Privacy Act (CCPA), we are at a point where data is BOTH an asset and a liability for a company.

Take a minute to think about this: companies generally can’t exist without carrying debt and risk, yet for decades we amassed data largely risk-free and without regard to what it could do to the company if misused. We worried only about the opportunity cost of not extracting a competitive edge or additional revenue from it. There was no downside, as infrastructure and software costs continued to shrink dramatically and no privacy rules were enforced consistently to limit what was collected. But this is changing…

Part 2 — No free lunch

Let’s explore an updated model for calculating the value of data and the various ways that value is created: one that accounts for the growing complexity as companies explore new layers of value creation, and for the risks and costs of owning or stewarding the data.

The Massachusetts Institute of Technology (MIT) has created an initial asset valuation framework (link) which defines data value as the composite of three sources:

  1. the asset, or stock, value;
  2. the activity value; and
  3. the expected, or future, value.

But let’s dig further and make sure that our “no free lunch” approach factors in the complexities of downstream usage as well as a modern view of discounting and costs. Figure 1 illustrates a more complete and balanced understanding of value, accounting for second-order factors.

Figure 1: Total Potential Data Value
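
As a rough sketch of how these pieces might combine, the snippet below adds the three value sources from the MIT framework and subtracts ownership cost plus the compliance and reputational deductions. The simple additive form, the class and field names, and every number are assumptions made for illustration; they are not taken from the MIT framework or from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class DataValuation:
    """Illustrative inputs, all in the same currency unit (hypothetical)."""
    asset_value: float             # the asset, or stock, value
    activity_value: float          # value created by current use of the data
    expected_future_value: float   # the expected, or future, value
    ownership_cost: float          # acquisition, storage and operating cost
    compliance_deduction: float    # expected regulatory fines
    reputational_deduction: float  # expected brand and customer-trust losses

    def gross_value(self):
        # Assumption: the three value sources simply add up.
        return self.asset_value + self.activity_value + self.expected_future_value

    def net_value(self):
        deductions = (self.ownership_cost
                      + self.compliance_deduction
                      + self.reputational_deduction)
        return self.gross_value() - deductions


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
example = DataValuation(
    asset_value=2_000_000,
    activity_value=1_500_000,
    expected_future_value=3_000_000,
    ownership_cost=800_000,
    compliance_deduction=600_000,
    reputational_deduction=1_100_000,
)
print(example.gross_value())  # 6500000
print(example.net_value())    # 4000000
```

In this made-up case, more than a third of the gross value disappears once costs and risk deductions are counted, which is the “no free lunch” point in numbers.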

Focusing on the harder-to-estimate factors, the unrealized compliance and reputational deductions, it’s important to briefly revisit the Total Cost of Ownership (TCO) concept.

Investopedia defines TCO as follows: “The total cost of ownership looks at the cost of owning an asset long-term by assessing both its purchase price and the costs of operation.” This covers acquisition and technical storage of the data, but the comparison to physical assets stops there. For example, amortization and other accounting forms of value decay fail to model the severe fines and game-changing events that can follow a data breach or a public revelation that data is being used in ways seen as “unethical.” Furthermore, the concept of a toxic asset, such as a real-estate investment that cannot be sold and poses serious risk to the company, hasn’t yet found an equivalent in the data business.

In addition, TCO is by definition focused on owned assets, but the concept of data stewardship has changed this. The blurred line between ownership and stewardship now extends beyond governance discussions: regulations increasingly mandate that, regardless of who owns the data (e.g. society, customer, 3rd party), anyone processing or touching it is responsible for its treatment. Since 2018, GDPR and similar regulations have held companies that own, hold or process data responsible.

So, a more complete valuation of data should include the following ownership and stewardship risks, ideally quantified as expected, discounted costs:

  • Expected cost of data leakage or hacking: Caused when the public learns of data leakage through hackers or internal leaks, leading to hefty remediation costs and efforts to rebuild public confidence. Given the types of Personally Identifiable Information (PII) and other sensitive data a company stores, what is the likely outcome after a hack? Is it a game changer from a brand reputation perspective? From a fines perspective?
  • Expected cost of the public learning of inappropriate use of their data: Caused when the public learns that sensitive data is used for analytic inference or other activities that exceed the explicitly granted use cases. Besides being a sticking point for GDPR and CCPA, customers increasingly demand to approve how their data is used and with whom it’s shared, and to understand what value the 3rd party provides. Would customers or partners leave if they learned how the company is using their data, and how big of a problem would this be? The Turing Institute has an interesting paper titled “A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI” worth a review as this hot topic evolves in 2020 (link).
  • Expected costs of fines from regulatory bodies for planned non-compliance events: Keep in mind that many companies have accepted a certain level of fines as part of GDPR readiness, given the complexity and cost of fully updating policies, systems and controls, opting to wait until a critical volume of fines triggers the need to act.
  • Unrealized inherited costs originating from the partner network: Caused when partners incur fines or heavy damages due to the above items, which become partially applicable to the company through shared accountability. As Facebook learned with Cambridge Analytica, a company that shares PII is accountable for its partners’ misuse and mismanagement of it.
  • Expected cost of the public learning of a large value disconnect: Caused when the public learns of an unexplainable difference between the value they receive [services or product] and the value the company extracts from the data collected. If the company can justify the difference through proprietary processing or the cost of maintaining the service, in a way a customer or partner would see as appropriate, then the discounted cost of this event would be low. For example, as of 2018 it’s estimated that Google makes roughly $182[1] per person based on market cap (in 2017, Facebook earned an average of $84.41 from each North American user and $27.26 from each user in Europe, based on recurring revenue from the ads we click on). These figures of $182 and $84.41 could, arguably, be justified by the value customers receive from the service. But what if the company works with genomic data that a biotech or pharmaceutical partner could turn into thousands or millions of dollars through tailored medicine breakthroughs?

[1] https://arkenea.com/blog/big-tech-companies-user-worth/
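
One hedged way to turn the risks listed above into a single number is to treat each as an event with an annual probability and an impact, then discount the probability-weighted costs over a planning horizon. The sketch below assumes constant, independent annual probabilities and a flat discount rate; the function name, the probabilities, the impacts and the 8% rate are all hypothetical placeholders, not estimates for any real company.

```python
from dataclasses import dataclass


@dataclass
class RiskEvent:
    name: str
    annual_probability: float  # assumed chance the event occurs in a given year
    impact: float              # assumed cost if it occurs (fines, remediation, churn)


def expected_discounted_cost(events, years, discount_rate):
    """Sum probability-weighted impacts per year, discounted back to today."""
    total = 0.0
    for year in range(1, years + 1):
        discount = (1 + discount_rate) ** year
        for event in events:
            total += event.annual_probability * event.impact / discount
    return total


# Hypothetical inputs mirroring the five risk categories in the list above.
events = [
    RiskEvent("data leakage or hacking", 0.05, 10_000_000),
    RiskEvent("inappropriate use revealed", 0.02, 5_000_000),
    RiskEvent("planned non-compliance fines", 0.50, 400_000),
    RiskEvent("inherited partner costs", 0.03, 2_000_000),
    RiskEvent("value disconnect revealed", 0.01, 4_000_000),
]

three_year_cost = expected_discounted_cost(events, years=3, discount_rate=0.08)
print(f"Expected discounted cost over 3 years: {three_year_cost:,.0f}")
```

The hard part in practice is estimating the probabilities and impacts, not the arithmetic; even rough ranges make the liability side of the valuation visible.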

Part 3 — Breaking even in the ‘data business’

Following basic business principles, it is important to fully understand the costs and benefits of data to understand if one is breaking even or even losing money on the activity. The old days of hoarding data just to be in the data business, without ever assessing the value it brings, have ended.

The added difficulty is that, like those of physical assets, data’s value and cost of ownership change over time relative to:

  • Internal data governance and privacy maturity (ability to mitigate risk)
  • Internal maturity, quality and accessibility of data (ability to create value)
  • Internal maturity around skills and tools to analyze and report using data (ability to create value)
  • Public transparency, acceptance and understanding of how their data is used (ability to mitigate risk)
  • And many other enabling or dependent factors

Scenario 2 below shows a favorable case where the company has realized value from the data. Realized value has approached its potential AND is greater than the expected cost of the data. A positive return on investment has been achieved.

Figure 2: Cost / Benefit Analysis of Data Exploitation and Risk

On the other hand, consider a startup artificially sustained by venture capital, where the cost of owning the data is high (actual costs plus risk-based expected costs) and the company hasn’t yet begun to fully capitalize on the data to generate revenue (scenario 1 in Figure 2 above). eCommerce companies before digital marketing were a good example of this, which led to outrageous and uninformed valuations and, eventually, the dot-com bubble catastrophe. Furthermore, this fictitious startup’s relatively immature governance can lead to poor management of how the data is used, shared and controlled, which at some point may result in major fines, public outrage and brand erosion.
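
The two scenarios can be compared with a small break-even check. This is a minimal sketch that assumes potential value, realized value and expected cost have each already been estimated as a single figure; the function name and the numbers are hypothetical and are not read off Figure 2.

```python
def assess_data_position(potential_value, realized_value, expected_cost):
    """Compare realized value with expected cost and with untapped potential."""
    if potential_value <= 0 or expected_cost <= 0:
        raise ValueError("potential_value and expected_cost must be positive")
    net_return = realized_value - expected_cost
    return {
        "net_return": net_return,
        "roi": net_return / expected_cost,                # return per unit of cost
        "realization": realized_value / potential_value,  # share of potential captured
        "breaks_even": net_return >= 0,
    }


# Scenario 1: high expected cost, little value realized so far (the startup case).
scenario_1 = assess_data_position(
    potential_value=10_000_000, realized_value=1_000_000, expected_cost=4_000_000)

# Scenario 2: realized value approaches potential and exceeds expected cost.
scenario_2 = assess_data_position(
    potential_value=10_000_000, realized_value=8_000_000, expected_cost=4_000_000)

print(scenario_1["breaks_even"], scenario_1["roi"])  # False -0.75
print(scenario_2["breaks_even"], scenario_2["roi"])  # True 1.0
```

Because both value and cost shift with the maturity factors listed earlier, a check like this is more useful when rerun periodically than when computed once.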

While the concept of making a decision to enter the “data business” is largely academic since every company needs to be in the data business, it is worth exploring potential alternative models. These models may apply to different parts of the business or for specific partnerships that the company may enter. Based on the company’s strategic view of data and how it can create competitive differentiation and value pools for their specific business model consider the following data operating models:

Part 4 — Build a Modern Culture of Data

For those who are collecting data and are worried about the risks outlined above, or that they’re not fully realizing their data’s potential value, a big question comes to mind: what can be done, and where to start?

In the end, rich operational and customer data can be the difference between hyper growth and being disrupted, but it can quickly become a toxic asset unless the company thinks differently about managing it in today’s environment. Take privacy and compliance seriously and work on embedding privacy by design throughout the company via tools, policies, processes and organizational changes. As part of this, a thorough review and assessment of how data is used internally and within the company’s 3rd party and partner network is required, and that usage must be actively managed.

Get started, move deliberately and start thinking differently about data and your business model.

Slalom is a modern “digital and cloud native” consulting company with a deep appreciation for all that data and analytics can bring a company. In Silicon Valley, as well as the rest of our international offices, we help our clients instill a modern culture of data and to learn how to respect the role they play as owners and stewards of it.

Interestingly, a modern culture of data is an environment of experimentation, empowerment, curiosity, critical thinking, and collaboration. With the appropriate governance, controls and safety nets to protect and fuel this spirit, a modern company can actually be set free to explore and do great things. Let’s talk more…

Robert Sibo is a director of data and analytics out of Slalom’s Silicon Valley office. Speak with Robert and other Data & Analytics leaders at Slalom by reaching out directly or learn more at slalom.com.

Rob Sibo

rob.sibo@slalom.com

Miscellaneous reads on impact of analytics, ML, GDPR on business value:

https://podcasts.apple.com/us/podcast/the-cognitionx-podcast/id1477047194?i=1000450949563

https://ssrn.com/abstract=3248829

https://link.medium.com/zU7cHSNoY1

https://sloanreview.mit.edu/article/whats-your-data-worth/

http://analytics-magazine.org/risk-management-total-cost-of-ownership/

https://www.mparticle.com/blog/cdp-roi-tco

https://www.investopedia.com/terms/t/totalcostofownership.asp

https://www.investopedia.com/tech/how-much-can-facebook-potentially-make-selling-your-data/

https://www.investopedia.com/ask/answers/120114/how-does-facebook-fb-make-money.asp

Originally published at https://www.linkedin.com.
