“Data is like a Moat!” and Other Bad Ways to Talk About Data and AI

T. Dalton Combs
Boundless Mind
Sep 15, 2017

If you’re founding, funding, or fueling a startup, you need to know how to talk about data.

Almost 50 years into the Information Age, we still lack great ways to talk about data and its value. Newly arrived 3rd wave AI is hungry for data in a way that 1st and 2nd wave AI never were. A fresh influx of capital is fueling work and attention around AI again. But in some ways, these new companies and projects make it even harder to talk about data.

Our language about data is sprinting away from us. We look back on phrases like “horseless carriage” or “talking telegraph” and wonder how people could have used such backwards language when a future tech arrived.

But we’re doing the same thing today in how we talk about data. Here’s how I’ve heard a lot of investors, creators, and other founders talk about data. By understanding the history and limits of these analogies, you can be one step ahead of everyone in understanding the impact of data and AI on business.

There are 4 main analogies that people are using to communicate the valuable role that data plays in business, especially in AI-centric projects:

  • “Data network effects”
  • “Data moats”
  • “Systems of Intelligence”
  • “AI Flywheels” ← my favorite!

Here’s how I understand each of these analogies with my team at Boundless Mind.

Data Network Effect

A Data Network Effect is a property of a product that improves as more data becomes available to it, due to emergent relationships between segments of the data. It’s an analogy to Metcalfe’s Law as it played out in telephone networks and the later “Social Network Effects” that explain the defensibility and the growth of businesses like Facebook. Like a social or telephone network, only some small fraction of the data in a data network is useful to any specific customer or task (of my thousand-odd connections on Facebook, there are at most 500 people I care about). In a product experiencing a data network effect, as the total amount of data available to a learning system grows, you can provide more value to any individual user even though each user only interacts personally with their own small sliver of the data ‘network’. Another nice thing about this analogy is that, as in a communications network, every member of the network generates value for the network simply by participating in it.
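For readers who want the arithmetic behind the analogy: Metcalfe’s Law is usually stated as the value of a network growing roughly with the square of its number of participants, because that is how fast the number of possible pairwise connections grows. The short Python sketch below only counts those possible links. The function name `pairwise_links` is made up for illustration, and treating “links” as a stand-in for the value of data is an assumption of the analogy, not something measured.

```python
def pairwise_links(n: int) -> int:
    """Count the distinct pairs that can form among n participants: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Each 10x increase in participants gives roughly a 100x increase in possible links.
for n in (10, 100, 1000):
    print(n, pairwise_links(n))
```

Any one participant only ever uses a handful of those links, which is the same point the Facebook example makes: the value comes from the whole network being there, not from any one person touching all of it.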

So if you’re designing an AI-powered product, how might you store, collect, analyze, and act upon your data such that your models and product appreciate with more data?

Data Moat

A Data Moat describes a competitive advantage a business holds because of its proprietary data set. It’s a modern extension of traditional “moats” of business, such as vendor lock-in, branding, trade secrets, efficiencies of scale, and regulatory capture. Like their literal counterpart, the metaphorical data moat protects an earned market position from potential followers. Warren Buffett described the importance of moats in his investment theses: “In business I look for economic castles protected by unbreachable ‘moats’.” Like these other defensibility moats, a strong data moat inhibits competitors from stealing market share.

To build a strong data moat, your data set must be, amongst other things, large, unable to be synthesized or statistically replicated from first principles, and difficult to acquire due to access barriers. (For example, Boundless Mind’s Persuasion AI benefits from our many relationships with many different customers. A CV-based Cancer-Detection AI benefits from exclusivity and patient data privacy laws. An ultrasonic oilfield surveying AI benefits from land rights and cost barriers. A fraud detecting AI benefits from a regulatory environment that keeps transactions private.)

As a counterexample, a CV-based scene parsing system, which can train using freely and widely available natural image data, might struggle to build a data moat. Since there’s no protection to that data source, anyone can train a model on it. That means no one model could be expected to drastically outperform another based on data alone.

So if you’re designing an AI-powered product, what sort of proprietary data set might you incorporate into your core value proposition so as to create a defensible data moat?

System of Intelligence (SoI)

SoI is an analogy that Jerry Chen at Greylock Partners makes to the traditional Systems of Record (SoR) that businesses use, brought into a more contemporary age. SoRs capture value by placing themselves between the customer and their data. For example, they’re the SaaS that a business might use for managing data about its customers, employees, IT infrastructure, and finances. They’re the Salesforces, Workdays, Oracles, and Atlassians. If a customer can’t access their data without continuing to pay for your software/service, you’re in a very strong position.

Analogously, Systems of Intelligence capture value by putting themselves between their customer and the value generated by their data.

This analogy is a good description of a business model, and that makes it useful for telling data-driven businesses apart from data-driven toys.

If you’re building an AI-powered tool and your customer doesn’t know what to do with their data without your System of Intelligence (that has also been trained on lots of other people’s data — see Data Network Effect), that’s a wicked strong position to be in.

Data/AI Flywheel

A Data/AI Flywheel is an analogy we like from Bradford Cross at DCVC. It’s akin to Jim Collins’ business value flywheel. Collins’ analogy is one of my favorites because it explains how businesses grow and become defensible. It explains how, in a well-aligned business, value generated in one part of the business fuels the growth of value in another part of the business.

Cross’s AI-specific analogy of the Data Flywheel explores how the role that humans have previously played in creating value in business processes is being replaced by AIs.

This analogy also provides a way to think about even very young AI-powered companies. You can evaluate them based on the existence or strength of each linkage in the Flywheel: does this better solution to a customer need expand data collection? Does the increase in data from a new customer improve the core model? Does a better model deliver better value?

If you’re designing an AI-powered product, how might your data and business processes cooperate to create an AI-Flywheel?

I wish I could tell you that there’s a right way to talk about data, but there isn’t. We use analogies to talk about things we don’t understand, and we don’t understand how data and AI will impact business yet.

Eventually, we’ll get to stop making these analogies. We will have an organic lexicon of data . . . but by then, the next generation of great tech companies will be established.

For people living and working at the leading edge, bad analogies are just like water.

In the information age, data’s the new oil.
- AWS Marketing Material



T. Dalton Combs
Boundless Mind

coFounder and CPO at Jasmine Energy. NeuroEconomist by training.