[Figure: Boundaries of agricultural fields, automatically derived from satellite imagery.]

Selling data products is the wrong business model for AI startups

Satellites and other earth data sources can help us better understand the planet. It’s encouraging to think that in an effort to track and prevent illegal logging, we’ll be able to know, rather than estimate, exactly how many trees have been cut down in the Amazon. It’s exciting to envision the capacity to accurately pinpoint regions experiencing food shortages, so we can send in humanitarian aid faster and where it’s needed most.

Companies like Descartes Labs are trying to build a business by refining raw data from sensors, and applying AI to turn the data into insights about the world. Our customers benefit because they better understand the environment in which they operate. The question for us startups is: how do we serve those customers and build a business around data?

There seem to be two distinct options emerging: either build a series of data products available to all, or create a platform to build bespoke products, unique to each customer.

There are many companies that build data products. Both Orbital Insight and Ursa Space have global oil monitoring products, derived from geospatial data. The idea is to provide more accurate information to customers in the oil and gas supply chain, who are granted access to daily forecasts of refining activity or storage capacity. Another company, Tellus Labs, recently acquired by Indigo Agriculture, releases daily corn and soy forecasts during the growing season so farmers can plan better and consumers can predict prices.

A data subscription product seems like a really logical way to build out an AI business: write a model once, and sell it many times. The business then looks very much like a traditional SaaS company, making it easy for investors to map onto known businesses.

That’s the structure I thought my company, Descartes Labs, would be adopting when we released our corn production forecast in August of 2015, ahead of the USDA’s forecast. We singlehandedly moved the market 3%, and we thought we were hot shit.

At first, investors kept asking us why we didn’t trade the number ourselves if it was so good. That’s what CargoMetrics did. After they built some AI around shipping, they raised a hedge fund to create a startup/trading hybrid.

Though we considered taking a similar tack, we ultimately reasoned that even if we understood corn supply, we knew nothing about demand, currency, macroeconomic fluctuations, politics, or any one of the hundreds of factors that affect the price of corn. Knowing what I know now about the complexity of commodities trading, it was the right decision.

Instead, we decided to sell the corn number to lots of customers as a data product. We figured that everyone in the corn value chain would want to subscribe to our forecast because, whether you’re insuring the crops, buying corn syrup, or trading futures, the supply of corn is a critical piece of knowledge. Our forecast for the US was not only very accurate, it was also published every four days. The USDA only released a monthly forecast. We offered a clear advantage.

After corn, the plan was to cover major crops in the U.S. and then look at other major crops globally: canola in Canada, wheat in the Black Sea region, and soy in South America. Then we’d start building forecasts for other major commodities, and we’d have a new kind of information company. At this phase, we talked about being “USDA for the World” or “an AI Bloomberg Terminal.”

It didn’t work out that easily for us. The problem with trying to create universal data products in a competitive market is that everyone has access to the same information, and everyone responds to that information similarly. Thus there is no edge, no relative advantage, no reason to keep buying the product.

This is the problem with selling data subscriptions: the value of the information erodes the more people know it. If you knew tomorrow’s lottery numbers, you could buy a ticket, make your picks, and win it all. But if everyone knew the numbers, you’d have to share the spoils with a huge pool of players. Our choice at Descartes Labs was to either sell our corn forecast to a single client for a lot of money, or sell to many clients for much less. Since the value of the information decreases with every additional client, there’s a maximum total addressable market.
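This erosion can be made concrete with a toy model (my own illustration, not numbers from Descartes Labs): suppose each additional subscriber is willing to pay a fixed fraction of what the previous one paid, because the signal is that much less exclusive. Total revenue then converges to a hard ceiling, which is exactly the "maximum total addressable market" problem.

```python
# Toy model: per-client value erodes geometrically as more clients
# subscribe to the same data product. All numbers are illustrative.
def total_revenue(base_value: float, decay: float, clients: int) -> float:
    """Sum of per-client prices when client n pays base_value * decay**n."""
    return sum(base_value * decay**n for n in range(clients))

# With base_value=100 and decay=0.8, revenue can never exceed
# base_value / (1 - decay) = 500, no matter how many clients sign up.
print(round(total_revenue(100, 0.8, 10), 1))    # first 10 clients: 446.3
print(round(total_revenue(100, 0.8, 1000), 1))  # effectively at the cap: 500.0
```

Under these assumptions, the tenth client adds almost nothing, and the thousandth adds essentially zero; selling to one client for a premium or to many for a capped total are the only options, just as the article describes.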

One argument in favor of selling data subscriptions is that, if you’re able to become the gold standard, then you are… well, golden. After all, everyone needs to subscribe to signals that move the market, regardless of whether you think they’re right or not. Since everyone has the data, customers aren’t willing to pay a lot, but at least everyone is required to know what the information is. The problem for a startup is that there is a huge chasm between being an early disrupter and a market standard. Why should the market trust you?

Another pernicious detail is the fact that all information tends toward being known by the market, a process that’s sped up by the internet. That means the advantage of getting data ahead of a competitor is temporary. No longer does anyone have to wait for a copy of the Wall Street Journal to see how Procter & Gamble is doing on the NYSE; we can all see the price in real time, just like a trader.

For all these reasons, Descartes Labs stopped pursuing a role as an AI Bloomberg Terminal and focused instead on building out our data refinery. We realized that the same data, modeling tools, and cloud-based supercomputer that we built to create the corn model could be used to build bespoke models for customers. We think of our data refinery as a piece of enterprise software that must be customized with our customers’ proprietary knowledge and data. You don’t install Oracle Finance out of the box, changing your organization to match Oracle’s software; you customize Oracle Finance so that it supports and digitizes your internal processes. The same is true for creating insights from data: combining external data with a company’s proprietary data and knowledge will enable a far better model.

Of course, I’m glad there are data-product companies. The Orbital Insights and Genscapes are an important part of sharing knowledge with the general market. But I also believe that their upside is limited.

There’s value for everyone having access to the same data, but smart companies are going to look for a sustainable data advantage. Insight should not be outsourced; it should be a core function of every company. That’s why every company will need a data refinery.