Positioning a Machine Learning Company

The classic guide for entrepreneurs preparing a pitch is Sequoia’s Business Plan Template. This post aims to be a mere addendum to that in the age of machine learning.

Why do investors spend so much time focusing on ‘differentiation’? Because the job of an investor is to allocate money to its best use. Investors shouldn’t allocate money to a company unless it is crystal clear that the company is the best one to solve a particularly valuable problem.

Investors will independently form views on what problems are particularly valuable (‘markets’), find companies solving those problems, then differentiate between them by asking questions like, “Why you? Why now? Why this feature? Why this technology? Why that set of customers first?” to figure out if you’re the best company to solve those problems.

Sequoia’s Doug Leone on the importance of crystal clear thinking to a pitch.

The question of differentiation is also important for entrepreneurs allocating their time to a company. We all want to get out of the rat race, spending our days in an abundant, green pasture free of competitive pressure.

“‘Creative monopoly’ means new products that benefit everybody and sustainable profits for the creator. Competition means no profits for anybody, no meaningful differentiation, and a struggle for survival.” — Peter Thiel

Entrepreneurs pitching investors preemptively answer questions about differentiation in their investor presentations with graphs, matrices and tables.

[Figures: standard 2x2; feature checklist; many dimensions; say what you don’t do]

Many companies now highlight “Machine Learning” (capitals!) as their source of differentiation. Investors are fairly skeptical of allocating money to those companies, if only because so much money has already gone into similar companies. So, how do you position your company if you believe that your competitive advantage accrues through machine learning or ‘data network effects’, that is, if you truly believe that your company is collecting unique data and learning over that data to generate a differentiated, compounding ‘data asset’? This article provides a few ways to do that from our perspective as investors focused on this new era of computing.

Efficacy

The job of an investor is to search through a lot of companies to find the best one solving a particularly valuable problem. As a result, investors have adapted to have very short attention spans. Knowing this, entrepreneurs often open pitches with fantastical descriptions of their technology. Perhaps a better approach, given the general skepticism around machine learning, is to give evidence of efficacy upfront: proof that your product delivers results an order of magnitude better than the alternatives. This is a fundamental question; it’s not worth discussing your company’s ostensibly panacean machine learning technology if your product doesn’t generate enough of a benefit for customers to justify their switching costs.

“Customers won’t care about any particular technology unless it solves a particular problem in a superior way. And if you can’t monopolize a unique solution for a small market, you’ll be stuck with vicious competition.” — Peter Thiel

For example, there are lots of companies that purport to predict crop water stress by analyzing the color spectrum in satellite images of a farm. There are indeed some strong secular trends pushing this technology forward, such as cheap satellite imagery and fantastic advances in image recognition. However, one of us here at Zetta has a farming background and can say that the prettiest mid-season pictures in the world don’t matter if the plants are dead at the end of the season. A farmer needs to know that what they’re seeing on the screen is predictive of what’s happening in the real world; otherwise, they just won’t allocate valuable resources (i.e., water) according to the software’s predictions. While it’s very difficult to build physical models of plant growth based on imagery, it’s reasonable to expect some strong evidence that the image-based predictions of water stress correlate with plant tissue samples.
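As a rough illustration of what such evidence could look like, the sketch below computes a Pearson correlation between image-based stress predictions and field measurements. Everything in it is an assumption made for illustration: the function name `pearson_r`, the variable names and the numbers are made up, and a real validation would need far more samples and a proper agronomic study design.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up numbers: one entry per sampled field plot.
# predicted_stress: water-stress score derived from imagery (0 = none, 1 = severe).
# measured_stress: a stress measurement taken from plant tissue samples.
predicted_stress = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
measured_stress = [8.0, 9.5, 11.0, 13.5, 14.0, 17.0]

r = pearson_r(predicted_stress, measured_stress)
print(f"correlation between predictions and tissue samples: {r:.2f}")
```

A high correlation on a handful of plots is only a starting point, but it is the kind of number that lets an investor, or a farmer, compare the screen with the field.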

Horizontal vs. Vertical

Another high-level question to answer early in your pitch is whether you’re a horizontal or vertical machine learning startup. For example, Clarifai is a horizontal product and AppDiff is a vertical product. We think that most machine learning-based startups will make vertical products, adding AI to an existing solution, but if your startup is making a horizontal product, it’s worth articulating how you get sufficient attention and pricing power.

Addressing attention is important because it’s hard to maintain the mindshare necessary to retain accounts if you’re a point solution, often API-based, working in the background to make a part of your customers’ products work. API-based companies get attention by, for example, having first-class support when things go wrong or providing strong analytics for developers to regularly check.

Addressing pricing power is important because the enterprise-level decision to purchase a horizontal solution (e.g., an ETL API) usually comes after the decision to solve another problem (e.g., churn), and it’s hard to have pricing power if your product is secondary to solving the core problem. API-based companies can get pricing power by, for example, closely partnering with other vendors that take them to customers.

Data Acquisition for Machine Learning

Workflow-Based Data Acquisition

We’ll assume from here that you’re building a vertical application. This probably means that you’re making a (SaaS) workflow product that looks like existing products in your industry but has some interesting machine learning happening in the background.

First, show how your workflow tool is faster, cheaper or otherwise better than the existing solutions to the point of justifying potential customers’ switching costs. This is fairly simple to show by benchmarking against existing products on speed, price or features. An interesting trend with respect to price is giving away a free or cheap workflow (SaaS) tool to entice companies to upload data, then using that data to train your machine learning models. This is similar to a SaaS-driven marketplace strategy. Hopefully, the smart data collection and machine learning over that data will eventually allow your product to make extremely intelligent decisions for your customers.

Second, show that you’re making something more than a better workflow: something that analyzes, predicts and prescribes outcomes. What decision-making process can you improve or automate for your customers? What features are you testing in your machine learning model and how are they predictive of some outcome? Data from early experiments or back-testing proving the predictive value of your models is an essential part of a pitch. Beyond that, showing evidence of these predictions generating a return on investment for customers is ideal but not expected at the seed stage. You should, at least, include an ROI model with reasonable assumptions to show that you know how customers think about their purchase of your product in dollars and cents.
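To make the idea of an ROI model concrete, here is a minimal sketch of the arithmetic a customer might run. The structure and every number in it are assumptions for illustration only (the names `RoiAssumptions`, `first_year_roi` and `payback_months` are hypothetical); a real model should use the customer’s own cost drivers and state each assumption explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoiAssumptions:
    """Hypothetical inputs a customer might plug into an ROI model."""
    annual_price: float           # what the customer pays you per year
    switching_cost: float         # one-time cost of migrating to your product
    baseline_annual_cost: float   # what the process costs the customer today
    expected_savings_rate: float  # fraction of the baseline cost you expect to save


def annual_benefit(a: RoiAssumptions) -> float:
    """Dollar value of the savings the product is assumed to deliver per year."""
    return a.baseline_annual_cost * a.expected_savings_rate


def first_year_roi(a: RoiAssumptions) -> float:
    """Net first-year gain divided by total first-year cost."""
    total_cost = a.annual_price + a.switching_cost
    return (annual_benefit(a) - total_cost) / total_cost


def payback_months(a: RoiAssumptions) -> Optional[float]:
    """Months needed for net monthly savings to cover the switching cost."""
    monthly_net = (annual_benefit(a) - a.annual_price) / 12
    if monthly_net <= 0:
        return None  # the product never pays back under these assumptions
    return a.switching_cost / monthly_net


# Made-up example figures, not data from any real customer.
example = RoiAssumptions(
    annual_price=20_000,
    switching_cost=10_000,
    baseline_annual_cost=200_000,
    expected_savings_rate=0.25,
)
print(f"first-year ROI: {first_year_roi(example):.0%}")
print(f"payback period: {payback_months(example):.1f} months")
```

Putting the switching cost in the model explicitly ties it back to the earlier point: the benefit has to be large enough to justify what the customer gives up to adopt you.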

Other Types of Data Acquisition

We’ve focused on workflow tools as a way to acquire data because they allow a startup to collect the data specifically needed to train machine learning models and simultaneously get customer lock-in. However, there are many other ways to acquire data to build your model. Whatever your method, it’s important to highlight its unique qualities to investors.

Unique Data

There was a world before machine learning, hard as it is to believe, in which companies were valued on the basis of their data assets. Bloomberg comes to mind, and Onavo is a more recent example. There’s huge value in companies generating unique, defensible and valuable data assets, and we invest in them. Showing each of those elements (uniqueness, defensibility and value to customers) by reference to other products can nicely position your startup as one with durability. Something investors will be wary of here is a company based on ‘tricks’. Collecting data through a one-time hack is not enough, because another company could come along and do the same soon after. Your method must systematically collect data through protectable technologies or networks. Premise is a good example of the latter.

Team

Finally, can you build it? General team quality is always a core differentiator, but it’s also important to link the qualities of your team to the specific needs of your company. Machine learning-based companies need people who are good at algorithms, data management, distributed storage and more. Hopefully, your team will be particularly strong in one of these areas. Perhaps you’re developing proprietary machine learning algorithms, which is a very high bar to meet but very interesting to investors. Perhaps you’re particularly strong at some part of the machine learning pipeline, such as Natural Language Processing. Perhaps you’re using open libraries and APIs for machine learning but are particularly good at data collection and management. Whatever the case, showing where your team is particularly strong is important.

At the end of the day, we just want to invest in companies that can dedicate capital to moving the world forward, losing as little to competitive friction as possible.