Part 2 — The Human-in-the-Loop Business Playbook

Raphael Danilo
11 min read · Aug 31, 2020

In Part 1, I covered a core principle that drives an enterprise AI/ML app to prevail: a thoughtful human/machine hybrid approach to a business task that lets you better capture data — improving supply quality and efficiency — as demand increases.

In Part 2, I share the unique tactics and moats in human-in-the-loop businesses that you can leverage to make the market “tip” in your business’ direction. I gathered these for you by spending the last 3+ years around operators at human-in-the-loop businesses, investing in a few of them, and building one from scratch at Yobs.

By design, human-in-the-loop businesses will continue to gain faster adoption and build greater moats than “black box” enterprise AI/ML businesses in most industries. In some industries, the latter type won’t survive, period. At the other extreme, every traditional SaaS business will need to layer AI/ML components onto its products to win and retain customers. But most aren’t set up to make the transition. I’m biased, but this makes human-in-the-loop businesses possibly the most exciting to be building right now.

Don’t get me wrong, it also makes them harder to build than traditional SaaS businesses. Louis Coppey’s awesome AI-first SaaS investment napkin points out that AI/ML SaaS startups face these 4 additional challenges to win:

  1. Identify problems and experiences well suited to a data-driven solution.
  2. Collect and label data.
  3. Extract value from the data by building models with or without feature engineering.
  4. Find early customers to validate and contribute to the (future) value of their learning system.

Points 2 & 3 deserve special attention. That’s because they are the silent killers of mass adoption for promising enterprise AI/ML startups. Not just data, but labeled data, is more important than your algorithms. In fact, labeled data may be the single greatest bottleneck to core customer happiness for your AI/ML app. This is what Part 2 is all about. It’s about getting to customer happiness so superior to any substitute that the market “tips” in your direction.

To do this, you need to find a scalable and systematic way to grow that lets you improve how you capture the datasets underlying your core business problem. This means identifying and maximizing “tipping loops”: loops that create momentum for you systematically. (S/o to Sarah Tavel.)

There are two types of tipping loops for AI/ML businesses: data capture loops and happiness loops. Data capture loops help you build a moat of customer data and put up barriers to entry for new competitors. Happiness loops help you consistently turn this moat into superior supply quality for your business. I’ve identified 3 data capture loops to “tip” your market, and each can unlock a happiness loop if activated correctly.

To illustrate these, let’s look at two of the most effective startups in existence at this human-in-the-loop exercise: Uber and Scale AI.

First, what are Uber and Scale AI in 2 sentences?

  • Your grandma knows what Uber is, so at this point, you should too :-)
  • Scale offers an API for data scientists to get their image, text, and 3D data labeled quickly and with high accuracy at an affordable price. In 3 years, they achieved a $1B+ valuation by convincing clients like Airbnb, Lyft, SAP, and even Elon Musk’s OpenAI that they could label their data 10x better/faster than they could on their own.

Cool, but how can I win like them? They really mastered the 3 data capture loops and 3 happiness loops to tip their market. You can too.

Source: Graphing the human/machine hybrid learning process.

Six ways to tip your market (3+3):

Loop #1 = Capture a specific market/data context. And do it well.

To grow rapidly in a systematic way as a human-in-the-loop business, you must capture a dataset in a market that was previously fragmented or difficult for incumbents to build generalized learning models from. By becoming the de-facto destination for customer data in a given business context, and controlling the supply, you can charge a premium for your service and grow the overall size of the market you operate in. There are many ways to hack this dataset, and building a traditional SaaS business first can be one of them, but none of these hacks is a silver bullet. Ultimately, you must solve for the complexity of a task that requires the customer’s data in this context and remove friction from the data onboarding process.

  • Uber: Uber started as a black car service that hired mostly limo drivers looking to make extra income. It wasn’t until their Series B that they started offering UberX and UberPool, and years later added new modes of transportation like bikes and scooters. Fundamentally, they abstracted away the complexity of transportation from point A to point B with the push of a button. In the process, they were able to capture the largest dataset around how consumers get from point A to point B without the consumer feeling like they were handing over any data to Uber. This can work for consumer businesses, but it’s much harder when you’re dealing with a Fortune 500 company that hires armies of analysts and lawyers solely to protect their data. And who actually read the user agreement.
  • Scale: Since they target data science teams, being API-first removes friction from onboarding customer data. Coupled with their SDKs, Scale AI makes it dead simple for their customers to embed them in the heart of their IT stack. Scale is the de-facto destination for data science teams to label their image, text and 3D data. This, in turn, creates higher switching costs for their customers and helps Scale build bigger datasets than those of their individual customers. Given they sell to businesses, it can’t be as invisible as the Uber way, but the underlying concept is the same.

Unlocking the related happiness loop (#1):

By collecting data in a similar context across a set of customers in your market, you will naturally build a unique advantage. This requires you to stay focused in the early days on just one problem, or a small number of problems where you can re-use your existing supply base. It may be tempting to move to the next vertical as you start to get traction in your original vertical. For a human-in-the-loop business, that is almost always a terrible idea. Instead, this is the precise moment where you should double down on your core market by controlling the supply and codifying your human/machine learning processes. But when you do tackle this adjacent market segment (e.g. going from black cars to mass-market sedans), you will be able to recycle your superior supply infrastructure (e.g. distribution from your popular app, back-office logistics, and driver supply) and quickly win in this new market. This focused approach is what will allow you to beat even the most sophisticated deep learning AI apps built by smart(er) PhDs who want a piece of your market. That’s because not just data, but labeled data, is more important than your algorithms.

Loop #2 = Build a highly engaged and loyal supply base

In any human-in-the-loop business, the given task is completed by a human/machine hybrid. The machine can work in parallel with the worker on the same tasks, or the workers can delegate redundant tasks to the machine. However, before you can reap the benefits of this hybrid approach, you must concentrate and create loyalty with your worker supply.

First, this lets you control the quality of the end-product that your customer will interact with and judge you on. Second, controlling the supply gives you the opportunity to watch and learn from the best and most creative workers. This will be key as you start to codify those learnings and use them to coach the rest of your workers and/or ML models. The basic principle behind a highly loyal supply base is to create greater value for your workers than they could otherwise capture by themselves with the same level of effort.

  • Scale offers meaningful, sometimes even life-changing employment, to their labelers who are primarily based in developing markets. Coupled with a gamified labeling experience, Scale is able to achieve high fidelity and high engagement on the supply side. Naturally, hiring and managing their own human labelers puts up barriers to entry for Scale’s competition. With every task from a new customer, Scale gets smarter at performing the task. More importantly, they get smarter at quantifying the performance of each labeler on a given task.
  • Uber also hires its own drivers but acts as the main interface with the end customer. They create value for their drivers by providing them with the clients, the back-office logistics to match them with the best rides, and the flexibility of working when they want. As a result, Uber was able to “steal” away thousands of limo and taxi drivers from the incumbents who did not engage and create as much loyalty with their workers. This happened despite the absurdly high switching costs caused by the medallions that taxi drivers faced.

Unlocking the related happiness loop (#2):

People are fundamentally error-prone, and machines (today) fundamentally lack intuition and common sense. Hence, it’s often hard to consistently create happiness for customers by relying on seemingly unreliable sources of supply. That’s why many black-box AI/ML solutions get labeled as “Snakeoil AI.” It’s actually fine for some predictions to not always be correct. But if you can’t explain why or how the prediction was made, then you permanently lose the customer’s trust.

For that reason, you must build a data-driven methodology for measuring supply quality and explainability at the most granular levels. Without this visibility, your ML models will learn from poor or noisy data, your best workers will leave, and your supply chain will fall apart. To get this visibility, you can use tools like confidence scores for each worker, inter-rater reliability metrics, statistical and machine learning checks and much more.
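As one concrete instance of the inter-rater reliability metrics mentioned above, here is a minimal sketch of Cohen's kappa, a standard measure of how much two workers agree on the same items beyond what chance alone would produce. The function name and the sample labels are illustrative assumptions, not anything Scale or Uber is known to use.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Agreement between two raters on the same items, corrected for chance.

    Returns 1.0 for perfect agreement and roughly 0.0 when the raters
    agree no more often than chance would predict.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(labels_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two workers on the same four images.
worker_1 = ["cat", "cat", "dog", "dog"]
worker_2 = ["cat", "cat", "dog", "cat"]
kappa = cohen_kappa(worker_1, worker_2)
```

A low kappa between two workers on the same batch is a signal to review the task instructions or the workers themselves before that data reaches a model.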

By using these tools, Scale and Uber mastered the art and science of optimizing their confidence in the quality of any given task while relying on fundamentally error-prone humans.

Loop #3 = Codify your learnings with the human/machine hybrid

This will come as no surprise, but you want the $Y amount you charge to customers for a service to be higher than the $X amount it costs you to make it. At first glance, this would make “black box” AI businesses look like much more profitable ventures than a human-in-the-loop alternative, right? You wouldn’t need to spend all this money hiring workers and doing all this operational work. However, this assumes the market will want the “black box” AI approach, or at least will be okay with it. But for most enterprises, in most contexts, they simply won’t be. I’ve had conversations with decision-makers at 100+ Fortune 500 companies, and the lack of transparency and explainability in the black box AI approach tends to trigger two emotions in them: anger and fear. And let me tell you, those don’t sell very well if it’s your “solution” making the customer feel that way.

With the human/machine hybrid approach, you are by default choosing the path of faster market adoption, more thoughtful usage, and initially lower margins that increase over time. In practice, as your human-in-the-loop business grows, you can use increasingly sophisticated data analytics to learn what each worker is uniquely good at and create a unified knowledge base. Distributing this knowledge back to your workers and/or machine learning models will then be the key to turning these learnings into superior supply quality.

  • Uber: Their most basic metric to keep track of supply quality is the now-famous 5-star driver rating that gets updated with each completed ride. This is a simple, but incredibly powerful way to “label” supply quality because its marginal cost to Uber is a whopping $0. The data is crowdsourced by the customers, so it’s obviously imperfect. But generally speaking, and with the addition of more sophisticated tools, Uber can easily identify who their best drivers are (e.g. smooth on the brake pedal) and what the ideal trips look like (e.g. short wait times) to learn from them. In turn, they’ve built sophisticated “Marketplace Forecasting” tooling that sits on top of this dataset and improves the supply quality accordingly. These tools include optimal driver positioning, surge pricing calculation, dynamic city clustering and more.
  • Scale: An example at Scale is the creation of a confidence score for each labeler based on their work history. If you perform well every time your work gets checked, Scale’s confidence score in you, the labeler, increases. As a result, Scale can spend fewer resources on checking your work, and more on the labelers in whom they have less confidence. This is more expensive than Uber’s method, but it is also more reliable and consistent. It is critical, and worth the effort, when the stakes are high. And they often are in enterprise SaaS. For example, when your customers pay you to label their data with 99% accuracy.
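To make the confidence-score idea above concrete, here is a minimal sketch, assuming a smoothed pass rate as the score and a spot-check rate that shrinks as the score rises. This is not Scale's actual method: the class, the smoothing choice, and the 5% audit floor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabelerRecord:
    """Running tally of quality checks for one labeler (hypothetical schema)."""
    checks_passed: int = 0
    checks_total: int = 0

    def record_check(self, passed: bool) -> None:
        """Log the outcome of one reviewed task."""
        self.checks_total += 1
        if passed:
            self.checks_passed += 1

    @property
    def confidence(self) -> float:
        # Laplace-smoothed pass rate: a brand-new labeler starts at 0.5
        # and moves toward their true pass rate as checks accumulate.
        return (self.checks_passed + 1) / (self.checks_total + 2)


def audit_rate(record: LabelerRecord, min_rate: float = 0.05) -> float:
    """Fraction of this labeler's tasks to spot-check.

    Lower confidence means more checking. The floor (an assumed 5%)
    keeps even trusted labelers under occasional review.
    """
    return max(min_rate, 1.0 - record.confidence)


# Usage: a labeler who passes 98 checks in a row drops to the audit floor.
veteran = LabelerRecord()
for _ in range(98):
    veteran.record_check(passed=True)
newcomer = LabelerRecord()
```

The smoothing means one lucky pass does not make a newcomer look perfect, which is the property that lets review effort shift toward the labelers with the least evidence behind them.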

Unlocking the related happiness loop (#3):

As mentioned, once you are capturing what past and current workers are uniquely good at, you can build a separate set of technology tooling that supports the coaching of all current and future workers. You can do this by injecting these learnings into your processes and even into each worker’s onboarding or daily workflow, in a personalized way. Imagine the average driver, photographer, teacher, marketer, etc. being able to learn from the best in their field thanks to the concentrated supply base you’ve built. Like a personalized “Masterclass” for workers in any given field. This unlocks massive growth for your market and benefits the customer just as much.

But this knowledge isn’t just useful for your workers. It can be useful for your automation efforts, too. If certain tasks used to require intuition, but can now be broken down into specific rules and steps, then they are made redundant. If they are redundant, more often than not a piece of software can do it better/faster/cheaper. In turn, workers can delegate those redundant tasks away to the machine and focus on more creative and high-impact tasks.

Source: How Scale AI assists their labelers with ML.
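The delegation described above, where the machine takes the tasks it can handle and workers keep the rest, is often implemented as a simple confidence threshold. The following is a minimal sketch under that assumption. The function names and the 0.9 threshold are hypothetical, and the sketch does not describe how Scale or Uber route work.

```python
def route_task(model_confidence, threshold=0.9):
    """Return "machine" when the model is confident enough, else "human"."""
    if not 0.0 <= model_confidence <= 1.0:
        raise ValueError("model_confidence must be between 0 and 1")
    return "machine" if model_confidence >= threshold else "human"


def split_queue(tasks, threshold=0.9):
    """Split (task_id, model_confidence) pairs into two work queues."""
    queues = {"machine": [], "human": []}
    for task_id, confidence in tasks:
        queues[route_task(confidence, threshold)].append(task_id)
    return queues


# Usage with made-up task IDs and model confidences.
incoming = [("task-a", 0.97), ("task-b", 0.42), ("task-c", 0.90)]
queues = split_queue(incoming)
```

Raising the threshold sends more work to humans and lowers the risk of machine errors reaching the customer; lowering it does the reverse. The human-completed tasks can then be fed back as labeled data so the threshold can be relaxed over time.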

With that in mind, it’s really no surprise that Uber is investing billions in autonomous driving technology. First, paying drivers and maintaining processes to ensure they are reliable is incredibly expensive for Uber. If they can introduce Level 1 vehicle autonomy, let alone Level 5, the driver’s performance would become a lot less volatile. Plus, their margins and supply quality would naturally go up. Second, alongside Lyft and Tesla, they have one of the biggest datasets of how people drive. Hence, they are also uniquely positioned to build the best human/machine hybrid service for transportation.

Wrapping up

This human/machine hybrid learning process can repeat itself infinitely. Every time it does, it will increase the overall quality of your supply, the happiness of your customers, and your unique bargaining power in charging a premium for your service. The 3 questions for you become:

  1. What is the market/data context you are uniquely good at capturing?
  2. How are you systematically building and learning from your superior supply base?
  3. What will Level 1 autonomy, and even Level 5, look like for your business?

Liked it? Follow me on Twitter here, and add me on LinkedIn for more or just to chat.

PS: Big thanks again to the operator/engineer/investor friends from human-in-the-loop businesses like Ironclad, Uber, Lyft, Bird, Scale AI, Guru, and many others who helped question and refine my thinking on the topic over the last few years.