Why Machines Need Humans

John Roberts
K Means What?

--

If you’re following the KMeansWhat series, you already know AI solutions aren’t as magical as marketing makes them out to be. While some algorithms and machine learning models can be built on raw data alone, the typical enterprise application built with supervised learning requires “labeled” data. You can think of the label as the “answer” the model learns from so it can make future predictions. In many cases, humans need to either provide or confirm that label. The general term for humans assisting machine learning is “human-in-the-loop training”.
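
To make “labeled data” concrete, here’s a minimal sketch in Python using scikit-learn. The help desk tickets and category labels are made up for illustration; the point is simply that each training example pairs an input with a human-provided answer.

```python
# Minimal supervised learning sketch: each text gets a human-provided
# label (the "answer"), and the model learns to predict labels for new
# text. The tickets and categories are made-up illustrations.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I can't log in to my account",
    "The printer on floor 3 is jammed",
    "Reset my VPN password please",
    "Monitor flickers when docked",
]
labels = ["access", "hardware", "access", "hardware"]  # human-provided answers

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Forgot my password again"]))  # likely -> ['access']
```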

Human in the Loop

Human-in-the-loop can take many forms. You’ve probably used services you thought were powered by machine intelligence that were actually human-backed. Think about the first business card scanner that created a contact with all the right information. Or a receipt scanner tied to your expense app. Or a photo app that recognizes your family and friends. These all required people to perform the work at first. Even once machines learned reliable character recognition from an image, they didn’t understand how a business card or receipt was structured, so they couldn’t parse and extract the various fields. As the people behind the curtain did the work of converting a card to a contact, they were also teaching the machines. Over time, the model improves and reduces or replaces the need for human intervention.

The same scenario applies to many machine intelligence solutions. Chatbots don’t know what you’re asking for until they have conversations or scripts to learn from. Self-driving cars don’t understand what yellow and white lines represent until we give them plenty of examples of how to respond to them.

Labeling Data

Sometimes labels are provided as part of a process transaction, as when a service desk agent assigns an incident to a category or resolution team. Other times you might add labels after the fact without knowing you’re training a model, like when you tag a friend’s face in a photo to help you find them later. In more advanced cases, you might need someone to review and label (aka annotate) data for the specific purpose of training a model.

The annotation of data may be as simple as a single label or classification. More complex examples include identifying parts of speech, named entities, or sentiment within unstructured text. Computer vision examples include locating and tagging objects in an image or video.
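
To illustrate those annotation shapes, here are some made-up records in Python. The field names and values are assumptions for the sketch, not any standard schema.

```python
# Illustrative (made-up) annotation records at increasing complexity:
# a single classification label, entity spans in text, and a bounding
# box for object detection.

classification = {"text": "Refund my order", "label": "billing"}

named_entities = {
    "text": "Ada Lovelace joined Acme in London",
    "spans": [
        {"start": 0, "end": 12, "label": "PERSON"},
        {"start": 20, "end": 24, "label": "ORG"},
        {"start": 28, "end": 34, "label": "LOC"},
    ],
}

object_detection = {
    "image": "street.jpg",
    "boxes": [{"x": 48, "y": 102, "w": 220, "h": 90, "label": "car"}],
}
```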

For cases that need significant human labor, crowd-sourced platforms like Amazon Mechanical Turk or Figure Eight (formerly CrowdFlower), or dedicated annotation services like Appen, may be used.

Active Learning

The previous examples are based on humans providing information before the model uses the data to train or improve. Another way humans can help is called “active learning”. The term isn’t as generic as it sounds: most models will continue to actively learn and improve using recent data, but that’s not what we’re talking about here.

Active learning is usually used to improve a model by assisting with its lower-confidence predictions. Model predictions come with some measure of confidence, so a machine can ask for assistance in cases where its confidence is low, or below a given threshold. Here the human doesn’t just provide the label, as in plain supervised learning; they provide feedback on the model’s prediction. The machine learns which of its predictions are correct or incorrect.
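
Here’s a rough sketch of that confidence-threshold routing in Python. The 0.75 threshold and the ask_human function are assumptions for illustration, not part of any particular library.

```python
# Sketch of confidence-based routing: the model answers on its own when
# it is confident, and asks a human when it is not. Assumes a
# scikit-learn-style classifier with predict_proba.
import numpy as np

CONFIDENCE_THRESHOLD = 0.75  # illustrative cutoff, tune per use case

def predict_or_ask(model, item, ask_human):
    proba = model.predict_proba([item])[0]        # class probabilities
    confidence = float(np.max(proba))
    predicted = model.classes_[int(np.argmax(proba))]
    if confidence >= CONFIDENCE_THRESHOLD:
        return predicted, confidence              # machine handles it
    # Low confidence: show the prediction and let a human confirm or
    # correct it. That feedback becomes a new labeled example for the
    # next training run.
    corrected = ask_human(item, predicted)
    return corrected, confidence
```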

Active learning models are usually optimized to understand what data they can best learn from and when they need help. The result is better performance with less data. Since labeling complex data can be time consuming and expensive, anything that expedites the process is valuable.
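
One common way to decide which data a model can best learn from is uncertainty sampling: pick the unlabeled items the model is least confident about and send just those to humans. A minimal sketch, again assuming a scikit-learn-style classifier:

```python
# Uncertainty sampling sketch: from a pool of unlabeled items, select
# the ones the model is least sure about, so human labeling effort goes
# where it teaches the model the most.
import numpy as np

def least_confident(model, unlabeled_pool, batch_size=10):
    probas = model.predict_proba(unlabeled_pool)       # (n_items, n_classes)
    confidence = probas.max(axis=1)                    # top-class probability
    uncertain_idx = np.argsort(confidence)[:batch_size]
    return [unlabeled_pool[i] for i in uncertain_idx]  # send these to humans
```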

If you’d like to see some examples of active learning, check out the live demo at Prodigy. Here’s a shot from a computer vision object detection model. The human doesn’t have to worry about describing what the image is; they only need to confirm or reject the predicted label. The user can also see the progress from their involvement.

Potential Side Benefits of Active Learning

There’s a strange feeling of accomplishment and contribution that comes from this type of work. I actually feel like I’m helping a machine get smarter. That isn’t always the case when you’re tasked with filling out a spreadsheet or form with annotation labels.

The work can also be delegated in smaller chunks. Say you’re interacting with a virtual assistant chatbot and it provides what you need. In return for its service, it may ask you a simple question, and in a matter of seconds you’ve helped make it smarter.

For the gamification fans out there, there’s also an opportunity to gamify this process. Points for whoever teaches the machine the most?

There’s still plenty of hesitation and lack of trust in machine decisions within the enterprise. I see the act of validating machine predictions as a way to improve the confidence a business user has in a system.

If you’re already using active learning in your environment, share your experiences.

--


Co-Founder of Sevwins, the Growth Mindset App for Student-Athletes — Startup advisor — Mentor — Investor — Road & Gravel Cyclist — GA Pilot