Historically, technology has destroyed some jobs (how many of your friends are candlestick makers?) while creating new ones (how many of your grandfather’s friends were data scientists?). This is known as creative destruction. The question haunting the advance of today’s technology is whether it will simply destroy jobs without leaving enough replacements behind.
In the summer of 2018, HFS Research produced a study called “How to Avoid Your Looming Machine Learning Crisis” based on responses from “153 data science decision-makers across the Global 2000.” The report recorded a range of responses, reflected in the graphic above, to the question of what to do with displaced workers. In response, the authors wrote that:
Struck with uncertainty in how workforce requirements will change, many enterprises are confused around this challenge of what to do with displaced employees in the near and long-term future. A confounding two-thirds of respondents think that their workforce will be able to be retrained to perform data and ML related tasks. This is a highly unrealistic outcome, especially considering the . . . already severe talent gaps in ML.
. . . it is unlikely that the majority of displaced paralegals and legal assistants performing legal research and case development would go on to obtain CS, ML, MDM, or data governance and stewardship degrees and certifications. As enterprise leaders, you must rethink and redesign roles for ML-enabled operations, to augment human decision-making and input where needed.
So what can displaced workers do?
Humans Can Guide the Machines
While I agree that it’s unlikely for data entry professionals to suddenly gain advanced machine learning skills or become Ph.D.-level data scientists, data familiarity may make these workers valuable as “machine learning coaches.” In other words, they can help teach the learning machines by ensuring that the right data is going in, and they can also verify that an ML system is not drifting off course by checking the results that come out. Human feedback can then further improve the models.
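To make the coaching idea concrete, here is a minimal sketch in Python of one way it could work. Everything in it is an assumption for illustration: the `Prediction` record, the 0.80 review threshold, and the sample invoice categories are hypothetical, not taken from any particular ML system. The model's confident outputs pass through, the uncertain ones go to a human coach, and the coach's answers become new training examples.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    record_id: str
    label: str
    confidence: float  # model's self-reported score between 0 and 1


# Hypothetical cutoff: anything below it is sent to a human coach.
REVIEW_THRESHOLD = 0.80


def split_for_review(predictions, threshold=REVIEW_THRESHOLD):
    """Separate confident predictions from those a person should check."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    review = [p for p in predictions if p.confidence < threshold]
    return accepted, review


def apply_corrections(review, corrections):
    """Build (record_id, label) training examples from the coach's review.

    `corrections` maps record_id to the label the coach chose instead.
    A record with no entry is treated as confirmed by the coach.
    """
    return [(p.record_id, corrections.get(p.record_id, p.label)) for p in review]


predictions = [
    Prediction("inv-001", "utilities", 0.97),
    Prediction("inv-002", "travel", 0.55),
    Prediction("inv-003", "office supplies", 0.62),
]

accepted, review = split_for_review(predictions)
# The coach fixes one label and confirms the other.
new_examples = apply_corrections(review, {"inv-002": "client entertainment"})
print(new_examples)
```

The coach here needs no knowledge of how the model works, only enough familiarity with the records to tell a right answer from a wrong one.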
Those who are already intimately familiar with a given data set — including its inconsistencies and idiosyncrasies — may be the ones best suited to perform this teaching/coaching work. Thus, a good number of workers actually can be trained for data and ML tasks, as long as companies have reasonable expectations of the new roles to be filled. (And to be fair to those HFS respondents, it’s possible that some of them had roles akin to “machine learning coach” in mind when they said they would retrain people to do ML/data tasks.)
After all, data readiness is the difference between machine learning failure and success. Fortune 500 companies spent heavily to install tools like SAP, Oracle, and Microsoft Dynamics. They may ultimately spend as much, if not more, getting their data out of those systems and ready for ML.
The best algorithms are worthless without good data. In many cases, preparing and cleaning the data for machine learning processes will be a massive organizational need. Is your data accessible? Sizable? Usable? Understandable? Maintainable? Who better to help get your data in shape than the people who were previously processing it manually?
Data cleanup deals with anomalies and problems an algorithm might not understand. It’s a role in which people can thoughtfully apply their experience in a way that doesn’t make them feel robotic.
Imagine a person processing invoices for payment. If their standard data entry work is automated, the person is freed to evaluate the data on a higher level. Looking at whether an invoice is coming from and going to the right place, whether appropriate discounts have been applied, and whether the invoice just makes intuitive sense is a better use of their time than simply copying and pasting numbers from one screen to another.
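As a rough illustration, some of those higher-level checks could be written down as simple rules that flag an invoice for a person's attention. The sketch below is Python, and all of its specifics are hypothetical: the vendor list, the contracted discounts, the company name, and the amount ceiling are invented for the example. The final judgment still belongs to the reviewer.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    vendor: str
    bill_to: str
    amount: float
    discount_pct: float


# Hypothetical reference data a reviewer might know by heart.
CONTRACTED_DISCOUNTS = {"Acme Paper": 2.0, "Bolt Logistics": 0.0}  # vendor -> %
OUR_ENTITY = "Example Corp"
TYPICAL_MAX_AMOUNT = 10_000.0


def review_flags(inv):
    """Return the reasons, if any, that a person should look at this invoice."""
    flags = []
    if inv.vendor not in CONTRACTED_DISCOUNTS:
        flags.append("unknown vendor")
    elif inv.discount_pct < CONTRACTED_DISCOUNTS[inv.vendor]:
        flags.append("contracted discount not applied")
    if inv.bill_to != OUR_ENTITY:
        flags.append("billed to wrong entity")
    if inv.amount <= 0 or inv.amount > TYPICAL_MAX_AMOUNT:
        flags.append("amount looks unusual")
    return flags


invoice = Invoice("Acme Paper", "Example Corp", 48_500.0, 0.0)
flags = review_flags(invoice)
print(flags)
```

Rules like these only surface candidates. Deciding whether a flagged invoice "makes intuitive sense" is exactly the part that draws on the person's experience.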
The history of “human computers” can help shed light on the present. In her article “When Computers Were Women,” Jennifer S. Light writes that in the World War II era, six human computers “were selected to program a machine that, ironically, would take their name and replace them. . . .”
This is an example of how automation requires human guidance from the very people whose work it automates. But it’s an ominous example, both given the word “replace” and because the six trainers came from a group of almost 200 women.
Sure, someone might think, some people have a role in the transition to automation. But the transition only requires a fraction of the displaced workers, and even those people are soon rendered obsolete. The situation may seem analogous to outsourced workers training their foreign replacements.
Fortunately, given machine learning’s massive data labeling requirements, there really is a lot of human work to go around.
Humans Can Label the Data
The company Scale labels images for companies like Lyft and Toyota, helping to train the AI behind self-driving cars. CEO Alex Wang says tens of thousands of people work on the platform. “If you look globally at the whole industry of self-driving,” he says, “there’s probably hundreds of thousands of people who are employed to help these algorithms learn.”
Wang doesn’t believe that the need for humans today is merely a way station on the road to human obsolescence. He points out that mature machine learning efforts from Google and Facebook “still require a significant amount of human input on an ongoing basis to ensure that the models are performing better and better.” The deployment of self-driving cars, he says, will require “tens of thousands or hundreds of thousands of people to ensure that these algorithms are behaving correctly or that they’re continuing to get better.”
Wang says machine learning is just getting started. “The true number of applications is very large and then the true amount of work that will be required in each of those applications is really underestimated.”
There is another potential concern around the new jobs that machine learning creates: when it comes to data labeling, isn’t that just another kind of tedium? Not, says Wang, if the work can be turned into a game. Words With Friends and Candy Crush may seem dull from a distance, but “people play them for hours and hours. And a lot of that is because they’ve been built to be engaging. It’s not necessarily anything complex about it that makes it engaging, but it’s about making it a fun, enjoyable experience where you can always get better and there are always things to improve and there’s always areas for you to be creative . . . . We try to do the same thing with data labeling.”
In other words, prepping data for machine learning may have a hint of Mary Poppins magic. You find the fun and snap: the job’s a game.
Robbie Allen is a Senior Advisor to Infinia ML, a team of data scientists, engineers, and business experts putting machine learning to work.