
Machine Learning for Executives
What every executive should know about machine learning projects before knowing anything about machine learning
Advances in machine learning are defining artificial intelligence breakthroughs. Machine learning is the technology that allows computers to adjust and expand their knowledge without being explicitly programmed.
With significant breakthroughs in all variants of machine learning, it’s increasingly clear that best practice application of machine learning to specific enterprise problems creates value. Machine learning and the new human / machine interfaces, such as assistants that understand natural language and personal context, have the ability to augment human performance. Similar to how a group of cyclists drafting together can pull further and further ahead of the stragglers, the new competitive advantage will be skillful adoption of machine learning and AI.
While machine learning, and deep learning in particular, is a hot topic, as a C-level executive it’s vital to recognize that effective leverage of machine learning is mostly still about people rather than technology. The key benefit of machine learning is speed and scale. The key risk is that no one, not even the experts, completely understands how it works. Most complex machine learning models are, to some extent, black boxes.
I am working on a machine learning primer for business executives but, before getting too deep into the weeds, here are three points for C-suite executives on what to know about machine learning projects before knowing about machine learning.
#1 Good Data > Big Data, Still
With the progress made in deep learning (a branch of machine learning that is essentially deep neural networks + computing power + enormous data), there has been a revival in conversation over big data. This has often been to the point of implying that data quality is no longer a concern, that as long as it’s big, it’s good. While these algorithms certainly perform better on larger data sets, a property specific to deep learning as it stands today, the data still has to be good. Deep learning is not a data cleaning process.
Somewhat clouding this message is the rise of new terms such as dark data, unstructured data and unsupervised learning.
Unstructured data does not equal ugly data dressed up with a bow tie. Unstructured data is simply data without labels, that is not organized, or that has no pre-defined data model to describe it. Unstructured data is often text: text in an organization’s emails, text in social media comments and text in documents such as contracts. It can also contain data such as dates and numbers. Machines are now getting very good at making sense of unstructured data. Whether it’s automated report preparation and interpretation in natural language or highly targeted predictive marketing, there are both traditional vendors and specialized start-ups with truly novel products and applications.
Dark data is operational data that’s not being used. Estimates are that dark data could be 80–90% of data collected in companies. A lot of dark data is unstructured. A lot of research is going into cognitive computing with the expectation that machines can automate dark data discovery and make sense of it at the same time. Like any challenge of this nature, it’s probably possible, but at what cost versus benefit? Because the data is dark, the business case is murky, made especially opaque if technology risk is taken on at the same time. It pays to be suspicious of the magic wand of technology being applied to the black box of dark data. Hoarding dark data solely in the hope of it being useful in the future is the corporate equivalent of cryogenic whole body suspension.
Unsupervised learning is the process of creating new knowledge from unlabeled data. It’s the state of the art in BI for discovering hidden patterns or associations in data. Back in 2008, Big Data became a catchall for abandoning causation in favor of correlation, as if the scientific method had been made obsolete. But with machine learning algorithms able to handle extraordinary scales of data, the bigger the data, the noisier it can be and the more the algorithms can find associations that do not exist. In supervised learning some of this can be caught at the training phase, where it’s called “overfitting.” With unsupervised learning it’s more difficult to detect and test, and some of the discovered associations are simply spurious correlations. Maintaining a healthy skepticism and continuing to seek clarity of causality (the fewer possible explanations there are for a correlation, the more likely the events are linked) remains the gold standard. Correlation is good enough only when frequent correlation is coupled with a clear causal hypothesis, and the benefits of acting on the prediction outweigh the risks. Having a scientific understanding of “why” might not be fashionable, but it’s still superior to blindly acting on simply “what.”
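To see why bigger data surfaces associations that do not exist, here is a toy sketch (entirely synthetic data, nothing more): generate many completely independent random series, and some pairs will still look strongly correlated purely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 completely independent random "metrics", 50 observations each.
# By construction there is NO real relationship between any of them.
data = rng.normal(size=(50, 1000))

# Correlate the first metric against all 999 others
corr = np.corrcoef(data, rowvar=False)[0, 1:]

# Even though every series is pure noise, some pairs look strongly related
print(f"strongest spurious correlation: {np.abs(corr).max():.2f}")
```

The more metrics a business tracks, the more of these phantom relationships an algorithm can find, which is exactly why a causal hypothesis still matters.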
Many companies have struggled to meet the ROI and other, more intangible, expectations for their Big Data projects. Whether it’s a shortage of data scientists, the real-life difficulties of governing data lakes, or the complexities of managing multiple vendors, advisors and service providers, data management is hard. Some pundits present machine learning as a panacea for missed Big Data expectations, as though AI is the life vest being thrown from the raft of professional services to the CIO drowning in her data lake. But there’s an important, and logical, reason why machine learning and big data are not in a perfect long-term relationship.
There’s an extra subtlety with machine learning and Big Data: non-stationary data and adaptive models. Once models are built, they tend to work well only on the kind of data (the world) they were trained on. If the world changes, they can stop working and yield inaccurate results. If valid Big Data is eroded by a changing world, essentially it’s just small data again. And it’s very hard to know whether a dip in performance is just the ups and downs of the normal world or a sign that the world has changed. Many researchers and startups are actively working on this (often framed as the sparse data problem) because the economics of big, non-stationary data in machine learning will limit progress in some important fields.
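A minimal sketch of non-stationarity, using made-up numbers: even the simplest possible model works well on the world it was trained on and quietly degrades when the underlying distribution shifts.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Training world": class 0 centered at 0, class 1 centered at 2
x_train = np.concatenate([rng.normal(0, 1, 500), rng.normal(2, 1, 500)])
y_train = np.array([0] * 500 + [1] * 500)

# The simplest possible model: a threshold halfway between the class means
threshold = (x_train[y_train == 0].mean() + x_train[y_train == 1].mean()) / 2

def accuracy(x, y):
    return ((x > threshold).astype(int) == y).mean()

# Same world: the model keeps working
x_same = np.concatenate([rng.normal(0, 1, 500), rng.normal(2, 1, 500)])
acc_same = accuracy(x_same, y_train)
print(f"stationary world: {acc_same:.0%}")

# The world changes: both classes drift upward by 2 units
x_drift = np.concatenate([rng.normal(2, 1, 500), rng.normal(4, 1, 500)])
acc_drift = accuracy(x_drift, y_train)
print(f"after drift:      {acc_drift:.0%}")
```

Nothing in the second set of predictions announces that the world has moved; the accuracy simply erodes, which is why drift monitoring has to be an explicit part of operating any model.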
The question, as posed by Matt Turck in “Is Big Data Still a Thing?”, is what will AI do to Big Data work? The near-term, most promising relationship between Big Data and AI is to help the scarcest resource — the data scientist — be far, far more productive with data than they have been so far. The world doesn’t have enough data scientists to handle all the data science problems, and data scientists don’t want to spend their time explaining data science. The hope is that AI will do exactly that.
#2 It’s More Science Than Software
Facebook’s machine learning research group talks about “experiments.” By mid-2016, Joaquin Candela, Director of Applied Machine Learning at Facebook, stated that 25% of Facebook’s engineers conduct machine-learning experiments. This is very deliberate language — machine learning is trial and error, domain-specific feature engineering and a lot of experimental design. It doesn’t follow standard software development processes and it doesn’t yield a comfortable answer in the Boolean logic that is reflective of its creator, the software engineer. Machine learning projects require an even tighter problem definition than software projects, because the act of making a prediction feeds back into the environment and alters the problem itself. It’s game theory on a grand scale.
Peter Norvig, Google’s AI research director, has described the gaps that exist between the old software models, with programmers and traceable logic, and the new coding of machine intelligence and statistics. There is no version control, there are no specific releases, and there is no testing. As machine learning diffuses across systems, companies must figure out what it means to debug something that is not modular, is constantly changing and may embed biases based on the nature of the statistical algorithms and training data. I’ve written about this previously — bias is a bug, and until we know how to debug it, executives need to be extra vigilant. New policy should explicitly oversee and monitor machine-learning experiments against company values.
Machine learning is not tidy, staged or certain. It’s a process of prototyping, evaluating, testing and scaling. Just as pharmaceutical companies can’t afford to spend all their resources at the pre-clinical trial stage, neither can companies spend all their machine learning budget on the first iteration. Many experiments are required and maintenance remains an unknown quantity. Many problems need to be reframed in less-than-absolute terms, say by setting acceptability rates for accuracy or false positive / negative results. This world of uncertainty isn’t a natural fit for companies that think in absolutes.
It pays to get comfortable thinking in probabilities.
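Framing a requirement in probabilities rather than absolutes can be made concrete with a small sketch. The scenario and numbers below are hypothetical: a fraud model where the business requirement is stated as an acceptability rate (“flag no more than 5% of legitimate transactions”) rather than a demand for perfection.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model scores: fraud cases score higher on average
scores_normal = rng.beta(2, 5, 10_000)  # 10,000 legitimate transactions
scores_fraud = rng.beta(5, 2, 100)      # 100 fraudulent ones

# Business requirement framed as an acceptability rate, not an absolute:
# "flag no more than 5% of legitimate transactions"
threshold = np.quantile(scores_normal, 0.95)

false_positive_rate = (scores_normal > threshold).mean()
detection_rate = (scores_fraud > threshold).mean()

print(f"threshold chosen: {threshold:.2f}")
print(f"false positives:  {false_positive_rate:.1%} of legitimate traffic")
print(f"fraud caught:     {detection_rate:.1%}")
```

The trade-off is explicit: tighten the false-positive budget and the detection rate falls, loosen it and more fraud is caught at the cost of annoying more customers. That is a business decision expressed in probabilities, not an engineering absolute.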
#3 It’s More Wisdom than Hack-Dom
There’s a reason many of the celebrities in AI have their share of grey hair. Machine learning artisanry takes years and usually a PhD, or two. Over the last twenty to thirty years, as the foundations for today’s machine learning have been laid in applied mathematics and statistical AI, the people who wouldn’t give up had time to try numerous approaches. Today, as a new generation of data scientists and engineers craft modern AI, the earlier experience plays an important and active role. There is a long history in the fundamentals that modern computer science development relies heavily upon. The big players in AI and machine learning all have research groups because R&D is an important investment strategy — Facebook, Google, Baidu, Microsoft, IBM, Uber, Toyota.
Facebook was forced into researching their own natural language translation system because the language of Facebook is quite different — full of colloquialisms, abbreviations and localizations. R&D was the only solution. OpenAI’s mission is to build safe AI. They have recruited leading lights for the cause and want to see AI be available to everyone. Right now, many AI academics are setting up their own start-ups. It’s a way to get recruited and a way to see research applied.
Again, the commercialization process is fundamentally different from previous information technology development. Modern IT tools have lent themselves to a hack mentality. For the last half-decade, VCs have been among the many voices explaining how to build a company in the world of apps. After the boom and bust of cleantech and the early internet, it was a breath of fresh air to be able to seed a college kid with $10,000 and assume they’ll return with a customer base and the makings of a unicorn. In companies, hack-a-thons and skunk works developments seem almost to mock the IT architecture. Some of this sentiment exists in machine learning. Google’s TensorFlow and AWS’s machine learning services, as well as a host of open source ML libraries, give the impression that this is simply an easy add-on that any half-decent software engineer can master. Even better, it’s free. There’s fringe advice to “go find the youngest person in your IT department and put them in charge of machine learning.” Well, to be frank, that’s about as smart as putting Big Head in charge of Hooli’s lab. The bottom line: if a good machine learning-based AI solution has the potential to increase revenue by 30% in a $1B company, should $300M be entrusted to someone who, while they may be a great coder, has no knowledge of the math? Machine learning is math. A lot of it.
The process is one of science — hypothesis-driven, with solid experimental design, deep understanding of the principles, fluent knowledge of the subtleties of multi-dimensional statistical analysis and innate skill for finding creative ways to test results against a control. Feature engineering can make or break whether an algorithm can find the patterns in the data, and feature engineering is an area where domain expertise is critical. This isn’t as simple as running a competition on Kaggle. Don’t expect that anything useful or accurate will come from an experiment that isn’t supported by an experienced data scientist and driven by someone with deep domain expertise and a clear view of the problem to be solved. Companies embarking on machine learning with a stated goal of leveraging technology for competitive advantage, but lacking a clear view of the problem they are trying to solve, will be lost in the wilderness of gradient descent, as will their algorithms.
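Why domain expertise makes or breaks feature engineering can be shown with a toy sketch (the retail scenario and numbers are invented for illustration): the raw data contains the signal, but only a feature built from domain knowledge exposes it to a simple model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical daily sales over two years: higher on weekends
days = np.arange(730)
is_weekend = (days % 7) >= 5
sales = np.where(is_weekend, 150, 100) + rng.normal(0, 10, 730)

# Raw feature: the day index by itself tells a model almost nothing
corr_raw = np.corrcoef(days, sales)[0, 1]

# Engineered feature: domain knowledge says weekly cycles drive sales
corr_engineered = np.corrcoef(is_weekend.astype(float), sales)[0, 1]

print(f"day index vs sales:    r = {corr_raw:+.2f}")
print(f"weekend flag vs sales: r = {corr_engineered:+.2f}")
```

The algorithm didn’t change; the representation of the data did. Knowing that weekly cycles matter is domain expertise, not coding skill, which is the point of putting experienced people on these projects.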
There are many excellent startups as well as larger companies with specific machine learning-based applications. Many can demonstrate superior performance on specific use-cases with proprietary algorithms or with open source algorithms and excellent know-how. The ability to transform business is there — to interpret data and explain it in intuitive ways, to automate and simplify cataloging of text or images, to reduce risk dramatically in cyber security, to create new methods of customer micro-targeting and seeding of social networks, to enable innovative applications of IoT devices and transformational operational efficiency — but, as with all new strategic endeavors, executive understanding, sponsorship and buy in is the most important factor for success.