What Machine Learning Can Do. What Machine Learning Can’t Do.

Machine learning is not the universal solution to mankind’s problems

First and foremost, it is important to realize that ML is not omnipotent. From research over the past 50 years, we have come to realize that ML is a mode of knowledge acquisition without explicit programming that has some definite boundaries. Just as computation as a process has inherent limitations (for example, it is not possible now, nor will it ever be possible in the future, to decide if an arbitrary program will halt), ML also has intrinsic limitations that cannot be overcome by throwing more GPU machines at the problem, or using faster computers. I know this news might come as a disappointment to many fans of ML, but it is important to know that ML is not the solution to all our problems. There are other ways to acquire information, and people incidentally use these all the time, in addition to “learning”.

Let us take as a primary example the problem of learning a language from hearing verbal utterances. This problem has attracted deep interest for 50 years or more, not only in AI, but also in philosophy, linguistics, psychology, biology, and neuroscience, to name a few fields. Well, guess what? We still don’t understand how humans, by which I mean children as young as 2 years old, acquire their first language. There has been a tremendous amount of work on documenting the process, and of course, many theories. But you can’t go to Best Buy today and buy a learning machine (say, an Alexa) that will sit in your house, simply hear whatever language is being spoken there, and within a year or two, start conversing with you. Isn’t this sad? I mean, for all the millions of servers that Google, Amazon, Microsoft, and the other big tech companies have at their disposal, and the huge number of petabytes of storage capacity in data centers, we can’t solve this problem!

No, chatbots don’t learn language, and if you have ever used a chatbot, you’ll see within a minute or two why it can’t be the answer. Now, you have probably heard about the impressive power of deep learning solutions, like long short-term memory (LSTM) architectures or gated recurrent unit (GRU) architectures, at doing tasks like language translation. Well, once again, these systems are far from being able to learn language, and even their performance at language translation is currently woefully bad compared to humans.

For an absolutely devastating takedown of Google Translate, I highly recommend the insightful article in The Atlantic by Douglas Hofstadter, one of the deepest thinkers in AI and cognitive science, whose breakthrough book “Gödel, Escher, Bach: An Eternal Golden Braid” got me into AI in the first place: The Shallowness of Google Translate

Now, this is not to say that Google Translate is not extremely valuable and useful. Indeed, it is used every day by millions of people worldwide, just as Alexa and its technological variants are used every day by many people. But systems like Google Translate are no match for humans, and one has only to see the many examples in Hofstadter’s article to see how far AI has to go in truly understanding language. LSTM and GRU architectures don’t “understand” language; they build simple statistical models that retain some information about past words, mostly at the sentence level, and they can easily be defeated, as Hofstadter’s revealing article shows so vividly. In one of the examples, Google Translate is asked to translate a paragraph from English to French that has phrases like “he has his car, she has her car” and so on, illustrating how two people, a man and a woman, have divided their possessions in their house. Google Translate entirely misses the point of the paragraph in its attempt to translate, and fails to realize the importance of gender.

So, what are the limitations of machine learning, using this example of learning language? There are principally two limitations, which are just inherent to the way ML is formulated now, and which cannot be overcome by throwing more data or compute power at the problem. Just as the halting problem is undecidable, now and forever, so are these two limitations of machine learning. This is why it is important to know such things, so that one realizes what one can and what one cannot do with machine learning. As the famous Chinese philosopher Confucius said a long time ago:

“A man who knows what he knows and knows what he does not know is one who truly knows”

The first limitation was proven in a famous theorem by Gold 50 years or so ago. It turns out that many studies have shown that children primarily receive positive examples of natural language. By and large, parents do not correct children’s mispronunciations or grammatically incorrect phrases, but instead interpret what the child is trying to say. So, unlike the somewhat idealized case of say supervised image labeling, where one gets images of faces and non-faces, children only get positive examples. Children also have no idea what the language they are supposed to learn is (if you are born in the US, you are not equipped with some magic “English learning” genes).

So, what Gold proved is this: no matter how many positive examples it sees, a machine learning system can never be sure it has inferred the context-free grammar that generates the strings in the language. That is, assume you are given strings generated by some unknown context-free language. No matter how many strings you see, and how much compute power you have available, there will never come a time when you can say you have exactly identified the grammar that generated the language. This was truly a stunning result. Since Japanese and English and German and French are all more powerful than simply context-free languages, it must mean that the space of languages in our brains is not all context-free or all context-sensitive languages, but some other more restricted class that is identifiable purely from positive examples. What is this class? Linguists have been looking for 50+ years, and haven’t found it yet, although there has been a lot of progress.
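
The intuition behind this result can be sketched in a few lines of Python. This is only a toy illustration, not Gold’s proof: the candidate languages (strings of the letter “a” up to some length, plus the infinite language of all such strings) and the helper names `infinite_language`, `finite_language`, and `consistent_hypotheses` are assumptions made up for this example. It shows that every finite set of positive examples is consistent both with the infinite language and with many finite ones, so positive data alone never rules out either kind of guess.

```python
def infinite_language(s: str) -> bool:
    """Membership test for L_inf = { 'a' repeated n times : n >= 1 }."""
    return len(s) >= 1 and set(s) == {"a"}


def finite_language(k: int):
    """Return a membership test for L_k = { 'a' repeated n times : 1 <= n <= k }."""
    def member(s: str) -> bool:
        return infinite_language(s) and len(s) <= k
    return member


def consistent_hypotheses(sample, max_k):
    """Name every candidate language that contains all positive examples in `sample`.

    Candidates are the finite languages L_1 .. L_max_k and the infinite L_inf.
    """
    names = []
    for k in range(1, max_k + 1):
        member = finite_language(k)
        if all(member(s) for s in sample):
            names.append(f"L_{k}")
    if all(infinite_language(s) for s in sample):
        names.append("L_inf")
    return names


# Three positive examples; there are no negative examples to prune with.
sample = ["a", "aaa", "aa"]
survivors = consistent_hypotheses(sample, max_k=6)
print(survivors)
```

However long the sample grows, the infinite language and every sufficiently large finite language remain on the list. A learner that always guesses the smallest consistent finite language is wrong forever if the target is infinite, and a learner that jumps to the infinite language is wrong forever if the target was finite.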

Now, for the second limitation, and this has to do with an inherent limitation of the two foundations of ML today, probability and statistics. Now, both of these mathematical areas are incredibly powerful and useful, not just in ML, but also in many other areas of science and engineering. It is hard to argue with the statement that Fisher’s work on randomized experiments and maximum likelihood estimation was one of the pinnacles of research in the 20th century, one that made many other things possible (e.g., reliable engineering of many technological artifacts, and drug testing etc.).

As Neyman, Pearson, Rubin, and most recently Pearl, have shown, however, statistical reasoning is inherently limited. Probability theory cannot be used to reveal the causal nature of the world. It cannot be used to learn that “lightning causes thunder”, not the other way around, or that “diseases cause symptoms”. Such an elementary bit of reasoning cannot be achieved by probability or statistics, or its derivative field, statistical ML. Once again, this is an inherent limitation, one that cannot be overcome by more data, more machines, and more money being thrown at the problem.
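
A small worked example makes this concrete. The sketch below is a toy under stated assumptions: the two binary variables (think of X as “lightning” and Y as “thunder”), the probabilities, and the function names `joint_x_causes_y` and `joint_y_causes_x` are all invented for illustration. It builds one model in which X causes Y and another in which Y causes X, and checks that they produce exactly the same joint distribution, so no amount of observational data drawn from that distribution can tell the two apart.

```python
from fractions import Fraction as F
from itertools import product

# Made-up conditional table: the effect copies its cause with probability 3/4.
COPY = {0: {0: F(3, 4), 1: F(1, 4)},
        1: {0: F(1, 4), 1: F(3, 4)}}
FAIR = {0: F(1, 2), 1: F(1, 2)}


def joint_x_causes_y():
    """Model A: X is drawn first, then Y is generated from X."""
    return {(x, y): FAIR[x] * COPY[x][y] for x, y in product((0, 1), repeat=2)}


def joint_y_causes_x():
    """Model B: Y is drawn first, then X is generated from Y."""
    return {(x, y): FAIR[y] * COPY[y][x] for x, y in product((0, 1), repeat=2)}


model_a = joint_x_causes_y()
model_b = joint_y_causes_x()

# Exact rational arithmetic, so equality is a real identity, not a rounding accident.
print(model_a == model_b)
```

The two models disagree only about what happens when someone reaches in and sets X by hand, which is exactly the information a joint distribution does not carry.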

So, at the end of the day, one has to come to the realization that data science, despite all its promise and all its potential power, is not the end of the story. It will not be the miracle solution to the problem of AI, and to solve the problem of language learning and the problem of causal discovery from observations, one has to develop additional tools. Pearl and Rubin, for example, have developed just such extensions of probability theory, such as potential outcome theory and the do-calculus. Pearl’s latest book, “The Book of Why”, is highly recommended. It describes a three-level cognitive architecture, with statistical modeling from observation at the lowest layer, causal reasoning with interventions at the middle layer, and imaginative reasoning with counterfactuals at the top layer. This is one of the most interesting recent ideas on how to extend data science to what I call “imagination science”, a field that doesn’t yet exist, but one that I believe will become more popular over the coming decades as the limitations of data science become more obvious.

That is not to say that data science is not useful. It is in fact tremendously useful, and one can use it to model many phenomena, from social networks (we all know where this story leads) to medical diseases and social problems like gun violence in schools. However, and this is crucial to understand, data science does not tell you how to solve these problems! Yes, gun violence in schools is an abhorrent stain on the otherwise marvelous educational environment in the US, and one can use data science and deep learning to construct elaborate models that summarize the incidents of gun violence. But, that’s not the real issue, is it? The real issue is intervention! How do we reduce or eliminate gun violence? As Pearl argues, understanding interventions is not statistics. Probability distributions, by their very nature, do not contain within themselves some recipe that tells you how they change as you intervene in the world. We all know the interventions that are being proposed to reduce gun violence: banning the sale of assault rifles, better background checks on prospective gun buyers, equipping teachers with guns (the US President appears to favor this intervention), and even repealing the 2nd Amendment, which has been supported by one former US Supreme Court Justice. All of these are “interventions”: they will change the distribution of gun violence in some way. Which one is the most effective intervention? That is the real issue, and sadly, data science will not answer this question, since it requires causal models (layer 2 of Pearl’s cognitive architecture).
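
To see the gap between observing and intervening, consider the following minimal sketch. Everything in it is an assumption chosen for illustration: a hidden background factor Z that influences both a policy X and an outcome Y, the specific probabilities, and the function names `observational` and `interventional`. It compares P(Y=1 | X=x), what you would read off the data, with P(Y=1 | do(X=x)), what would happen if you imposed X=x on everyone, computed here with the standard back-door adjustment over Z.

```python
from fractions import Fraction as F

# Toy causal model (all numbers invented): Z -> X, Z -> Y, X -> Y.
P_Z = {0: F(1, 2), 1: F(1, 2)}
P_X1_GIVEN_Z = {0: F(1, 4), 1: F(3, 4)}
P_Y1_GIVEN_XZ = {(0, 0): F(1, 4), (1, 0): F(1, 2),
                 (0, 1): F(3, 4), (1, 1): F(1)}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1


def observational(x):
    """P(Y=1 | X=x): filter the population down to those seen with X=x."""
    num = sum(P_Z[z] * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))
    den = sum(P_Z[z] * p_x_given_z(x, z) for z in (0, 1))
    return num / den


def interventional(x):
    """P(Y=1 | do(X=x)): force X=x for everyone, which cuts the Z -> X arrow."""
    return sum(P_Z[z] * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


observed_gap = observational(1) - observational(0)
causal_effect = interventional(1) - interventional(0)
print(observed_gap, causal_effect)
```

In this toy model the gap seen in the data is twice the true effect of the intervention. Note that `interventional` could only be written because the causal structure (which arrows exist) was assumed up front; the observed frequencies by themselves do not supply it.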

Understanding interventions is at the heart of not just reducing gun violence, but also many other problems facing society today. Take climate change. We can collect massive amounts of data on global warming and use deep learning to construct sophisticated models of CO2 emissions and so on. But, again, the hard question here is: what intervention is needed? Should we phase out gasoline-powered cars and trucks entirely, and if so, at what rate? How much time will that buy us? There are scary-looking predictions of what the map of the US will look like in 10,000 years (an imagination problem, of course!). This study was recently published in the New York Times:

Can You Guess What America Will Look Like in 10,000 Years? A Quiz

So, the consequences of global warming are indeed quite alarming and ultimately threaten our very survival as a species. The question is what to do about it. What interventions make the most sense, and how should they be implemented? Note this is not data science! When you intervene (say a city like Beijing or London decides to impose new traffic regulations and allow only even-numbered license plates inside the city one day and odd-numbered plates the next day), you change the underlying data distribution from what it is currently, and so, all your previous data is useless!

So, causal models are absolutely needed to understand a vast array of social challenges that are going to become ever more pressing in the 21st century. If AI is going to contribute towards the betterment of society, its very effectiveness will depend on the degree to which researchers in the field understand the inherent limitations of the currently dominant paradigm, statistical ML, and why we as a field, and we as a society, need to move on to more powerful paradigms. Our very existence as a species may depend on developing the next AI paradigm that is more powerful than data science.

Also, one thing that we have not yet achieved with unsupervised machine learning is the full diversity of functions that an AI could perform. Machine learning has also not succeeded in replacing all jobs, understanding meaning from text, or playing complex 3D video games.

It is quite possible to create an AI that beats a human at chess or at video games, or even one that diagnoses patients and performs surgical operations; but for the moment, such systems are only intended for the one type of use defined by the developer or constructor. They are not yet able to learn something totally unknown and reproduce it, or even improve it (but we are almost there!).

This topic was originally posted on Quora as a response to the question: “What machine learning can’t do!”



Tobiloba Adejumo

Interested in biomarker development, software dev and ai, as well as psychology, history, philosophy, relationships. Website: