AI — Crossing the Chasm from Research to Real Life
“A paradigm starts to crumble when there is a heightened insecurity about the capacity of the paradigm to solve the puzzles it has set for itself. Practitioners keep getting the ‘wrong’ answers. The paradigm is in crisis mode, and it is at such points that breakthroughs to a new paradigm are possible” — Thomas Kuhn
There have been many such instances in science wherein an established and revered paradigm starts to show cracks that cannot be papered over; the paradigm then either topples entirely, replaced by a newer one, or is considerably revamped and strengthened by it.

The scientific community clung to Ptolemy’s geocentric model for centuries in spite of its inability to explain the observed planetary motions. Though the seeds of its demise were sown by Copernicus, it took Galileo and his telescope to topple the geocentric model in favor of the newer heliocentric one. Newtonian mechanics went through similarly turbulent times before the special theory of relativity emerged to extend and revamp classical mechanics. The current trajectory of artificial intelligence has brought us to a similar stage of crisis.

Over the last few years, progress in AI, driven to a considerable extent by deep learning and neural networks, has been explosive, earning the discipline a cult status. Just to cite a few examples:
- Driverless cars are on the road without any fallback human driver on board.
- AI-based medical diagnosis software is outperforming certified doctors, a result now recognized and published in leading medical journals.
- Conversational agents are becoming more and more pervasive in all spheres of life.
- AI-enabled agents are beating world champions in various complex games.
- Riding on such advances in basic research and practical demonstrations in real-life settings, Artificial General Intelligence (AGI) seems within mankind’s reach.
However, for many of the above-mentioned successes of AI and deep learning, quite a number of adverse stories have surfaced of late:
- Accidents involving driverless cars, including ones with human casualties, have been reported.
- In the field of AI-driven medical diagnosis, Andrew Ng’s tweet about AI being ready to take over radiologists’ jobs was met with wide criticism from experts, who pointed out that jumping to such conclusions is premature. A detailed and balanced critique of the CheXNet paper can be found here.
- While chatbots were touted as a shining example of a general-purpose AI application a few years back, their success has been limited to very narrow domains. In last year’s Alexa Prize, no competitor could hold an intelligent conversation with a human for the specified interval of 20 minutes. Facebook quietly buried its virtual assistant M. Security concerns have been voiced about some of the current conversational voice assistants. And while Google Duplex was announced with much fanfare, its applications are currently limited to a very narrow domain. As Gary Marcus pointed out in a New York Times op-ed piece (quoted in full below), the limitations of these applications as a form of intelligence should be a wake-up call.
- Though AI-powered solutions have shown remarkable success in beating human champions in games such as Go and Atari titles, questions have been raised over these systems not being purely machine-learning based, but rather a mixture of human knowledge embedded into the system along with deep learning components. There have also been questions over the ability of AI and DL systems to perform well in scenarios such as decision-making, planning, or learning from few examples (zero-shot and one-shot learning).
Many of these questions and concerns have in fact been raised not by those outside the AI research community, but rather by some of the leading voices from within the field itself:
“Deep neural networks have a tendency to learn surface statistical regularities as opposed to high level abstractions. Their performance is possibly due to their picking up the superficial statistical cues present in both the train and test data.” — Jo & Bengio, 2017.
“70.97% of the natural images can be misclassified by DNNs by modifying just one pixel with 97.47% confidence on average” — Su et al. 2017.
“Many of the models in reading comprehension test fail with simple adversarial examples, because they can’t distinguish a sentence which answers the question, from one which merely contains words common with it.” — Jia and Liang, 2017.
“Current supervised perception and reinforcement learning algorithms require lots of data, are terrible at planning, and are only doing straightforward pattern recognition. By contrast, humans learn from very few examples, can do very long-term planning, and are capable of forming abstract models of a situation and [manipulating] these models to achieve extreme generalization.” — Francois Chollet
“somehow everything we do in deep learning is memorization (interpolation, pattern recognition, etc.) instead of thinking (extrapolation, induction, etc.). I haven’t seen a single compelling example of a neural network that I would say thinks” — Andrej Karpathy
“We are seeing lots of papers, these days, that say, I did it, I did it, I did it, but I don’t know how! Now even the author of the paper doesn’t know how the machine does what it does.” — Kenneth Church.
“All of machine learning is nothing but glorified curve fitting” — Judea Pearl.
“If machine learning and big data can’t get us any further than a restaurant reservation, even in the hands of the world’s most capable A.I. company, it is time to reconsider that strategy.” — Gary Marcus
“Machine Learning has become alchemy.” — Ali Rahimi
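To make the adversarial-fragility theme of the Su et al. and Jia and Liang quotes above concrete, here is a minimal sketch of the fast gradient sign method (FGSM) of Goodfellow et al., written in PyTorch. The model, input, and label below are toy placeholders rather than the setups from the cited papers; on a well-trained image classifier, a perturbation of this kind is often imperceptible to a human, yet it routinely flips the predicted class.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: a linear "classifier" over a flattened 28x28 image and a
# random input with an assumed true label. A real demonstration would use a
# trained network and an actual image.
model = torch.nn.Linear(784, 10)
x = torch.rand(1, 784, requires_grad=True)
y = torch.tensor([3])

# Take the gradient of the loss with respect to the *input*, not the weights.
loss = F.cross_entropy(model(x), y)
loss.backward()

# FGSM: move every pixel a small step in the direction that increases the loss.
epsilon = 0.1
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0)

with torch.no_grad():
    print("prediction before:", model(x).argmax(dim=1).item())
    print("prediction after: ", model(x_adv).argmax(dim=1).item())
```

The telling detail is that the gradient is taken with respect to the input rather than the weights: the very machinery used to train the network can be turned around to fool it, which is precisely the “surface statistical regularities” worry raised above.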

Social media has been peppered with arguments between those who claim that current AI and deep learning approaches are fundamentally sound and those who sharply disagree. This is not a debate happening on the fringe; it has become a sharp dividing line among mainstream AI researchers. No one disputes that current AI and deep learning techniques have demonstrated notable successes in specific real-life applications, be it computer vision, speech, or natural language processing. But can the current AI and deep learning approaches take AI from being a cool technology to a dependable presence in human life?
AI applications that directly impact human life and safety, such as driverless cars and AI-driven clinical decision-making systems, are now starting to emerge. Deploying AI at scale in such complex scenarios brings with it certain key requirements. For instance, consider the task of clinical decision-making. It is expected that:
- Human doctors can explain the rationale behind their decisions when needed (interpretability);
- They will make the right decisions for the patient even if some of the presented symptoms are deliberately misleading (adversarial vulnerability);
- Their clinical decisions will not change due to incidental changes in hospital setup (anti-fragility);
- They can still make the right decisions if they shift their practice from Ohio to California (transferability);
- Other doctors with similar expertise can replicate their decisions when presented with the same evidence (reproducibility);
- They will make the right decision for all patients, irrespective of the patients’ inherent characteristics (fairness and absence of bias).
We expect similarly high standards of AI applications operating in such scenarios. Unfortunately, the current state of AI-driven applications is far from this desired state. While almost all new technologies start out with significant shortcomings that they overcome over time, the key question is this:
Are these shortcomings so fundamental that the current crop of deep learning and AI techniques cannot be the path forward in the transition of AI from research to real life?
In this series of blog posts, we will examine the various points of view in the AI research community. We will discuss in depth many of the challenges that current AI and DL techniques face, and ask whether they reflect inherent and fundamental flaws that demand a paradigm shift. While some of these challenges, such as interpretability and adversarial vulnerability, are specific to AI, others, such as reproducibility, are shared with scientific disciplines like medicine and psychology. We will also look at what we can learn from related disciplines that have successfully made the journey from research to real life. For instance, can we adapt “Registered Reports” from psychology to help address the reproducibility issue, or leverage a modified form of the phased-trials approach from medicine to ensure successful deployment of AI applications that replace human decision-making? Our focus will be on the challenges AI faces as it attempts to make the transition from research to real life.
We are also organizing a debate-style workshop at the IJCAI-ICML conference on the topic “Is Deep Learning hitting a wall?” with Tim Hwang (@timhwang).
No matter which side of the argument you are on, whether you are an AI/DL fan or an AI/DL skeptic, hang around and share your point of view.

This is the first in a series of short blog posts discussing the challenges AI faces in crossing the chasm from research to real life. These posts are based on our IJCAI paper titled “Evolution of AI from Research to Real Life — Some Challenges and Suggestions” by Sandya Mannarswamy and Shourya Roy. Shourya Roy is Vice President & Head of Big Data Labs, American Express. Sandya Mannarswamy is a Research Scientist at Conduent Labs India.