Six graphs to understand the state of Artificial Intelligence academic research.
Let me start with something obvious to most people interested in innovation: this is the golden age of Artificial Intelligence, and it’s going to dominate the technology focus for many years to come. Or for all the years to come, according to supporters of the theory that once we get general AI right, we won’t need to invent anything else (book n.1, book n.2, book n.3).
However, while hardly a day passes without TechCrunch, Bloomberg, the Harvard Business Review and the like posting articles on how businesses are using AI, academic research is rarely analysed. Yet scientific research is extremely important, both as a predictor of what business will do and as a snapshot of where we are now.
So I decided to spend some time having fun with Science Direct’s API to gather some data and try to draw some conclusions. For the non-techies out there, Science Direct is “the world’s leading source for scientific, technical, and medical research”, and its API (Application Programming Interface) is a set of functions that allows developers to query its database and retrieve data.
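To give a concrete idea of what such a query looks like, here’s a minimal sketch in Python. It assumes Elsevier’s current ScienceDirect search endpoint, its standard `query`/`date`/`apiKey` parameters, and that you have your own API key; check Elsevier’s developer documentation for the exact details of your account, as these are assumptions on my part:

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the ScienceDirect search API (Elsevier developer portal).
BASE_URL = "https://api.elsevier.com/content/search/sciencedirect"

def build_query_url(keywords, year, api_key):
    """Build a search URL for publications matching `keywords` in `year`."""
    params = {
        "query": keywords,              # free-text search terms
        "date": str(year),              # restrict results to one year
        "apiKey": api_key,              # your personal Elsevier API key
        "httpAccept": "application/json",
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)

def count_publications(keywords, year, api_key):
    """Return the total number of matching publications for one year."""
    url = build_query_url(keywords, year, api_key)
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    # Elsevier search responses wrap results in an OpenSearch envelope,
    # whose total-hits field gives the publication count directly.
    return int(data["search-results"]["opensearch:totalResults"])
```

Looping `count_publications("artificial intelligence", y, KEY)` over a range of years is the kind of call that produces the raw counts behind the graphs that follow.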
In the rest of the article you’ll find the six most interesting graphs I drew from this analysis.
Graph #1: Most active research topics
Our goal here is to see which fields AI has been most actively applied to. For this reason I purposely omitted the results for the subjects “Computer Science”, “Mathematics” and “Engineering”, which are obviously the leaders in research, with 963,274, 491,191 and 352,870 publications respectively.
Biochemistry, Genetics and Molecular Biology is by far the leader here, with almost twice as many publications as the second most researched subject, social sciences. If you didn’t expect that and have doubts about what AI can do in this field, I strongly recommend taking a look at Riccardo Sabatini’s TED talk here. Quoting him from the “Frontiers” conference at the White House: “the last century was about the atom, this century is going to be about biology”. And AI is shaping up to be the main tool here.
Interestingly enough, research in Pharmacology, Toxicology and Pharmaceutics falls short; this is probably going to be the next step once we crack the secrets of the human genome and start looking at how to fix diseases.
It’s also interesting that Finance is pretty low on the list, even though there has been well-known interest in applying AI to stock trading for quite some time: it’s probably a topic confined to industry rather than academia.
Graph #2: Research timeline
I chose some of the most interesting topics and techniques in AI and looked at their evolution over time. I didn’t include 2016: since the year isn’t over yet, its partial count could be misleading, suggesting a false decreasing trend.
Notice how, even though the birth of the AI field is commonly dated to 1956 with the Dartmouth conference, we see basically no activity until 1980. This may be partly because publications from that era are missing from Science Direct’s database, but more likely the reason lies in how hard it was to spread knowledge at the time (read: no internet), and in the fact that researchers were a very small bunch of crazy visionaries.
It’s interesting how clearly we can see the effects of the second AI winter, from 1987 to approximately 1993, when the specialised AI hardware market collapsed. The most widespread AI applications at the time were expert systems: Lisp machines containing painfully hand-encoded knowledge about a specific topic. They were expensive, and each was tied to a single domain. Their failure on the market severely slowed the AI hype train.
Due to lack of data, we can’t see much of the first AI winter in 1974, when a pessimistic report on AI caused a big cut in government funding.
Two other things on this graph caught my attention. The first is that there seems to be a recent inflection point: the YoY growth of scientific publications is slowing down. Let’s take a look at graph #3.
Graph #3: Logistic function fitting for technological diffusion life-cycle
If you’re not familiar with the logistic model of technological diffusion, no problem. It’s a pretty simple concept that works very well in describing the evolution of many phenomena, including, in our case, the diffusion of a new technology. The logistic function has the classic “S” shape, and it basically says that an innovation starts slowly; then, once the foundations are well established, its diffusion speeds up rapidly; and finally it saturates once it reaches “mass” diffusion, or a superior technology starts taking its place.
A clarification: since we’re talking about diffusion, the logistic function should be fitted to the cumulative amount of research, not to the yearly number of publications. We’ll do that later, but since the yearly publications also seem to fit a logistic function very well, we can draw some interesting observations from them.
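As a sketch of the fitting procedure (my own reconstruction, not the exact code behind these graphs): write the logistic function L / (1 + exp(-k·(t - t0))) and fit its three parameters to the publication counts with `scipy.optimize.curve_fit`. The parameter L is the saturation level, k the growth rate, and t0 the inflection year. The series below is synthetic, standing in for the real Science Direct counts:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    # L: saturation level, k: growth rate, t0: inflection year.
    return L / (1.0 + np.exp(-k * (t - t0)))

# Synthetic yearly counts standing in for the real Science Direct series.
years = np.arange(1980, 2016)
true_L, true_k, true_t0 = 120_000.0, 0.25, 2014.0
counts = logistic(years, true_L, true_k, true_t0)

# Fit the three parameters; p0 is a rough initial guess.
popt, _ = curve_fit(logistic, years, counts,
                    p0=[100_000, 0.1, 2010], maxfev=10_000)
L_fit, k_fit, t0_fit = popt
```

Run on the real cumulative series instead of yearly counts, the same fit is what yields the saturation estimate discussed with graph #4.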
It seems that yearly publication output is pretty close to saturation: we’ll go from the current 110k publications per year to a maximum of approximately 120k, before the number eventually starts decreasing and fading out.
This trend seems a little inconsistent with what we see every day: a growing interest in AI, which is reaching mass diffusion and becoming deeply embedded in the vast number of applications we use daily.
But is a slowdown in research alongside an increase in business press coverage really a contradiction? I think it’s actually well aligned with the tech hype in businesses. With applications booming and companies racing towards AI supremacy in several fields (from web search to speech recognition to self-driving cars), AI experts are progressively being hired away from academia by big companies. This explains the slowdown in research growth, and vice versa: the maturity of AI research explains the diffusion of AI outside universities, into technology products and everyday life. So even if Stanford “complains” that it doesn’t have enough space in its halls to accommodate all the students interested in AI, those people are likely to be hired before they even think about writing a publication.
This may be scary, and in fact some people are worried. AI can be an incredibly powerful tool, and it’s dangerous to leave it in the hands of a handful of companies. That’s what can happen when academic research becomes private research, and it’s what organisations like OpenAI are trying to avoid:
“Our mission is to build safe AI, and ensure AI’s benefits are as widely and evenly distributed as possible”
Graph #4: Cumulative publications and logistic function fitting
Doesn’t it look like a perfect fit with a logistic function? From this graph we can see that we’re currently sitting on almost 1.2M scientific publications, and if the fit is correct, we just passed the inflection point. That means research should fade out when it reaches a total of roughly 2M publications.
Does it mean we’re halfway to complete knowledge of AI?
Nope, I really don’t think so.
What I believe it means is that, as said earlier, from an academic point of view we’re reaching maturity. Just as there isn’t much academic research on traditional diesel engines today because the technology is well established, we’ll see the same research trend as AI moves out of universities and into companies’ R&D centres and everyday life. That doesn’t mean car companies have stopped building diesel cars, and technology companies won’t stop or slow down building AI applications anytime soon.
Graph #5: In-depth timeline
If you go deeper, deep learning is the fastest-growing topic (sounds nice, doesn’t it?). Look at the slope of the curve: from basically non-existent in 2005 to the highest growth among all topics since 2013.
Almost half of all the research ever done on deep learning (44%, to be exact) was produced in 2014 and 2015 alone. Including 2016, even though it isn’t over yet, brings that figure to 55%.
It’s going to be interesting to see how the release of powerful open-source libraries like Google’s TensorFlow (November 2015) or the Microsoft Cognitive Toolkit (whose new release this week finally adds Python support) will further boost this trend.
No big news here; we can also see that data mining is another field growing super fast, with no sign of slowing down. That makes sense, considering the amount of yet-to-be-tackled unstructured data the web still offers.
Graph #6: Cumulative amount of scientific research
The question this last graph tries to answer is: who is doing the research? Unfortunately Science Direct only lets us run this analysis on a limited number of articles, but we can still draw some interesting conclusions.
Here’s a list of the most active countries by percentage of contribution in world research:
- China: 16%
- India: 14%
- United States: 9%
- Iran: 8.4%
- Malaysia: 3.5%
- Italy: 3.3%
- Spain: 2.7%
- United Kingdom: 2.5%
- Canada: 2.3%
- Japan: 2.2%
I was betting on China, India and the US, but I wasn’t expecting to see Iran and Malaysia in fourth and fifth place. Especially Iran, considering the big gap between it and its immediate follower. I also wasn’t expecting to see my country, Italy, leading the group of EU countries. Tech recruiters out there: are you hunting for talent in the right spots?
I was also expecting to see the US on top of the list, given the crazy Silicon Valley hype, but the US sitting only third probably confirms what I was hypothesising earlier: Silicon Valley is sucking AI experts out of universities, making research material a privately held asset that is rarely born inside universities.
AI research is very strong, and a very recent field, but from an academic standpoint it seems to have already reached maturity, with fewer new researchers joining the field each year, probably lured away by tech companies’ challenges and crazy salaries.
Would you blame them?
If you liked my work, please recommend and share this article, I would really appreciate it :)
Also, if you want to have fun with Science Direct’s API, I published the code I used on Github.