Is Current Progress in Artificial Intelligence Exponential?

DiggingDeepAmidstChaos
12 min read · May 4, 2020


Many people claim that current technological progress is happening at a faster and faster pace (exponential, even), with no end in sight. The merits and detriments of technology can be argued ad nauseam, but I won’t be getting into that in this post (I generally view technology itself as neutral: it can be used to improve human life or terribly misused to oppress, control, and kill). What I am going to briefly explore here is the question: is current progress in AI exponential? And if so, what implications does that have for estimates of when human-level or superhuman-level AI will arrive?

Before I dive in, it’s worth asking (if you didn’t study mathematics): why does it matter whether something is changing exponentially? People frequently assume the word “exponential” just means “really fast”, which is sometimes true, but that doesn’t capture much of the meaning of the concept.

Exponential change can be either negative or positive, and in both cases the rate of change is proportional to the current amount. Negative exponential change (decay) means something shrinks ever more slowly as it gets smaller, while positive exponential change means something grows ever faster as it gets bigger. We’re interested in positive exponential change in this post.
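
For readers who like formulas, here is the standard textbook way of writing this (my own addition, purely for reference):

```latex
\frac{dx}{dt} = kx \quad\Longrightarrow\quad x(t) = x_0 e^{kt},
\qquad k > 0 \text{ (growth)}, \quad k < 0 \text{ (decay)}
```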

It seems like a simple concept, but people in general (including mathematicians and physicists!) have trouble extrapolating exponential change. What we frequently do instead is extrapolate based on how things are changing right now. This is a huge problem because it means we end up massively under-predicting how something will change. Check out the graph/illustration below, which I stole from waitbutwhy.com:

In my opinion, the chart above really depicts the change in technological progress over time rather than “human progress”, but the idea of the graphic is a good illustration of what I was describing above.
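
To make the under-prediction concrete, here is a small sketch comparing a “things keep changing at today’s rate” extrapolation with true doubling. The numbers are made up purely for illustration:

```python
import math

# Made-up illustration: linear extrapolation vs. true exponential (doubling) growth.
start = 100.0        # arbitrary starting level of "progress"
doubling_time = 2.0  # doubles every 2 years
years = 20

# True exponential growth: double every `doubling_time` years.
true_value = start * 2 ** (years / doubling_time)

# Linear extrapolation: take today's instantaneous rate of change and hold it fixed.
current_rate = start * math.log(2) / doubling_time
linear_guess = start + current_rate * years

print(f"Doubling for {years} years: {true_value:,.0f}")
print(f"Linear extrapolation:      {linear_guess:,.0f}")
# The linear guess undershoots by roughly two orders of magnitude after only 20 years.
```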

For an excellent explanation of exponential growth and the destructive implications our misunderstanding of it has for the coming century, I highly recommend “Arithmetic, Population, and Energy — a talk by Al Bartlett” on YouTube. The facts and figures are a bit out of date, but it is still very much worth a watch. tl;dr: “Sustainable growth” on a finite planet is a contradiction.

Exponential Development of Computing Power

To give you a concrete example of an exponential rate of technological progress, let’s take the famous example in computing called “Moore’s Law”:

“Moore’s law is the observation that the number of transistors in a dense integrated circuit doubles approximately every two years. The observation is named after Gordon Moore, the co-founder of Fairchild Semiconductor and Intel, whose 1965 paper described a doubling every year in the number of components per integrated circuit, and projected this rate of growth would continue for at least another decade. In 1975, looking forward to the next decade, he revised the forecast to doubling every two years. The period is often quoted as 18 months because of Intel executive David House, who predicted that chip performance would double every 18 months (being a combination of the effect of more transistors and the transistors being faster).”

-Taken from Wikipedia, for convenience.
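
The power of that doubling is easy to underestimate, so here is a quick back-of-the-envelope calculation (my own sketch, using the two-year doubling from the quote above; the time span is illustrative, not a precise historical figure):

```python
# Back-of-the-envelope Moore's law: transistor counts doubling every two years.
doubling_period_years = 2
years_elapsed = 50  # roughly mid-1960s to mid-2010s

growth_factor = 2 ** (years_elapsed / doubling_period_years)
print(f"Growth over {years_elapsed} years: x{growth_factor:,.0f}")
# -> x33,554,432: over 33 million times more transistors per chip,
#    from nothing more dramatic than a doubling every two years.
```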

This law has held for Central Processing Units (CPUs) in computers for many decades and was actually used by Intel and AMD to set the pace of their development. For better or for worse, transistor sizes in CPUs are hitting a brick wall right now because it is hard to make them as small as single atoms. Thankfully (or unfortunately, if you don’t like new technology), the general idea behind Moore’s law (ever faster, more efficient, and denser computing for a given price) will likely continue on in electronics for quite some time through more efficient architectures, the use of more processing cores, and further specialisation of computer hardware for specific tasks.

Particularly remarkable right now is the massive improvement in the processing power that can be fit into graphics cards (which are mostly used for video games and watching YouTube videos of cats). Graphics cards (also called GPUs, for Graphics Processing Units), which essentially perform parallelised operations very efficiently, are seeing widespread use in machine learning / AI research. More specialised AI chips, such as the Tensor Processing Units developed by Google, are also being put to use. An example application that you may have used is Google Translate, which (as of 2016/2017) is powered by a neural network algorithm trained for each language pair using many GPUs/TPUs in parallel.

The graph below illustrates very well the incredible improvement in computing power over the last 120 years. Note that the vertical axis uses a logarithmic scale, so a small increase in vertical position on the graph corresponds to a very large increase in computing power.

Up-to-date graph of Moore’s Law based on a figure by the futurist Kurzweil. [Steve Jurvetson, 10 Dec. 2016 Flickr]

At the very end of this graph is NVIDIA’s Titan X GPU, which is popular both in high-end gaming and in AI research. About three or four of these graphics cards can be slotted into a well-built desktop computer. A Titan X costs about $1000 USD, so it’s quite pricey for the average Joe, but each one of these cards can in theory deliver about 12 trillion floating-point operations per second (12 TFLOPS). For anyone who’s been using computers for more than 5 years, that’s a truly absurd number.

To have something to compare against, consider the fastest supercomputer on Earth in the year 2000, the IBM ASCI White, which had a theoretical processing speed of 12.3 TFLOPS. That computer weighed 106 tons, consumed 6 megawatts of electricity, and cost $110 million ($156 million in 2017 USD), whereas the Titan X weighs about 3 pounds (a 71,000-fold decrease), consumes only 600 W of power (a 10,000-fold decrease), and “only” costs $1000 USD (a 156,000-fold decrease). Fitting the computing power of a 2000-era supercomputer onto a single chip in the year 2017 is an incredible change, and there is no sign of the pace of improvement slowing down any time soon.
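
The fold-decreases quoted above are just simple ratios; here they are spelled out explicitly (same figures as in the paragraph, assuming short tons of 2,000 pounds):

```python
# Ratios behind the ASCI White (2000) vs. Titan X (2017) comparison above.
asci_weight_lb = 106 * 2000    # 106 (short) tons, in pounds
asci_power_w   = 6_000_000     # 6 megawatts
asci_cost_usd  = 156_000_000   # cost adjusted to 2017 dollars

titan_weight_lb = 3
titan_power_w   = 600
titan_cost_usd  = 1_000

print(f"Weight: {asci_weight_lb / titan_weight_lb:,.0f}x lighter")  # ~71,000x
print(f"Power:  {asci_power_w / titan_power_w:,.0f}x less power")   # 10,000x
print(f"Cost:   {asci_cost_usd / titan_cost_usd:,.0f}x cheaper")    # 156,000x
```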

Note added after original posting: Comparing the IBM ASCI White and the Titan X GPU is a bit like comparing apples to oranges, since they perform different types of computations. I will leave the comparison in because it still roughly illustrates the general point I was making.

Implications of Exponential Growth in Brute-Force Computing Power for AI

Directly comparing the “computational power” of a CPU or GPU to that of a brain is a pretty iffy exercise because of the vast differences in how they “compute” information, but the general graph is still useful. For reference, the 2017 Titan X GPU (about $1000) performs ~10¹³ calculations per second. [This image is originally from “The Singularity is Near”, by Ray Kurzweil]

The most obvious benefit of the exponential growth of computing power is that it allows for dramatic improvements in “brute-force” solutions to problems. This works well for simple applications, such as the games tic-tac-toe or checkers (maybe someday chess?), where you can run through all possible moves and all possible future moves and simply choose the best one (a minimal sketch of this idea is shown below). For more complex problems you need a cleverer strategy.
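
Here is that sketch: a brute-force minimax search for tic-tac-toe, a standard textbook approach rather than code from any system mentioned in this post. It literally explores every possible future position before choosing a move:

```python
# Brute-force minimax for tic-tac-toe: explore every possible future position
# and pick the move with the best guaranteed outcome.
# The board is a list of 9 cells containing 'X', 'O', or ' '.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) from `player`'s perspective: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return (1 if w == player else -1), None
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0, None  # board full: draw
    opponent = 'O' if player == 'X' else 'X'
    best_score, best_move = -2, None
    for m in moves:
        board[m] = player
        opp_score, _ = minimax(board, opponent)  # opponent replies optimally
        board[m] = ' '
        if -opp_score > best_score:              # their loss is our gain
            best_score, best_move = -opp_score, m
    return best_score, best_move

score, move = minimax([' '] * 9, 'X')
print(score, move)  # -> 0 0: with perfect play, tic-tac-toe is a draw
```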

A good example of such a cleverer strategy: in 2016, Google DeepMind’s “AlphaGo” algorithm was able to beat one of the best players in the world (Lee Sedol) at the game of Go, using a combination of machine learning (neural networks), tree search techniques, and a ton of training. This incredible feat was accomplished less than 20 years after a similar event in the chess world, Kasparov’s 1997 defeat by IBM’s “Deep Blue” computer system. AlphaGo’s performance is so spectacular because the number of possible board configurations in Go is far larger than in chess, by a factor of roughly a “googol” (a 1 followed by 100 zeros). For contrast, the increase in computing power on a chip over the last 60 years is only something like a trillion-fold. AlphaGo is definitely a clever algorithm!

AlphaGo is quite impressive, but it still requires an incredible amount of training (millions of games), which is far less efficient than human learning. Moreover, the algorithm’s focus is extremely narrow, although it is now being adapted for other uses, such as optimising energy usage in Google’s data centres. A general learning algorithm will need to be much better at learning and understanding the world than something like AlphaGo, but we are heading in that direction quite quickly.

Not only does increasing computing power make ever more clever algorithms feasible (for example, in the case of neural networks, you can use larger numbers of neurons) and allow the use of ever-larger training datasets, but it also dramatically shortens the development time for new algorithms.

When writing a computationally intensive machine learning algorithm, a big slowdown can occur when there is a long stretch of time between starting the training and getting feedback on how well the model generalises to unseen data. Even a few seconds of waiting to see if something works can be frustrating when you’re trying to keep your train of thought while debugging code, and this of course gets worse the larger the dataset is.

Here are a few ways AI/machine learning programmers can easily lose development time:

  • Programmers aren’t perfect! Getting code error-free tends to take some trial and error, and the longer each run takes, the longer it takes to get the bugs out of new code.
  • Tweaking the various parameters (e.g. neural networks can have their architectures adjusted in many ways) to optimise the learning algorithm for the task at hand requires re-running the algorithm many times on the dataset (a rough sketch of how quickly this multiplies is shown after this list). For an extreme example, when Google first started deploying neural networks for Google Translate in 2016, it took about a month of training for each language pair.
  • The data needs to be re-processed or more data needs to be collected. Fiddling around with and cleaning data is notoriously time-consuming.
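
On the second point, every combination of settings means another full training run, and the waiting time multiplies accordingly. A rough sketch with made-up numbers:

```python
# Hypothetical hyperparameter sweep: each combination of settings is one full
# training run, so the waiting time multiplies quickly. All numbers are made up.
from itertools import product

learning_rates = [0.1, 0.01, 0.001]
layer_counts   = [2, 4, 8]
hidden_sizes   = [128, 256, 512]

hours_per_run = 6  # assumed training time for a single configuration

configs = list(product(learning_rates, layer_counts, hidden_sizes))
total_hours = len(configs) * hours_per_run
print(f"{len(configs)} runs x {hours_per_run} h each = {total_hours} h "
      f"(~{total_hours / 24:.0f} days of waiting)")
# -> 27 runs x 6 h each = 162 h (~7 days of waiting)
```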

Exponentially increasing computational power not only unlocks more powerful (e.g. more neurons, more brute-force ability) and more intelligent (“better”) algorithms, but also results in an increasingly fast pace of software development. Since this allows new ideas (and new combinations of well-understood architectures and algorithms!) to be tried out more quickly, we can and will develop useful pieces that can be incorporated into a general intelligence algorithm at an ever faster rate. Entire software packages/libraries for various neural network architectures (and machine learning algorithms in general), specialised for different tasks, already exist and are constantly improving and growing in number and ease of use.

For a fun read to better understand the exponential nature of improvements in computing power and implications for AI, I highly recommend reading part one of Wait But Why’s “The AI Revolution: The Road to Superintelligence”, by Tim Urban.

Ever-Faster Development of Building Blocks for Creating General AI

I’ll use “blocks” here as a fast-and-loose term for computationally defined concepts, methods, or algorithms. These “blocks” can have overlapping or redundant purposes. For example, I would consider a neural network for understanding the semantics of text to be a building block. A method for developing short-term goals in order to reach an abstract long-term goal could be another block (likely itself made up of smaller blocks).

Also, to avoid confusion, I think it’s worth defining the difference between artificial intelligence and artificial general intelligence:

Artificial Intelligence (A.I.): any algorithm that perceives its environment and takes actions that maximise its chance of success at some goal.

Artificial General Intelligence (A.G.I.): the intelligence of a machine that could successfully perform any intellectual task that a human being can.

-Once again, based on the Wikipedia definitions, because convenience.

The steady accumulation of AI building “blocks” can feel quite slow when you’re a researcher spending months ripping your hair out trying to debug your code and get it to do what seems like one simple thing. But this feeling can be misleading about the rate of progress in AI research in the bigger scheme of things, and about how close we are to sudden “jumps” in AI capabilities (one example of such a jump came around 2011, with the sudden viability of deep learning for a large variety of tasks, such as image classification and voice transcription).

Think of building up a hierarchy of necessary AI concepts/modules through basic research. Most basic research consists of steady improvements or refinements of already existing blocks, making them easier and easier to use and implement in new contexts. At some point, a given block reaches a threshold where it is so well developed and easy to use that a programmer can apply it in an essentially plug-and-play manner, combining it with other blocks with relative ease.
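
A toy illustration of what that kind of plug-and-play composition might look like in code. Every class name here is a hypothetical stand-in with placeholder logic, not a real library:

```python
# Toy sketch of composing AI "building blocks" once each one is plug-and-play.
# Every class is a hypothetical stand-in with placeholder logic.

class TextEncoder:
    """Block 1: turn raw text into a vector of numbers."""
    def encode(self, text):
        return [float(len(word)) for word in text.split()]

class GoalPlanner:
    """Block 2: turn an encoded goal into a list of short-term sub-goals."""
    def plan(self, encoded_goal):
        return [f"sub-goal {i + 1}" for i in range(len(encoded_goal))]

class Agent:
    """A higher-level block made by snapping lower-level blocks together."""
    def __init__(self, encoder, planner):
        self.encoder = encoder
        self.planner = planner

    def act_on(self, goal_text):
        return self.planner.plan(self.encoder.encode(goal_text))

agent = Agent(TextEncoder(), GoalPlanner())
print(agent.act_on("learn to play go"))  # -> ['sub-goal 1', ..., 'sub-goal 4']
```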

That’s all well and good, but steadily adding blocks of code together doesn’t sound very exponential, does it?

Creating a general AI with human-level abilities or better will likely require some kind of semi-hierarchical implementation and usage of these modules, much like the brain. In fact, the brain has highly specialised regions for things like image recognition and image processing (in the occipital lobe), auditory processing, and many other functions. Each of these specialised regions is also (more or less) broken down into further specialised chunks for processing and interpreting information. How these specialised regions are synthesised into the brain’s decision-making processes is not well understood. There are a lot of different systems and processes in the brain, and how they fit together, and which of them are or aren’t essential to a general AI algorithm, is understood even less.

A possible real-world example of some AI “building blocks” as they might be put together in the brain for representing abstract knowledge/understanding. [Marvin Minsky, 2008]

Figuring out how to create and implement basic algorithmic machine learning blocks is much easier than the more conceptually difficult things, such as imagination, setting abstract goals, or the use of memory. Since the basic blocks are easier to figure out and create, we will steadily increase the number of these blocks, which sit at the bottom of our messy hierarchy of pieces that can be used for a general AI. The development of “higher-level” or more complex blocks, which often make use of the simplest building blocks, will happen less frequently.

As we improve our understanding of these algorithms and our ability to use them, we develop potential basic blocks at a faster and faster pace, and it becomes easier and easier to create higher-level and/or more complex algorithms. Each higher-level block leads to correspondingly larger improvements in generalisation and learning performance. To me this indicates that developments in AI will be exponential in nature, as long as we aren’t bottlenecked by a sudden decrease in the rate of improvement of computing power.

So, to me at least, it looks like the development of AI technology is exponential in nature. Although it’s really impossible to say what and how many building blocks we’ll need (technological problems are frequently more nuanced and challenging than they appear at first glance), the accelerating pace of development likely means general AI will be here sooner than we (and possibly many AI researchers) predict. Computers will be doing more and more things that we assumed only humans could do. Even if the pace of developing these AI building blocks flat-lines (there are possible hints this may be happening in academia, though that may also be because researchers are migrating to large tech companies, which put more emphasis on commercial applications than on research publications), the massive undercurrent of increasing computing power leads me to expect higher-level building blocks to continue to be developed at a pace that results in exponential improvements in AI algorithms’ generalisation and learning ability.

One last development worth mentioning here is self-improving machine learning algorithms: algorithms which can recursively improve themselves. In fact, the human brain does this to some extent: through meta-cognition we can think about how we learn and choose to improve how we learn. Once we develop very effective and advanced methods (or “blocks”) for self-improving AI, the pace of AI progress could go from exponential to super-exponential, as the AIs develop ever better versions of themselves at an ever faster pace (see chapter 4 of Superintelligence: Paths, Dangers, Strategies, by Nick Bostrom, for a better elaboration on this). This super-exponential rate can probably only kick in once we are close to developing, or have already developed, general AI, but it is worth noting that we do have some rudimentary self-improving AI systems already.
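
A toy way to see the difference between exponential and super-exponential growth is to let the growth rate itself improve over time, a crude stand-in for recursive self-improvement. The numbers below are purely illustrative:

```python
# Toy comparison: ordinary exponential growth vs. growth where the growth rate
# itself keeps improving (a crude stand-in for recursive self-improvement).
# All numbers are purely illustrative.

capability_exp = 1.0    # grows at a fixed rate
capability_super = 1.0  # grows at a rate that itself improves
rate = 0.10             # starting improvement of 10% per step

for step in range(50):
    capability_exp *= 1.10          # fixed 10% growth per step
    capability_super *= 1 + rate    # growth at the current (improving) rate
    rate *= 1.05                    # the system gets better at improving itself

print(f"After 50 steps, exponential growth:       x{capability_exp:,.0f}")
print(f"After 50 steps, super-exponential growth: x{capability_super:,.0f}")
# The second number dwarfs the first, even though both started identically.
```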

Still skeptical about the pace of progress in artificial intelligence research? Even if progress towards general AI were linear, the current rate of research and development alone is astounding. It won’t be here tomorrow for sure, but just wait 5, 10, 20, or 30 years for some incredible developments toward it. I think you just might find yourself surprised.

If you’re interested in AI experts’ predictions for when general AI will be developed, I recommend checking out this summary of results from the MIRI AI Predictions Dataset. In particular, based on Figure 3: “Fraction of late minPY predictions (made since 2000) which say AI will have arrived, over time”, a guess of sometime around 2040 (the median guess) looks like a reasonable conclusion. Significantly different dates for the development of human-level general AI, either earlier or later, can easily be argued as well (my personal best guess would be the early-to-mid 2030s); no one really knows.

Originally posted on the WordPress version of this blog in August 2017.
