Deep Learning Pioneer Geoff Hinton Helps Shape Google’s Drive To Put AI Everywhere
Series on Artificial Intelligence
This article was originally published on Forbes.
This is the eighth interview in our series on Artificial Intelligence. An unabridged audio version of this interview, as well as previous interviews in this series, is available below:
- Neil Jacobstein: AI & Robotics Chair @ Singularity University
- Oren Etzioni: CEO @ Allen Institute for Artificial Intelligence
- Greg Brockman: Co-Founder & CTO @ OpenAI
- Scott Phoenix: Co-Founder @ Vicarious
- Antoine Blondeau: CEO @ Sentient Technologies
- Mike Rhodin: SVP @ IBM Watson
- Sebastian Thrun: CEO @ Udacity
Introduction
Artificial intelligence (AI) is a white-hot topic today, as judged by the amount of capital being put behind it, the number of smart people who are choosing it as an area of emphasis, and the number of leading technology companies that are making AI the central nervous system of their strategic plans. Witness Google’s CEO’s plan to put AI “everywhere.”
There are some estimates that five percent of all AI talent within the private sector is currently employed by Google. Perhaps no one among that rich talent pool has as deep a set of perspectives as Geoff Hinton. He has been involved in AI research since the early 1970s, which means he got involved before the field was really defined. He also did so before the confluence of talent, capital, bandwidth, and unstructured data in need of structuring came together to put AI at the center of the innovation roadmap in Silicon Valley and beyond.
A British-born academic, Hinton is considered a pioneer in the branch of machine learning referred to as deep learning. As he mentions in my extended interview with him, we are on the cusp of some transformative innovation in the field of AI, and as someone who splits his time between Google and his post at the University of Toronto, he personifies the value at the intersection of the research and theory of AI and its practice.
Interview
Peter High: Your bio at the University of Toronto notes that your aim is to discover a learning procedure that is efficient at finding complex structure in large, high dimensional data sets, and to show that this is how the brain learns to see. I wonder if you can talk a little bit about that and about what you are working on day to day as the Emeritus University Professor at the University of Toronto as well as a Distinguished Researcher at Google today.
Geoffrey Hinton: The brain is clearly very good at taking very high dimensional data, like the information that comes along the optic nerve, which is a million signals changing quite fast with time, and making sense of it. It makes a lot of sense of it in that when we get visual input we typically get the correct interpretation. We do not see an elephant when there is really a dog there. Occasionally in the psychology lab things go wrong, but basically we are very good at figuring out what out there in the world gave rise to this very high dimensional input. After we have done a lot of learning, we get it right more or less every time. That is a very impressive ability that computers do not have. We are getting closer. But it is very different from, for example, what goes on in statistics, where you have low dimensional data and not much training data, and you fit a small model that does not have too many parameters.
The thing that fascinates me about the brain is that it has hugely more parameters than it has training data. So it is very unlike the neural nets that are currently so successful. What is happening at present is we have neural nets with millions of weights and we train them on millions of training examples and they do very well. Sometimes billions of weights and billions of examples. But we typically do not have hugely more parameters than training data, and that is not true of the brain. The brain has about ten thousand parameters for every second of experience. We do not really have much experience of how systems like that work or how to make them so good at finding structure in data.
Where would you say we are on the continuum of developing true artificial intelligence?
I think we have crossed a very important threshold. Until fairly recently, most people in AI were doing a kind of AI that was inspired by logic. The paradigm for intelligence was logical reasoning, and the idea of what an internal representation would look like was that it would be some kind of symbolic structure. That has completely changed with these big neural nets. We now think of internal representations as great big vectors and we do not think of logic as the paradigm for how to get things to work. We just think you can have these great big neural nets that learn, and so, instead of programming, you are just going to get them to learn everything. For many, many years, people in AI thought that was just fantasy.
Here is an example of something people would have said outright was insane: take strings of English words and corresponding strings of French words that are their translations, and, with enough such pairs, train a great big neural network so that if you gave it a new English sentence it could translate it into reasonable-quality French. Now, we are still not perfect at that, but already neural nets are comparable with the state of the art and are progressing much faster, and so pretty soon they will be used in practice, I believe. People twenty or thirty years ago would have said that whole idea was completely crazy. They would have said that, of course, you have to program in lots of knowledge about linguistics and about how the world is, and so on. The idea that a relatively dumb, simple learning algorithm could just learn it all from data with no real knowledge about language built in would have seemed completely ridiculous. Now, for people who have thought about the brain, it was not nearly so ridiculous, because that is basically what the brain has got to do. But, like I say, AI has crossed that threshold now. Most people in AI, particularly the younger ones, now believe that if you want a system that has a lot of knowledge in it, like an amount of knowledge that would take millions of bits to quantify, the only way to get a good system with all that knowledge in it is to make it learn it. You are not going to be able to put it in by hand.
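To make that concrete, here is a minimal sketch of the kind of system Hinton is describing: a tiny encoder-decoder network that learns to map English token sequences to French token sequences purely from paired examples, with no linguistic rules built in. The framework (PyTorch), the toy corpus, and every size and name below are illustrative assumptions, not a description of any production translation system.

```python
# Toy sketch of the idea described above: learn to map English token sequences
# to French token sequences purely from paired examples, with no linguistic
# rules built in. Data, sizes, and model shape are illustrative placeholders.
import torch
import torch.nn as nn

# Tiny illustrative "parallel corpus" (already tokenized and mapped to ids).
# Token id 1 serves as a start-of-sentence marker for the decoder.
en_sentences = torch.tensor([[2, 3, 4], [5, 3, 6]])          # two short English sentences
fr_sentences = torch.tensor([[1, 7, 8, 9], [1, 10, 8, 11]])  # their French counterparts

EN_VOCAB, FR_VOCAB, HIDDEN = 16, 16, 32

class Seq2Seq(nn.Module):
    """Encode the English sentence into a vector, then decode French tokens."""
    def __init__(self):
        super().__init__()
        self.enc_emb = nn.Embedding(EN_VOCAB, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.dec_emb = nn.Embedding(FR_VOCAB, HIDDEN)
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, FR_VOCAB)

    def forward(self, src, tgt_in):
        _, state = self.encoder(self.enc_emb(src))        # summarize the English sentence
        dec_out, _ = self.decoder(self.dec_emb(tgt_in), state)
        return self.out(dec_out)                           # logits over the French vocabulary

model = Seq2Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(en_sentences, fr_sentences[:, :-1])     # predict each next French token
    loss = loss_fn(logits.reshape(-1, FR_VOCAB), fr_sentences[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

On real data the same recipe simply scales up: more sentence pairs, bigger vocabularies, and a larger network, with nothing about grammar or word meaning programmed in by hand.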
The way you have put it in the past is that you have gone from the lunatic fringe to the lunatic core as technology has caught up and as progress has been demonstrated in so many different and interesting ways, like the ones you just described.
Yes. Many of us had the belief for many years that if we had powerful enough computation and if we had enough data, then our techniques would eventually work. We have reached the point where that is true now. Now, our techniques scale up: you make computation more powerful, we can make you better models; you give us bigger data sets, we can make you better models. That is not true if you program everything. So this stuff scales better than AI ever did in the past.
It is amazing to think that you began studying this in the 1970s. You have a PhD from 1978 in Artificial Intelligence from the University of Edinburgh, and you faced challenges with regard to the state of computer technology to aid you in your goals, to say nothing of naysayers among those who might have been your collaborators, your sources of funds, or helped you in some other way. I wonder if the twenty-somethings who are getting into this truly recognize that they stand on the shoulders of people who were true visionaries and went through all sorts of obstacles. Can you talk about what inspired you, in the face of these significant challenges, not only to pursue a PhD in the field, but also then to make your career in this topic when, certainly relative to today, the chances of success were nowhere near as concrete as they appear to be at this point?
I guess it all boils down to this: the brain has to work somehow, and it is a really big puzzle how the brain manages to learn things, and how the brain manages to use all those slow neurons to compute fancy things. My main motivation was always that in the brain we have a clear demonstration that you can get intelligence in a way that is quite unlike what is happening in a digital computer. In particular, nobody is programming it.
I remember that when I did my second project as a grad student in Edinburgh in 1973, people were explaining to me how neural nets were passé, and there was no chance of them working. They also said, “of course, neural nets cannot do recursion,” and, at that time, recursion was regarded as the essence of intelligence. It seemed to me that I had to show how neural nets could do recursion to answer that argument. So I set about showing how you could do true recursion in a neural net. What I mean by “true recursion” is that the knowledge about how to do something is in the net’s connection strengths, and those same connections get reused for the embedded structure. If we process a sentence like “John did not like Bill because he was rude to Mary,” the “he was rude to Mary” is an embedded sentence, and I have to be using exactly the same connections and neurons for processing “he was rude to Mary” as I am using for processing the whole sentence. Presumably, what I have to do is store, somehow, what I have processed of the sentence so far, go off and process the embedded sentence, and then come back and integrate what I have got from the embedded sentence with what was stored. Until you can make a neural network that can do that, obviously you could not then begin to make these things general purpose. So I set about making a neural network that could do true recursion, and it did it by having temporary weights between neurons that could do the storage you need. I remember giving a talk about this to a research group I was in, and people had no idea why I wanted to do this. What is interesting is that I was tackling a problem that has just recently become fashionable again. It became fashionable a couple of years ago to determine how you can really do recursion in a neural net, so it took about forty years for people to see that that was a problem we needed to solve.
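The store-recurse-reintegrate scheme Hinton describes can be sketched as follows. This is only an illustration under assumptions: it uses recursion in the host language to stash and restore the outer state, whereas his 1973 model used temporary connection weights between neurons to do that storage, and the network, token ids, and sizes are all made up for the example. The point it preserves is that one set of weights processes both the outer sentence and the embedded clause.

```python
# Illustrative sketch of "true recursion": the same network weights process
# both the outer sentence and the embedded clause. Here the outer state is
# stashed and restored via recursion in Python; Hinton's 1973 model instead
# used temporary (fast) weights between neurons to do the storage.
import torch
import torch.nn as nn

HIDDEN, VOCAB = 32, 50
embed = nn.Embedding(VOCAB, HIDDEN)
cell = nn.GRUCell(HIDDEN, HIDDEN)          # one set of weights, reused everywhere
merge = nn.Linear(2 * HIDDEN, HIDDEN)      # integrates the clause result with the saved state

def process(tokens, state):
    """Process a sequence of token ids; a nested list marks an embedded clause."""
    for tok in tokens:
        if isinstance(tok, list):
            saved = state                                    # store progress so far
            clause = process(tok, torch.zeros(1, HIDDEN))    # same weights, fresh state
            state = torch.tanh(merge(torch.cat([saved, clause], dim=-1)))  # re-integrate
        else:
            state = cell(embed(torch.tensor([tok])), state)  # ordinary word-by-word step
    return state

# "John did not like Bill because [he was rude to Mary]" as toy token ids,
# with the embedded clause written as a nested list.
sentence = [1, 2, 3, 4, 5, 6, [7, 8, 9, 10, 11]]
final_state = process(sentence, torch.zeros(1, HIDDEN))
```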
You founded the Neural Computation and Adaptive Perception Program [NCAP] in 2004 and brought together computer scientists, biologists, electrical engineers, neuroscientists, physicists, and psychologists. This brought to mind the fact that what you have for so long been attempting, working on, and making progress against requires expertise in a variety of disciplines in order to replicate the experiences of the human mind. Could you talk about the need for that collaboration and for finding those world-class thinkers across different areas of expertise to bring to life the topics that we are describing here?
No one person can be an expert in all those areas. It is too difficult. What you need is an expert in an area who also understands what the main project is, so that you can ask them a question, get the answer, and save yourself reading through the literature. If you have an expert in neuroscience, you can ask questions like “When there is a forward projection from one area to the next area, and then there is a backward projection, how many neurons are there in that loop? How many synapses does the information have to go through before it gets back to where it started?” That is the kind of thing that will take you a while reading the literature, and also you do not know which papers you can trust. If you have an expert, they can just tell you there and then. It is very useful having very good people who understand the project because you can save yourself a lot of time and they can tell you from their perspective what is silly and what is not.
The NCAP program was “invitation only” and so represented people whom presumably you, and perhaps a cadre of others, knew and also knew would work well in this sort of group setting and, as you point out, would provide some of those shortcuts to knowledge, but also be willing to work with people who were not in their own field. Can you talk about the process of building that network as it applied to that specific case, but also as you continue to do so at the University of Toronto and with the work that you do professionally?
To begin with, when we set up the NCAP Program it was fairly straightforward. I just thought of all the people I knew, all the smart people who were good at interacting, and tried to get them into the program. There were three conditions: you had to be smart; you had to be good at interacting; and you had to be interested in how the brain does computation. Since I had already been doing this for many years, I knew quite a few such people and we happened to get a mix that worked pretty well.
Fast forward from 2004: the computational power is going up dramatically, and the ability to capitalize on some of the ideas is coming much more rapidly as a result. What would you say was the fruit of your labor during your time at NCAP?
Quite a few different things came out of NCAP. It was not just neural nets; people were doing lots of other things as well in perception and motor control. The thing that had the most recognition in the long run was the deep neural nets. Around 2004, there was a general belief that it was going to be very hard to train neural nets to have more than a few layers, and most people at NCAP thought it was going to be very hard to train deep neural networks using purely supervised training. So what happened historically was kind of weird, which was that we then focused on unsupervised training: how you can learn one layer at a time without yet knowing what the right output for the net is. Each layer is trying to model the structure in the data in the layer below. That was the breakthrough that really got deep learning going again: the fact that by this pre-training we could make it easy to learn deep nets.
What then happened was people discovered that with enough data and enough computer power, it was fairly easy to learn these deep nets without this pre-training. For many cases where you have a lot of data, like speech and many vision problems, people do not bother to use pre-training anymore. But it was the pre-training that was the catalyst to get deep nets working again. Once we knew they could be made to work, then we discovered they could be made to work without this pre-training. I think that exploration of how to get unsupervised learning to learn deep nets was a kind of common theme of a bunch of different researchers and I think that is what came out of the early stages of the NCAP Program.
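A rough sketch of that greedy, layer-by-layer recipe appears below. Hinton’s original deep-belief-net work used restricted Boltzmann machines for each layer; this sketch substitutes a simple autoencoder per layer purely to keep the illustration short, and the data, sizes, and training settings are placeholders rather than anything from the original experiments.

```python
# Minimal sketch of greedy, layer-by-layer unsupervised pre-training: each new
# layer is trained (without labels) to model, here by reconstructing, the
# representation produced by the layer below. An autoencoder stands in for the
# RBMs used in the original deep-belief-net work; all numbers are placeholders.
import torch
import torch.nn as nn

def pretrain_layer(inputs, hidden_size, steps=500):
    """Train one encoder layer to reconstruct its input; return it plus its codes."""
    in_size = inputs.shape[1]
    encoder = nn.Linear(in_size, hidden_size)
    decoder = nn.Linear(hidden_size, in_size)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(steps):
        codes = torch.sigmoid(encoder(inputs))
        loss = ((decoder(codes) - inputs) ** 2).mean()     # unsupervised: reconstruct the input
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder, torch.sigmoid(encoder(inputs)).detach()

data = torch.rand(256, 64)                  # stand-in for unlabeled training data
layer_sizes = [32, 16, 8]

stack, activations = [], data
for size in layer_sizes:                    # one layer at a time, bottom to top
    layer, activations = pretrain_layer(activations, size)
    stack.append(layer)

# The pre-trained stack can then be fine-tuned with supervised training,
# for example by adding a classifier on top, as described in the interview.
```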
Having had a chance to speak now with a number of leaders in the AI space, what is fascinating to me is the number of relatively new organizations that are taking very long time horizons, thinking over the long term in developing their strategy, and choosing funders who will not expect a major economic event. Many of them are being set up as non-profits so that the science can really be the focus, as opposed to the profit motive. It seems important and fortuitous that a good number of bright minds are thinking over the long term as opposed to the shorter term of the typical for-profit organization. Do you have some perspectives on that, Geoff?
Yes, I have various perspectives. Obviously, you need both the applications and the basic science. The reason there is so much interest in neural nets right now is not really the theory of them. It is because they work. So the applications in speech recognition, or object recognition, or machine translation are all very impressive and that is what makes people interested in providing more funding for the basic research arm.
I think it is a bit more complicated than “profit motive” versus “non-profit motive.” For example, Google provides money to universities for basic research, and that is very helpful. It is very important to big companies, for example, that the universities keep producing well-educated graduate students. So actually the big companies have a motivation for supporting university-based research, which is that someone has got to educate their graduate students. So it is a bit more subtle.
Within universities there has been a lot of political pressure, in Britain, Canada, and the U.S., to make the research more relevant and more applied. It is very easy for a politician to say “We are putting this money into research because it is going to pay off and provide jobs in the next five years, or provide profits in the next five years.” That is not the right way to do basic research. The really big payoffs come from curiosity-driven research. So funding that is directed at applications within universities seems, to me, a mistake. I think the universities should be doing the basic research and the companies should be applying it, and this tendency to try to make university researchers’ research more applied is not a good thing; it is not sensible in scientific terms. It is just a result of politicians and a few senior scientific administrators thinking that is an easy sell.
You now bridge the gap between the academic environment and the corporate environment, spending time both at the University of Toronto, where you have been for a number of years, and at Google as well. I read that you divide your time, roughly 9:30–1:30 at the University of Toronto and 2:00–6:00 at Google’s offices, with some stretches in Mountain View as well for more dedicated time with the company. I wonder now, with a foot in both of those worlds, how much crossover is there between them, and how much are they truly differentiated?
I have a funny role at Google, which is I am not that involved in any particular application. I have been doing neural nets for a long time, so I have seen lots of ideas come and go. Many ideas that were not successful over the last thirty years were not successful because computers were not fast enough. I keep the people in the brain team at Google informed about old ideas that might be relevant to what they are doing and give them intuitions, developed over many years, about good approaches. Basically what I am still doing now is suggesting to people doing applications ways in which basic research ideas might be applied. That is very different from many people at Google who are focusing heavily on getting particular applications to work. However, at DeepMind, there are lots of people working on research ideas, and there are people on the brain team whose primary interest is basic research and developing new algorithms for neural nets.
Geoff, you mentioned DeepMind, which was in the news fairly recently with regard to AlphaGo and that platform defeating the Go champion. Could you talk about the importance of that, as well as the importance of these kinds of contests in marking the progress that is being made, sparking the imagination of people who will go into these studies, and perhaps launching more people down a path towards starting companies in this realm as well?
One interesting thing about Go is it was always held up as an example of something computers would not be able to do because it requires intuition: you need to be able to look at a board and decide that this is a good place to play just because it feels right. That is one big thing about neural networks that differentiates them from previous generations of logic-based AI, which is deliberate, conscious reasoning. In fact, what we really need to do now is figure out good ways of using the neural nets we have sequentially over time to do something like conscious reasoning. But the power is coming from the fact that we now have computers that have intuition.
Excellent. A number of people who are major investors in the AI space have cautioned people about the safety concerns related to artificial intelligence and urged that those risks be mitigated and kept in mind as advances in the field continue. What are your perspectives on the risk versus the opportunity related to AI?
I have an analogy. Take a backhoe that can dig a hole in the road very fast compared with a person, but it can also knock your head off. Obviously, when you build a backhoe you want to think about whether you designed the controls in such a way that you are unlikely to knock the heads off passersby. I think it is similar when you get technology that can do computation. You have to think hard about the ways in which this could cause accidents, all the ways in which it could be misused. It is obvious that any powerful technology can be misused. Whether it is misused or not depends upon the political system. You can probably imagine politicians who you would not like to get their hands on really powerful AI, so I think it is not a technological issue. Certainly, AI is going to get better and better. Whether that is used for human good or not depends a lot on the political systems we have. Take the automatic teller machine. Sure, when it was introduced it put a lot of bank clerks out of work and they were probably unhappy about that. But I think looking back on it, nobody now would say “We should not have introduced automatic teller machines.” They got rid of boring work and they are very convenient, and hopefully many of the people who got displaced got more interesting jobs. I think this is just the same but more so.
I think it is a fair point that you raise. A lot of people talk about the fact that so much of what is being developed and what is planned has the potential to replace work and, as you point out, that is really the story of technological advances across the ages. This is not something that is unique only to this present time or the projected future, but, rather, something that has always been the case.
I think the real problem is that for the last thirty years or so, advances that have increased productivity have not made most people better off. Most people have stayed the same or become worse off, and it is the one percent that have benefited from it. And that is a political issue. I think people would be far more interested in improving technology, and therefore getting higher productivity, if the money was spread around more fairly.
Do you feel that government has a role to play in ensuring that that is the case?
Yes, I do. I believe in taxes. I am not thinking about the innovations themselves being taxed. I mean, you could try to do that. I am thinking about rich people being taxed. I think the fact that over the last twenty or thirty years rich people have gotten a whole lot richer, and poorer people have stayed just as poor, contributes a lot to people not feeling that technological advance is good.
Geoff, you are someone who has been in academics for a long time, who got a PhD, and who has been a professor for a while. There are a lot of leaders in AI, and in technology more generally speaking, who have dropped out of college or skipped college altogether in order to pursue entrepreneurial dreams of one sort or another. You have Peter Thiel, of course, who is providing additional encouragement to do so. Do you have a perspective on the trend, such as it is, for entrepreneurs to skip a university education altogether?
Being a professor, you can imagine I do not think that is a good idea. My feeling is that when you get a smart new graduate student in a good research group, a kind of magic happens. You have someone intelligent in a group where people know a lot, but with an open mind, a very flexible mind. I think most of the really good, really radically new ideas come from graduate students who are themselves smart and are in a good environment — an environment where other people understand what they are talking about and give them good advice. At present, universities are the best place for that to happen.
Peter High is President of Metis Strategy, a business and IT advisory firm. His latest book, Implementing World Class IT Strategy, has just been released by Wiley Press/Jossey-Bass. He is also the author of World Class IT: Why Businesses Succeed When IT Triumphs. Peter moderates the Forum on World Class IT podcast series. Follow him on Twitter @MetisStrategy.