Intelligence Inside

by Steve Jurvetson, Partner

Congratulations to Intel on its acquisition of Nervana. We are now free to share some of our perspectives on the company and its mission to accelerate the future with custom chips for deep learning.

I’ll share a recap of the Nervana story, from an investor’s perspective, and try to explain why machine learning is of fundamental importance to every business over time. In short, I think the application of iterative algorithms (e.g., machine learning, directed evolution, generative design) to build complex systems is the most powerful advance in engineering since the Scientific Method. Machine learning allows us to build software solutions that exceed human understanding, and shows us how AI can innervate every industry.

By crude analogy, Nervana is recapitulating the evolutionary history of the human brain within computing — moving from the logical constructs of the reptilian brain to the cortical constructs of the human brain, with massive arrays of distributed memory and iterative learning algorithms.

From right to left, Naveen Rao, Amir Khosrowshahi and Arjun Bansal — the Nervana founders — pondered where on the wall they may fall during M&A negotiations (from the last board meeting at DFJ)

Not surprisingly, the founders integrated experiences in neuroscience, distributed computing, and networking — a delightful mélange for tackling cognitive computing. My partner Emily Melton discovered the company through Ali Partovi, an advisor to Nervana.

We were impressed with the founding team and we had a prepared mind to share their enthusiasm for the future of deep learning.

Part of that prepared mind dates back to 1989, when I started a PhD in EE focusing on how to accelerate neural networks by mapping them to parallel processing computers. Fast forward 25 years: the nomenclature had shifted to machine learning and its deep learning subset, and I chose deep learning as the top tech trend of 2013 at the Churchill Club VC debate (video). We were also seeing the powerful application of deep learning and directed evolution across our portfolio, from molecular design to image recognition to cancer research to autonomous driving.

All of these companies were deploying these simulated neural networks on traditional compute clusters. Some were realizing huge advantages by porting their code to GPUs; these specialized processors originally designed for rapid rendering of computer graphics have many more computational cores than a traditional CPU, a baby step toward a cortical architecture. I first saw them being used for cortical simulations in 2007. But by the time of Nervana’s founding in 2014, some (e.g., Microsoft’s and Google’s search teams) were exploring FPGA chips for their even finer-grained arrays of customizable logic blocks. Custom silicon that could scale beyond any of these approaches seemed like the natural next step. Here is a page from Nervana’s original business plan:

The march to specialized silicon, from CPU to GPU to FPGA to ASIC, had played out similarly for Bitcoin miners, with each step toward specialized silicon making its predecessors obsolete. When we spoke to Amazon, Google, Baidu, and Microsoft in our due diligence, we found a much broader application of deep learning within these companies than we had imagined, from product positioning to supply chain management.

Machine learning is central to almost everything that Google does.

And through that lens, their acquisitions and new product strategies make sense: they are not traditional product line extensions but a process expansion of machine learning (more on that later). They are not just playing games of Go for the fun of it. Recently, Google switched its core search algorithms to deep learning, and it used DeepMind to cut data center cooling costs by a whopping 40%.

The advances in deep learning are domain independent. Google can hire and acquire talent and delight in their passionate pursuit of game playing or robotics. These efforts help Google build a better brain. The brain can learn many things. It is like a newborn human; it has the capacity to learn any of the languages of the world, but based on training exposure, it will only learn a few. Similarly, a synthetic neural network can learn many things.

Google can let the Brain team find cats on the Internet and play a great game of Go. The process advances they make in building a better brain (or in this case, a better learning machine) can then be turned to ad matching, a task that does not inspire the best and the brightest to come work for Google.

The domain independence of deep learning has profound implications for labor markets and business strategy. The locus of learning shifts from end products to the process of their creation. Artifact engineering becomes more like parenting than programming. But more on that later; back to the Nervana story.

Our investment thesis for the Series A revolved around some universal tenets: a great group of people pursuing a product vision unlike anything we had seen before.

The semiconductor sector was not crowded with investor interest, and AI was not yet among the sectors of interest for many venture firms. We also told the team that we could envision secondary benefits from discovering who the customers would be. Learning about the cutting edge of deep learning applications and the startups exploring the frontiers of the unknown held a certain appeal for me. And sure enough, there were patterns in customer interest, from an early flurry in medical imaging of all kinds to a recent explosion of interest in the automotive sector after Tesla’s Autopilot feature went live. The auto industry collectively rushed to catch up.

Soon after we led the Series A on August 8, 2014, I found myself moderating a deep learning panel at Stanford with Nervana CEO Naveen Rao:

I opened with an introduction to deep learning and why it has exploded in the past four years (video primer). I ended with some common patterns in the power and inscrutability of artifacts built with iterative algorithms. We see this in biology, cellular automata, genetic programming, machine learning and neural networks.

There is no mathematical shortcut for the decomposition of a neural network or genetic program, no way to “reverse evolve” with the ease that we can reverse engineer the artifacts of purposeful design.

The beauty of compounding iterative algorithms — evolution, fractals, organic growth, art — derives from their irreducibility. (More from my Google Tech Talk and MIT Tech Review)
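Cellular automata, one of the iterative systems listed above, make the point about irreducibility concrete. Rule 30 is a one-line update rule, yet no shortcut is known for predicting, say, its center column far ahead; in general you have to run every step. The sketch below is illustrative only and assumes a fixed-width row with dead cells beyond the edges, a simplification of the usual infinite lattice; `step` and `run` are hypothetical helper names.

```python
# A minimal sketch of Rule 30, an elementary cellular automaton.
# Assumption: a finite row whose out-of-range neighbors are treated as 0.

RULE = 30  # Wolfram rule number: bit k is the next state for neighborhood value k


def step(cells: list[int], rule: int = RULE) -> list[int]:
    """Advance one generation by looking up each 3-cell neighborhood in the rule."""
    padded = [0] + cells + [0]
    return [
        (rule >> ((padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


def run(width: int, generations: int) -> list[list[int]]:
    """Start from a single live cell in the middle and iterate, keeping every row."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(generations):
        cells = step(cells)  # no closed form: each row requires the previous one
        history.append(cells)
    return history


if __name__ == "__main__":
    for row in run(width=31, generations=15):
        print("".join("#" if c else "." for c in row))
```

The printed triangle already looks disorderly after a dozen rows, even though every cell follows the same trivial rule.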

Year 1. 2015

Nervana adds remarkable engineering talent, a key strategy of the first mover. One of the engineers figures out how to rework the undocumented firmware of NVIDIA GPUs so that they run deep learning algorithms faster than off-the-shelf GPUs or anything else Facebook could find. Matt Ocko preempted the second venture round of the company, and he brought the collective learning of the Data Collective to the board.

Year 2. 2016

Happy 2nd Birthday, Nervana!

The company is heads down on chip development. It shares some technical details of its forthcoming chip: flexpoint arithmetic optimized for matrix multiplies and 32GB of stacked 3D memory on chip, which together deliver 55 trillion operations per second, plus multiple high-speed interconnects (as typically seen in the networking industry) for ganging a matrix of chips together into unprecedented compute fabrics. 10x made manifest.
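Why would a number format be "optimized for matrix multiplies"? One way to picture the idea is block floating point: every element of a tensor stores an integer mantissa, and the whole tensor shares a single exponent, so the inner loop of a matrix multiply becomes pure integer multiply-accumulate. The sketch below is a simplified illustration under stated assumptions, not Nervana's implementation: the 16-bit mantissa width, the exponent-selection rule, and the names `quantize` and `flex_matmul` are all hypothetical.

```python
import math

# Assumption: signed 16-bit integer mantissas; the real flexpoint widths and
# exponent-management policy are not described in this post.
MANTISSA_BITS = 16
MAX_INT = 2 ** (MANTISSA_BITS - 1) - 1  # 32767


def quantize(values: list[float]) -> tuple[list[int], int]:
    """Encode a whole tensor as integer mantissas plus ONE shared exponent."""
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        return [0] * len(values), 0
    # Smallest exponent e with peak / 2**e <= MAX_INT, so nothing overflows.
    exp = math.ceil(math.log2(peak / MAX_INT))
    scale = 2.0 ** exp
    # Clamp guards against floating-point error in log2 near a power of two.
    return [max(-MAX_INT, min(MAX_INT, round(v / scale))) for v in values], exp


def flex_matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply a (n x k) by b (k x m): integer multiply-accumulates on the
    mantissas, with the two shared exponents applied once per output."""
    n, k, m = len(a), len(b), len(b[0])
    a_m, a_exp = quantize([x for row in a for x in row])  # row-major
    b_m, b_exp = quantize([b[r][c] for c in range(m) for r in range(k)])  # column-major
    out_scale = 2.0 ** (a_exp + b_exp)
    result = []
    for i in range(n):
        row = a_m[i * k:(i + 1) * k]
        result.append([
            sum(x * y for x, y in zip(row, b_m[j * k:(j + 1) * k])) * out_scale
            for j in range(m)
        ])
    return result
```

The design trade-off is that one large value sets the exponent for the entire tensor, so small values lose precision; a scheme like this only works well when tensor values stay within a predictable range.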

Board meetings got especially playful in the Playground =)

And then Intel came knocking.

With the most advanced production fab in the world and a healthy desire to regain the mantle of leading the future of Moore’s Law, the combination was hard to resist. Intel vice president Jason Waxman told Recode that the shift to artificial intelligence could dwarf the move to cloud computing. “I firmly believe this is not only the next wave but something that will dwarf the last wave.” But we had to put on our wizard hats to negotiate with giants.

The deep learning and AI sector has heated up labor markets to nearly unprecedented levels. Large companies have recently been paying $6–10 million per engineer in talent acquisitions, and $4–5M per head for pre-product startups still in academia. Master’s students in a certain Stanford lab averaged $500K/yr in their first job offers at graduation. We witnessed an academic turn down a million-dollar signing bonus because they got a better offer.

Why so hot?

The deep learning techniques, while relatively easy to learn, are quite foreign to traditional engineering modalities. It takes a different mindset and a relaxation of the presumption of control. The practitioners are like magi, sequestered from the rest of a typical engineering process. The artifacts of their creation are isolated blocks of functionality defined by their interfaces. They are like blocks of magic handed to other parts of a traditional organization. (This carries over to the customers too; just about any product that you experience in the next five years that seems like magic will almost certainly be built by these algorithms).

And remember that these “brain builders” could join any industry. They can ply their trade in any domain. When we were building the deep learning team at Human Longevity Inc. (HLI), we hired the engineering lead from Google’s Translate team. Franz Och pioneered Google’s better-than-human translation service not by studying linguistics or grammar, or even by speaking the languages being translated. He focused on building the brain that could learn the job from countless documents already translated by humans (UN transcripts in particular). When he came to HLI, he cared about the mission but knew nothing about cancer and the genome; the learning machines could find the complex patterns across the genome. In short, deep learning expertise is fungible, and a burgeoning number of companies are hiring and competing for it across industry lines.

And it is an ever-widening set of industries undergoing transformation, from automotive to agriculture, healthcare to financial services. We saw this explosion in the Nervana customer pipeline. And we see it across the DFJ portfolio, especially in our newer investments. Here are some examples:

  • Learning chemistry and drug discovery: Here is a visualization of the search space of candidates for a treatment for Ebola; it generated the lead molecule for animal trials. Atomwise summarizes: “When we examine different neurons on the network we see something new: AtomNet has learned to recognize essential chemical groups like hydrogen bonding, aromaticity, and single-bonded carbons. Critically, no human ever taught AtomNet the building blocks of organic chemistry. AtomNet discovered them itself by studying vast quantities of target and ligand data. The patterns it independently observed are so foundational that medicinal chemists often think about them, and they are studied in academic courses. Put simply, AtomNet is teaching itself college chemistry.”
  • Designing new microbial life for better materials: Zymergen uses machine learning to predict the combination of genetic modifications that will optimize product yield for their customers. They are amassing one of the largest data sets about microbial design and performance, which enables them to train machine learning algorithms that make search predictions with increasing precision. Genomatica had great success in pathway optimization using directed evolution, a physical variant of an iterative optimization algorithm.
  • Discovery and change detection in satellite imagery: Planet and Mapbox. Planet is now producing so much imagery that humans can’t actually look at each picture it takes. Soon, they will image every meter of the Earth every day. From a few training examples, a convolutional neural net can find similar examples globally — like all new housing starts, all depleted reservoirs, all current deforestation, or car counts for all retail parking lots.
  • Automated driving & robotics: Tesla, Zoox, SpaceX, Rethink Robotics
  • Visual classification: JustVisual trained a deep learning network to perform product visual similarity searches in e-commerce settings. They now apply that capability across a number of applications; for example, they can enable autonomous robots to classify visual inputs in real time as they go about their work.
  • Cybersecurity: When protecting endpoint computing & IoT devices from the most advanced cyberthreats, AI-powered Cylance is proving to be a far superior and more adaptive approach than older signature-based antivirus solutions.
  • Financial risk assessment: Avant and Prosper use machine learning to improve credit verification and merge traditional and non-traditional data sources during the underwriting process.
  • And now for something completely different: quantum computing. For a wormhole peek into the near future, our quantum computing company, D-Wave Systems, powered a 100,000,000x speedup in a demonstration benchmark for Google, a company that has used D-Wave quantum computers for over a decade now on machine learning applications.
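Several of these examples, Genomatica’s directed evolution in particular, rest on the same loop the post calls iterative algorithms: generate variants, screen them, keep the best, repeat. The toy sketch below shows that loop under loud assumptions: "genomes" are bit strings, and `fitness` is a hypothetical stand-in for a lab screen or a measured product yield (here, simply agreement with a target string). It illustrates the general technique, not any company’s actual method.

```python
import random


def fitness(genome: list[int], target: list[int]) -> int:
    """Hypothetical assay: number of positions that match a target design."""
    return sum(g == t for g, t in zip(genome, target))


def mutate(genome: list[int], rate: float, rng: random.Random) -> list[int]:
    """Flip each position independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def directed_evolution(target: list[int], population_size: int = 50,
                       survivors: int = 10, rate: float = 0.02,
                       generations: int = 200, seed: int = 0) -> tuple[list[int], int]:
    """Mutate-and-select loop; returns the best genome and the generation reached."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in target] for _ in range(population_size)]
    for gen in range(generations):
        # Screen: rank every variant by the assay.
        population.sort(key=lambda g: fitness(g, target), reverse=True)
        if fitness(population[0], target) == len(target):
            return population[0], gen
        # Select: the top variants survive unchanged, so the best never gets worse.
        parents = population[:survivors]
        # Diversify: refill the population with mutated copies of survivors.
        population = parents + [mutate(rng.choice(parents), rate, rng)
                                for _ in range(population_size - survivors)]
    population.sort(key=lambda g: fitness(g, target), reverse=True)
    return population[0], generations
```

Nothing in the loop "understands" the target; improvement comes only from variation plus selection, which is also why the resulting designs can be hard to explain after the fact.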

So where will this take us?

Neural networks had their early success in speech recognition in the 1990s. In 2012, the deep learning variant dominated the ImageNet competition, and machines can now outperform humans at visual processing in many domains (like pathology, radiology and other medical image classification tasks). DARPA has research programs to do better than a dog’s nose in olfaction. We are starting the development of our artificial brains in the sensory cortex, much like an infant coming into the world. Even within these systems, like vision, the deep learning network starts with low-level constructs similar to those in biological vision (like edge detection) as foundations for higher-level constructs like facial forms, and ultimately, finding cats on the internet with self-taught learning.
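To make "low-level constructs like edge detection" concrete: the first layer of a convolutional network slides small kernels across an image, and trained networks tend to end up with kernels that respond to edges. The sketch below assumes a hand-written Sobel kernel as a stand-in for such a learned filter; in a real network the weights are learned from data, not typed in. `convolve2d` and `SOBEL_X` are illustrative names, not a library API.

```python
# Sobel kernel that responds to left-to-right intensity changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def convolve2d(image: list[list[float]], kernel: list[list[float]]) -> list[list[float]]:
    """'Valid' 2D cross-correlation, the sliding-window operation CNN layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[di][dj] * image[i + di][j + dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

For example, an image that is dark on the left and bright on the right, `[[0, 0, 0, 1, 1, 1]] * 4`, produces `[[0, 4, 4, 0], [0, 4, 4, 0]]`: the response is nonzero only where the window straddles the edge. Deeper layers combine many such responses into the higher-level forms the paragraph describes.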

But the artificial brains need not limit themselves to the human senses. With the internet of things, we are creating a sensory nervous system for the planet, with countless sensors and data collection points proliferating across it. All of this “big data” would be a big headache but for machine learning to find patterns in it all and make it actionable. So not only are we transcending human intelligence with multitudes of dedicated intelligences, we are transcending our sensory perception.

And it need not stop there. It is precisely by these iterative algorithms that human intelligence arose from primitive antecedents. While biological evolution was slow, it provides an existence proof of the process, now vastly accelerated in the artificial domain. It shifts the debate from the realm of the possible to the likely timeline ahead.

Let me end with a passage from the closing chapter of Danny Hillis’ CS book The Pattern on the Stone: “We will not engineer an artificial intelligence; rather we will set up the right conditions under which an intelligence can emerge. The greatest achievement of our technology may well be creation of tools that allow us to go beyond engineering — that allow us to create more than we can understand.”

Steve Jurvetson is a partner at DFJ
@dfjSteve