Our focus areas: Computer vision

KPCB Edge
Oct 29, 2015 · 4 min read


(Post 4 of 6 introducing our focus areas at KPCB Edge)

Today, computers can analyze and understand image data better and faster than ever before. Three recent developments have made building vision systems significantly more feasible: (1) improvements in general-purpose GPU (GPGPU) hardware and hardware accessibility, (2) improvements in tooling around GPU acceleration of network training, and (3) discoveries in applying convolutional neural networks (CNNs) to image recognition problems. Computer vision is one of our core focus areas at KPCB Edge, and we’re excited about it for these reasons and more.
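As a toy illustration (not from Caffe or any framework mentioned here), the core building block a CNN stacks and learns is a 2-D convolution. A minimal NumPy sketch, using a hand-written vertical-edge filter of the kind early CNN layers tend to learn on their own:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (strictly, cross-correlation, as in most
    deep learning frameworks). A hypothetical illustration, not library code."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output pixel is the filter's dot product with one patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny 6x6 "image": dark left half, bright right half.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)

# A vertical-edge detector: responds where brightness changes left-to-right.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

print(conv2d(image, kernel))  # strong response only at the dark/bright boundary
```

In a real CNN the kernel values are learned from data rather than hand-written, and many such filters are stacked with nonlinearities and pooling; the expensive inner loop above is exactly the kind of highly parallel arithmetic GPUs accelerate.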

Improvements in GPGPU performance and accessibility

The first important trend worth noting is the convergence on GPUs for deep learning workloads. In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a yearly competition in which entrants aim to produce the most accurate image classifier for the ImageNet data set, GPU use has accompanied a significant drop in classification error. The first GPU-driven solution was introduced in 2012, and by 2014, 90% of teams used GPUs to train their models:

[Chart: ILSVRC classification error and GPU adoption by year.] Source: http://devblogs.nvidia.com/parallelforall/nvidia-ibm-cloud-support-imagenet-large-scale-visual-recognition-challenge/

That shift has likely been driven, at least in part, by GPGPU performance, which has improved dramatically over the last few years and significantly outpaced CPU performance gains on highly parallel workloads. Looking specifically at NVIDIA GPUs (most deep learning software relies on NVIDIA's CUDA, or Compute Unified Device Architecture, toolkit), per-GPU performance has improved exponentially over time:

[Chart: NVIDIA per-GPU performance over time.] Source: http://images.anandtech.com/doci/8729/TK80Perf.jpg

More importantly from a cost standpoint, performance per watt has also improved at an exponential rate. This is projected to continue until at least 2018:

[Chart: NVIDIA GPU performance per watt, with projections through 2018.] Source: http://www.extremetech.com/wp-content/uploads/2015/03/Pascal1.png

High-end GPUs can be rented on an hourly basis using the AWS g2.2xlarge and g2.8xlarge instance types, making it easy to get started without any upfront investment in hardware.

New software tools easing GPGPU use for deep learning

OpenCV, Caffe, and other tools have also made it easier than ever before for the average developer to build computer vision applications. In a little over an hour, with no formal training or background in computer vision (or devops), I was able to get Caffe up and running on a local Ubuntu VM and run a model trained on the full ImageNet data set. I was also able to train a simple network to do handwriting recognition. For the curious, I documented the steps I took on a fresh ubuntu/trusty64 Vagrant VM to set this up.

(These steps could be repeated on an AWS g2 instance to get access to GPU acceleration. This was just a proof of concept to show how simple it is to get going. Using a pre-built AMI maintained by other folks would probably be better than taking these steps yourself.)
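For reference, the handwriting recognition experiment maps onto the LeNet/MNIST example that ships with BVLC Caffe. A rough sketch of the commands involved, assuming a working Caffe build in the repository root (script paths reflect the Caffe release current at the time and may differ in later versions):

```shell
# From the root of a compiled BVLC Caffe checkout:
./data/mnist/get_mnist.sh          # download the MNIST handwritten-digit data set
./examples/mnist/create_mnist.sh   # convert the raw data to LMDB format
./examples/mnist/train_lenet.sh    # train the LeNet CNN per lenet_solver.prototxt
```

The solver prototxt controls whether training runs on CPU or GPU, which is why the same steps carry over to a g2 instance with only a one-line configuration change.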

Developers still need to understand the underlying algorithms to get good results, but it’s no longer necessary to implement your own CNN tooling in CUDA or OpenCL to get acceptable performance. Most companies we’ve seen are taking full advantage of these tools, allowing them to spend more time tweaking their network parameters and managing their data sets.

Improvements in image recognition accuracy

Last, but certainly not least, image recognition accuracy in academic work has continued to improve at a dramatic pace. The ILSVRC has seen a significant downward trend in recognition error since its inception (see the first chart above).

The most recent competition's winner surpassed the performance of untrained humans; the organizers note that "a significant amount of training time is necessary for a human to achieve competitive performance on ILSVRC" against the winning model.

What this means

We believe these three trends will lead to a significant increase in the number of applications built for computer vision. However, because computer vision tooling is now widely accessible, there are fewer barriers to entry for developers building computer vision products. In other words, it’s more difficult for a company to build a defensible business purely based on technology.

Because of this decrease in technical defensibility, the most interesting vision companies we’ve seen have a compelling data acquisition model within the vertical they are targeting. This usually involves either access to training data that would be difficult for others to acquire, or a clever way to structure a manual training process so that it is more tenable for their customers (making data acquisition less important). We’re most interested in seeing companies apply computer vision to specific verticals or problems, especially where humans are currently used to complete tasks (like our portfolio company Mashgin) or where a task is so large that humans could never scalably do it manually (e.g., OCR).

If you’re a founder at a computer vision company and this resonates with you, we’d love to chat!


KPCB Edge is a team of builders, investing in seed stage founders working on emerging areas of technology. | www.kpcbedge.com