
Vision AI and ‘sense-able’ autonomy

What’s the biggest hurdle faced by vision artificial intelligence? And what do advances in computer vision mean for self-driving cars and surveillance? Luc Vincent from Lyft and Alexandre Winter from Arlo/Netgear lay out the landscape of vision AI in the next 12–24 months

Dec 20, 2018


Computer vision plays a key role in the future of commercially viable artificial intelligence. Enabling machines to see as the human eye does is crucial to the success of innovations from autonomous vehicles to next-gen home security systems. In part 2 of our series on the Future Labs stage at The AI Summit, New York, on December 5, we explore how computer vision is advancing to propel new technologies.

The challenges with computer vision

Future Labs managing director Steven Kuyan pointed out that AI researchers in computer vision typically have to “start from scratch” when creating vision-driven applications. For each use case, a unique AI must be trained in isolation.

Data and models that train a car to navigate city streets, for example, don’t help a security system differentiate a mailman from a burglar.

Given the staggering amount of data it takes to train AI effectively, researchers in computer vision can lean on only a limited amount of transfer learning. And that’s still far from the only challenge.
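For readers less familiar with the term, transfer learning means reusing what a model learned on one task as a starting point for another. Here is a minimal sketch, assuming PyTorch and a torchvision backbone; it is our own illustration, not anything shown on stage:

```python
# Illustrative transfer-learning sketch (PyTorch/torchvision assumed; not from the talk).
# A backbone pretrained on generic images is reused, and only a small
# task-specific head is retrained for the new use case.
import torch.nn as nn
import torchvision.models as models

def build_transfer_model(num_classes: int) -> nn.Module:
    # Start from a ResNet-18 pretrained on ImageNet.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze the pretrained feature extractor so its weights stay fixed.
    for param in backbone.parameters():
        param.requires_grad = False

    # Replace the final layer with one sized for the new task,
    # e.g. {"person", "animal", "vehicle"} for a security camera.
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone

model = build_transfer_model(num_classes=3)
```

Even with a shared backbone, the retrained head still needs labeled examples from the new domain, which is exactly where the data bottleneck Steven described comes in.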

Unlike humans, machines have highly limited worlds — and worldviews — restricted directly to their use cases. Computers don’t naturally have vision, so how can you train them to see more?

“We have yet to understand vision well enough ourselves to be able to teach computers what vision is,” said Steven. “We need all these different data sets because we can’t train with vision. We can feed machines a tremendous amount of images that will make them more efficient, but we don’t know how to train vision.”

These inherent hurdles make the incredible progress of computer vision AI to date all the more impressive.

Instrumentalizing environments to make sense of them

Smart home security cameras sold by Arlo, a newly public company spun out from Netgear, do more than sound alarms or alert police when motion sensors are triggered. The company uses computer vision to understand what its cameras detect — enabling its technology to differentiate a stray animal, for example, from a human intruder.

In practice, this smart detection functionality helps minimize false alarms and opens up customizable security options for customers. But according to Alexandre Winter, Arlo’s director of computer vision, the value of the company’s vision AI goes much further than that.
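As a rough illustration of the idea (not Arlo’s actual pipeline — the detection structure and labels below are hypothetical), a detection-based alert filter might look something like this:

```python
# Rough illustration of detection-based alert filtering (not Arlo's actual system).
# A detector labels what the camera sees; an alert fires only for labels the
# user cares about, which suppresses false alarms from animals or passing cars.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "person", "animal", "vehicle"
    confidence: float  # detector score in [0, 1]

def should_alert(detections, alert_labels=("person",), threshold=0.8):
    """Alert only when a watched label is detected with high confidence."""
    return any(d.label in alert_labels and d.confidence >= threshold
               for d in detections)

# A stray cat triggers motion but no alert; a person at the door does.
print(should_alert([Detection("animal", 0.95)]))                             # False
print(should_alert([Detection("person", 0.91), Detection("animal", 0.40)]))  # True
```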

Alexandre Winter, director of computer vision, Arlo/Netgear

Equipping machines with the ability to see people, places, and objects — and to recognize locations, interactions, and movements — will give humans vastly greater insight into the physical world. It will also make the physical world more akin to the digital realm, where companies enjoy unprecedented volumes of information that can be used for customer insights and predictive intelligence.

“If we understand attention and interactions, we can make the physical world safer and more efficient,” says Alexandre. “We can A/B test different strategies based on long-term information. We can predict how likely things are to happen. We can optimize efforts for cost, speed, or safety.”

Unlocking those insights starts with the right infrastructure. Alexandre spoke to how understanding our environments requires first making them ‘sense-able’ by machines. This concept, referred to as instrumentalizing the physical world, is a continually expanding phenomenon: According to estimates he shared, smart home and security cameras will make up 26% of the world’s 44 billion camera sensors by the year 2022.
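To put those estimates in absolute terms (the arithmetic below is ours, not a figure quoted on stage):

```python
# Back-of-the-envelope scale implied by the estimates above (the arithmetic is ours).
total_camera_sensors_2022 = 44e9   # projected camera sensors worldwide by 2022
smart_home_share = 0.26            # share from smart home and security cameras
print(f"{total_camera_sensors_2022 * smart_home_share / 1e9:.1f} billion cameras")
# -> 11.4 billion smart home and security cameras
```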

A more sense-able physical world (and better edge intelligence) will then lead to further advances in AI’s sensing capabilities. Today’s systems are fairly effective at object recognition and re-identification, for example, but facial recognition still requires well-positioned cameras.

Instrumentalizing the environment will help Arlo and others achieve AI that can not only recognize a face from any camera, but also pick up on a person’s walking gait or palm veins — which are poised to become important markers for biometric identification.

Giving cars sight for a self-driving future

Lyft is one of the companies leading collaboration in computer vision AI for transportation. Luc Vincent (Lyft’s VP of Autonomous Technology) heads the company’s self-driving efforts, overseeing a team of 300+ engineers located across Munich, London, and Palo Alto.

Luc Vincent, VP of autonomous technology, Lyft

Luc shared how Lyft has taken a two-pronged approach to its autonomy initiatives:
1. Its Open Platform allows third-party providers of self-driving cars to “plug” their vehicles into Lyft’s ride-sharing app for dispatch (where possible) based on location, demand, and availability. Participating partners include Waymo, General Motors, and Jaguar Land Rover.

2. Lyft’s pursuit of autonomy for its own vehicle fleet is spearheaded out of its Level 5 Engineering Center in Palo Alto. Luc gave us a peek at how Lyft’s Level 5 team designed their ‘autonomy stack’ to take sensor data (collected from cameras, radar, LIDAR systems, and so on) and flow it through four further segments (a structural sketch follows the list):

(i) A perception layer capable of detecting objects, computing speeds/velocities, and localizing the vehicle against geo data

(ii) A fusion component that recognizes object movement dynamics and applies semantics to the data

(iii) A planning configuration for movement prediction, behavioral planning, and trajectory generation

(iv) A controls function for pushing the generated trajectory out with appropriate speed and steering
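Read as a pipeline, the stack looks roughly like the skeleton below. The stage names follow Luc’s description, but every type and function body is our own placeholder, not Lyft’s code:

```python
# Illustrative pipeline skeleton for the autonomy stack described above.
# Stage names follow Luc's description; all types and functions are our
# own placeholders, not Lyft's actual interfaces.
from dataclasses import dataclass
from typing import Any

@dataclass
class SensorFrame:
    camera: Any  # raw camera frames
    radar: Any   # raw radar returns
    lidar: Any   # raw LIDAR point cloud

def perceive(frame: SensorFrame) -> dict:
    """Detect objects, estimate their speeds/velocities, and localize against geo data."""
    return {"objects": [], "ego_pose": None}

def fuse(perception: dict) -> dict:
    """Combine per-sensor detections into object movement dynamics with semantic labels."""
    return {"tracked_objects": perception["objects"]}

def plan(world: dict) -> list:
    """Predict other agents' motion, choose a behavior, and generate a trajectory."""
    return [("waypoint", 0.0, 0.0)]

def control(trajectory: list) -> dict:
    """Turn the planned trajectory into speed and steering commands."""
    return {"throttle": 0.0, "steering": 0.0}

def autonomy_step(frame: SensorFrame) -> dict:
    # sensors -> perception -> fusion -> planning -> controls
    return control(plan(fuse(perceive(frame))))
```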

Lyft sees this two-pronged model as one of its key advantages against competitors in the race to a self-driving future. The approach capitalizes on the benefits of both proprietary Lyft technology and the third-party manufacturing capabilities of its partner network, accelerating innovation on all sides.

Another advantage is Lyft’s breadth and scale. The company currently facilitates 40M+ driver-operated rides worldwide every week, giving it a nearly unprecedented volume of data to leverage for machine learning, mapping, and scenario collection.

“Having an existing [ride-sharing] service also gives us focus,” Luc said. “It helps us understand which routes matter and where to focus our efforts to deliver value. We don’t have to try to do ‘autonomous for everywhere.’ We focus on specific routes to get started.”

The routes near Lyft’s Palo Alto offices are mapped well enough that the company is piloting self-driving rides to work for employees. As the research efforts continue, Lyft’s outlook posits that within the next 10 years, most of our driving scenarios will shift to human+AI operation. After that comes the bold new future where cities are finally designed around people, not cars.

Read more about what to expect in AI over the next 12–24 months in our 6-part series on the Future Labs stage at The AI Summit, New York:

Part 1: 2019 outlook: AI research to real-world application
Part 3: Voice AI for business opportunity
Part 4: Robotics AI is augmenting human intelligence


The Future Labs at NYU Tandon offer the businesses of tomorrow a network of innovation spaces and programs that support early stage startups in New York City.