Filling in the gaps: How Mapillary uses PyTorch to improve maps everywhere and for everyone

Published in PyTorch · 4 min read · Sep 19, 2019

If you live in New York, San Francisco, or any other major city, you probably use mobile or web maps to find directions to your destinations. Most of the time, the maps on your phone are accurate and detailed. But in many places around the world, maps are inaccurate, lacking in detail, or non-existent. Due to a lack of resources on the ground, large parts of the world are left behind by the revolution in digitized geography.

The solution, computer vision expert Jan Erik Solem and his three co-founders thought, was a mapping platform that lets anyone take photos of streets and then automatically processes and analyzes that imagery with computer vision, stitching the photos together to generate map data: a do-it-yourself street view powered by artificial intelligence, if you will.

Within a few months of Mapillary’s launch in early 2014, it became clear that access to street-level images was just one of many mapping demands. Cities, transportation agencies, utility providers, car companies, and many other organizations turned to Mapillary to extract data from the imagery for tasks such as traffic sign and utility pole inventories, navigation, and compliance.

“There was a huge market opportunity here. Powering computers to analyze street-level images automatically and accurately, we could meet various mapping demands at a low cost. The key piece for us was to figure out the technology. Luckily, we quickly got a huge dataset of street-level images from all over the world to work with,” Solem explains.

In 2016, the computer vision team at Mapillary started building models in PyTorch to detect everything in images, from traffic signs and road markings to potholes and utility poles. This was the first time Mapillary used deep learning networks extensively. Mapillary chose PyTorch for its flexibility during the research phase: it is easy to extend, yet fast enough to train on huge image datasets in a reasonable amount of time.

PyTorch’s dynamic computational graphs also made working with object detection and panoptic segmentation networks much easier. These networks can be implemented in deep learning frameworks that don’t support dynamic computation, but doing so in PyTorch is much more straightforward, and often more computationally efficient.
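
To make the point about dynamic graphs concrete, here is a minimal sketch of the kind of data-dependent control flow that detection-style networks need. It is not Mapillary's code: the `PerRegionHead` module, its parameters, and the box format are hypothetical, chosen only to show that an ordinary Python loop over a variable number of regions is enough to build the graph in PyTorch.

```python
import torch
import torch.nn as nn


class PerRegionHead(nn.Module):
    """Hypothetical head that scores a variable number of regions per image."""

    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, features: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # features: (C, H, W) feature map for one image.
        # boxes: (N, 4) integer tensor of x0, y0, x1, y1; N differs per image.
        scores = []
        for x0, y0, x1, y1 in boxes.tolist():
            # Plain Python slicing and looping: the graph is built as this runs,
            # so its size follows the number of boxes in this particular image.
            region = features[:, y0:y1, x0:x1]
            pooled = region.mean(dim=(1, 2))  # average-pool the region to (C,)
            scores.append(self.classifier(pooled))
        if not scores:
            # No regions in this image: return an empty (0, num_classes) tensor.
            return features.new_zeros((0, self.classifier.out_features))
        return torch.stack(scores)


if __name__ == "__main__":
    head = PerRegionHead(channels=8, num_classes=3)
    feats = torch.randn(8, 16, 16, requires_grad=True)
    boxes = torch.tensor([[0, 0, 4, 4], [2, 2, 10, 8]])
    out = head(feats, boxes)
    out.sum().backward()  # gradients flow through whichever regions were used
    print(out.shape)
```

With two boxes the output has shape `(2, 3)`; with an empty `(0, 4)` box tensor it has shape `(0, 3)`, and no special handling of the graph is needed in either case.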

As a result, the models that Mapillary built in PyTorch can detect and tag essentially everything in an image. This brings scalability in mapping to a whole new level as all players can get the different data they need from the same imagery.

*Mapillary’s computer vision models detect all objects in street-level images before positioning and placing them on the map*

Soon after that, customers started requesting that the individual detections in images be geo-positioned. In other words, every single item detected in an image, whether road marking or utility pole, fire hydrant or traffic sign, would be positioned and placed on the map.

“Positioning the objects on the map and not just finding them in the images required a different method altogether. This time we needed the algorithms to detect the same object from different images to estimate where the objects are located, before placing them on the map as stand-alone map features. To do this, we use the 3D models that we build to connect the images across many different contributors, in order to provide us with the localization information we needed,” Solem says. “The 3D models were improved by using the segmentation results, which were provided by our PyTorch-developed models. This allows us to eliminate disturbing areas like sky, trees, and moving objects.”
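
The idea of locating an object from several images can be illustrated with a deliberately simplified sketch. Mapillary's pipeline relies on full 3D reconstruction, which is not shown here; the example below instead assumes a flat 2D map, known camera positions, and a known bearing from each camera to the detected object, and finds the point closest to all sight lines in the least-squares sense. The function name `triangulate_2d` and the observation format are hypothetical.

```python
import math


def triangulate_2d(observations):
    """Estimate an object's (x, y) from sight lines seen in several images.

    observations: iterable of (cam_x, cam_y, bearing_rad), where bearing_rad is
    the direction from the camera to the object, measured from the x axis.
    This is a simplified 2D stand-in for 3D triangulation.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for cam_x, cam_y, bearing in observations:
        dx, dy = math.cos(bearing), math.sin(bearing)
        # I - d d^T projects onto the direction perpendicular to the sight line,
        # so p^T (I - d d^T) p measures squared distance from that line.
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * cam_x + m12 * cam_y
        b2 += m12 * cam_x + m22 * cam_y
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("sight lines are (nearly) parallel; position is undetermined")
    # Solve the 2x2 normal equations A p = b with Cramer's rule.
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


if __name__ == "__main__":
    # Three cameras looking at the same hypothetical traffic sign at (5, 5).
    sightings = [
        (0.0, 0.0, math.atan2(5.0, 5.0)),
        (10.0, 0.0, math.atan2(5.0, -5.0)),
        (5.0, -3.0, math.pi / 2),
    ]
    print(triangulate_2d(sightings))
```

At least two sight lines with different directions are needed; with noisy bearings, additional images of the same object pull the estimate toward the point that best agrees with all of them.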

*The Mapillary app captures images automatically. The images and all the objects in them are then geo-positioned and placed on the global map, and made available to everyone*

Mapillary currently hosts 680 million images, with two million more added every day. Using PyTorch models in various computer vision processing steps, Mapillary detects millions of objects and adds them to the map every single day.

Today, data is needed by cities, mapping companies, and self-driving cars, but Solem thinks we will see many more use cases for map data in the future. “All new cars will be equipped with cameras, and we will see the rise of delivery robots, fleets of self-driving cars and trucks, and entirely new use cases of map data that haven’t even been invented yet,” Solem says. “Maps will be everywhere,” he concludes, “whether humans see them or not.”

Written by PyTorch