Challenge #3: Image-Based Localization

The Udacity Self-Driving Car

As detailed in this post, a critical part of our process in launching the Self-Driving Car Nanodegree program is to build our own self-driving vehicle. To achieve this, we formed a core Self-Driving Car Team with Google Self-Driving Car founder and Udacity President Sebastian Thrun. One of the first decisions we made together? Open source code, written by hundreds of students from across the globe!

We are now breaking down the problem of making the car autonomous into Udacity Challenges. The first challenge is complete, Challenge #2 is underway, and we’re now ready to introduce Challenge #3!

Challenge #3 — Now Open

Udacity is moving full speed ahead with development on our self-driving car. Challenge #3 will deal with one of the most widely studied aspects of robotics engineering: localization. To operate safely, a self-driving vehicle must know precisely where it is in the world, and GPS alone cannot provide this, because its accuracy can vary wildly depending on conditions. Unlike many other localization solutions, ours will rely on camera imagery rather than LIDAR.

An introduction to localization from our Artificial Intelligence for Robotics class

Challenge Overview

We are challenging our community to come up with the best image-only solution for localization. No LIDAR, no GPS!

Images hold a lot of information, and can contain more landmarks in their field of view than an equivalent LIDAR frame. By processing imagery in real time and comparing those images to previous drives in the same area, you can get a localization solution that is good enough for use in navigation. Think of it this way: when you are walking down a street that you’ve traversed several times before, you know where you are because of how close you are to a certain building, intersection, or bridge. This information is all visual, and we can teach computers to make the same decisions based on landmarks that they can interpret.

This challenge will be heavy in image processing and tools like OpenCV. You will need to build a pipeline that can take a frame from our dashboard camera, process it, and compare it to a database of previous drives. This database doesn’t need to be a directory of images; in fact, you’ll find that indexing regular imagery is too slow. If you don’t have any experience with things like Fourier transforms or converting images to grayscale, you may want to join a team with expertise in these areas.
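To make the frame, process, compare loop concrete, here is a minimal sketch in plain Python. It is not the intended solution, only one possible shape for the pipeline: each frame is converted to grayscale, reduced to a tiny grid of average intensities, and matched against stored descriptors from earlier drives. The function names, the 4x4 grid, the synthetic frames, and the coordinates are all assumptions made for illustration; a real entry would more likely use OpenCV features and a proper index.

```python
import math

def to_grayscale(rgb_frame):
    """Convert rows of (r, g, b) pixels to rows of luma values (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_frame]

def describe(gray, grid=4):
    """Average the image over a grid x grid layout of cells (compact descriptor).

    Assumes the image is at least `grid` pixels wide and tall.
    """
    height, width = len(gray), len(gray[0])
    descriptor = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            descriptor.append(sum(cell) / len(cell))
    return descriptor

def localize(descriptor, database):
    """Return the (lat, lon) stored with the descriptor nearest to the query."""
    best = min(database, key=lambda entry: math.dist(entry[0], descriptor))
    return best[1], best[2]

def make_frame(value, size=8):
    """Hypothetical stand-in for a camera frame: a flat gray square."""
    return [[(value, value, value)] * size for _ in range(size)]

# The "database of previous drives" holds descriptors, not raw images.
# The coordinates below are made up for the example.
database = [
    (describe(to_grayscale(make_frame(10))), 37.0, -122.0),
    (describe(to_grayscale(make_frame(200))), 37.1, -122.1),
]

query = describe(to_grayscale(make_frame(190)))
position = localize(query, database)
```

Storing 16 numbers per frame instead of the frame itself is what keeps the lookup cheap; the linear scan in `localize` is the part you would replace with a faster index as the database grows.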

Challenge #3 will follow a model very similar to Challenge #2, and you will use the same workflow to retrieve and process data. You can get started with the data that has already been released, with more data coming soon.

We don’t want to say too much more, because we are so excited to see the elegant and creative solutions you come up with!


To join a team and get involved in the community growing around the Udacity Self-Driving Car, please join the Slack team here, then join the #challenge-three channel.


First Place: All-expenses-paid trip for the team leader and 3 other teammates to Udacity HQ in Mountain View, California to meet and brainstorm with Sebastian Thrun
Second Place: One-time sum of $10,000
Third Place: To be announced!


Start Date: 10/07/2016
End Date: 11/04/2016

Essential Rules

  • You must produce a localization solution (latitude, longitude in the same format as the dataset) using only imagery from the front-facing center camera.
  • You can train using the GPS localization solution recorded in the ROS bags in the datasets.
  • One team per participant, one submission per team, no maximum team size. Teams must be formed by October 11th.
  • Teams must have a designated lead who will submit their team’s entry. The designated lead must check in with our Udacity Slack lead @mac weekly to be eligible.
  • A submission will be considered ineligible if it was developed using code containing or depending on software whose license is not approved by the Open Source Initiative, or whose license prohibits commercial use.
  • There are no restrictions on training time, but your solution must process a frame in less than 1/20th of a second, and it may not use future frames. Solutions may only be generated from past and current data, as the car will not be able to look into the future.
  • Winners must submit runnable code (with documentation and a description of the resources/dependencies required to run the solution) with reproducible results within one (1) week of being selected as the Challenge winner.
  • All code submitted will be open-sourced, and there should be no expectation of maintaining exclusive IP over submitted code.
  • Hand-labelling of the test dataset is not allowed.
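The "no future frames" rule is easy to break by accident, for example by smoothing a whole trajectory after the fact. As a small sketch of what a causal step looks like, the class below smooths per-frame estimates using only the current estimate and its own running state. The class name, the `alpha` weight, and the sample coordinates are assumptions for illustration, and smoothing itself is an optional technique, not something the rules require.

```python
class CausalSmoother:
    """Exponentially weighted average that only sees past and current estimates."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest per-frame estimate
        self.state = None   # smoothed (lat, lon); None until the first frame

    def update(self, lat, lon):
        """Fold one new estimate into the running state and return the result."""
        if self.state is None:
            self.state = (lat, lon)
        else:
            prev_lat, prev_lon = self.state
            self.state = (
                self.alpha * lat + (1 - self.alpha) * prev_lat,
                self.alpha * lon + (1 - self.alpha) * prev_lon,
            )
        return self.state

# Hypothetical raw per-frame estimates, fed in strictly in arrival order.
smoother = CausalSmoother(alpha=0.5)
raw_estimates = [(37.0, -122.0), (37.2, -122.2), (37.2, -122.2)]
smoothed = [smoother.update(lat, lon) for lat, lon in raw_estimates]
```

Because `update` never receives anything beyond the current frame, the output for frame N cannot depend on frame N+1, which is the property the rule asks for.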

For full contest rules, please read this.


Udacity will provide the teams with two datasets: training and testing. The training set will be accompanied by GPS location values for each frame, but the testing/evaluation set will not. The teams will then build a model on the training data, use it to predict on the testing data, and create a file with predicted localization solutions for the test set (again, one for each frame). Teams will then upload this file with predictions to our servers, and we will calculate the score against the actual GPS location values. Before submitting, teams should test and evaluate their code locally by splitting the training set into their own training and validation sets.
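One way to carve out that local validation set is sketched below. It assumes each training sample is a (filename, latitude, longitude) tuple in time order, which is our own stand-in for whatever you extract from the bags. It holds out the end of the drive rather than random frames, since neighboring video frames look nearly identical and a random split would likely make validation scores look better than they should.

```python
def split_by_time(samples, validation_fraction=0.2):
    """Hold out the final portion of a time-ordered drive as a validation set."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    held_out = int(len(samples) * validation_fraction)
    cut = len(samples) - held_out
    return samples[:cut], samples[cut:]

# Hypothetical samples: file names and coordinates are made up for the example.
samples = [(f"frame_{i:04d}.png", 37.0 + i * 1e-5, -122.0) for i in range(10)]
train, validation = split_by_time(samples, validation_fraction=0.2)
```

With ten samples and a 20% hold-out, the first eight frames go to training and the last two to validation.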

Evaluation Metric

Root Mean Square Error. From Wikipedia:

The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample and population values) predicted by a model or an estimator and the values actually observed. The RMSD represents the sample standard deviation of the differences between predicted values and observed values.
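For local evaluation, RMSE is the square root of the mean of the squared differences between predictions and observations. The sketch below is one way to compute it for (latitude, longitude) pairs. How the official scorer combines the two coordinates is not stated here, so treating the per-frame error as the straight-line distance in degrees is an assumption, as are the function name and the sample values.

```python
import math

def rmse(predicted, actual):
    """Root mean square error over paired (lat, lon) predictions.

    Assumption: the per-frame error is the Euclidean distance in degrees.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    squared_errors = [
        (p_lat - a_lat) ** 2 + (p_lon - a_lon) ** 2
        for (p_lat, p_lon), (a_lat, a_lon) in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical validation data: one frame is off, one is exact.
actual = [(37.0, -122.0), (37.0, -122.0)]
predicted = [(37.003, -122.004), (37.0, -122.0)]
score = rmse(predicted, actual)
perfect = rmse(actual, actual)
```

Because errors are squared before averaging, a few badly wrong frames cost far more than many slightly wrong ones.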

Frequently Asked Questions

Submission Format & Details

Teams will be able to submit their final results only once on the testing set in CSV format via email to The submission email must be accompanied by a list of teammates, team name, and code/documentation. More information on this format will be released in the coming weeks.

How do I get started?

  1. Join the Slack team and join the #challenge-three channel
  2. Download the datasets here
  3. Install Ubuntu 14.04 in a virtual machine or directly onto your system.
  4. Install ROS to play back the data and convert it into different formats
  5. Submit your results to with code (preferably in a Git repo) and team information.

Data Format

All sensor data, including imagery, is provided in the ROSbag format. To play back this data, you will need to install ROS on an Ubuntu Linux platform and test from there. Additionally, you can convert the data into any format you like.

Definition of Real-Time Performance

Video processing latency has not yet been measured on the target hardware with a GigE camera. Expect a GTX 1070, an i7-4770TE CPU, and 16GB+ of RAM. There is some wiggle room on “real time performance.” Essentially, your network has to process 15+ frames a second. We expect difficulty here with replication until we have an AWS/Azure instance specification for later challenges.
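Until a reference machine is specified, you can at least track your own numbers. The sketch below times a per-frame function and reports overall frames per second and the worst single-frame latency, then compares them to the 15 frames a second figure above and the 1/20th of a second limit in the rules. `measure_throughput`, `dummy_process`, and the fake frames are placeholders we made up; swap in your real pipeline and real images.

```python
import time

def measure_throughput(process_frame, frames):
    """Return (frames_per_second, worst_latency_seconds) for a per-frame function."""
    worst = 0.0
    start = time.perf_counter()
    for frame in frames:
        t0 = time.perf_counter()
        process_frame(frame)
        worst = max(worst, time.perf_counter() - t0)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(frames) / elapsed, worst

def dummy_process(frame):
    """Placeholder workload standing in for a real localization step."""
    return sum(frame) / len(frame)

# Fake "frames": short lists of pixel values, only here to exercise the timer.
frames = [[i % 256] * 64 for i in range(200)]
fps, worst_latency = measure_throughput(dummy_process, frames)

meets_fps_target = fps >= 15.0
meets_latency_rule = worst_latency < 1.0 / 20.0
```

Numbers from your own machine will not match the target hardware, so treat them as a relative check between versions of your pipeline rather than a pass or fail result.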

Open Source Tools Clarification

In pre- and post-processing for your neural networks, you may use proprietary code and tools, as long as your final code/network/solution operates independently of any closed-source code, as defined in the above rules.

Here We Go!

Udacity is dedicated to democratizing education, and we couldn’t be more excited to bring this philosophy to such a revolutionary platform: the self-driving car! We expect this project to have an incredible impact on the industry, giving anyone access to the tools required to get an autonomous vehicle on the road. Help us achieve this dream by joining a team and competing in our challenges. Plus, if you’re looking to gain the skills necessary to launch a career building cars that drive themselves, we encourage you to check out our Self-Driving Car Engineer Nanodegree program.