Our Very Own Grand Challenge
Back in 2005, Sebastian Thrun and the Stanford Racing Team won the DARPA “Grand Challenge” by autonomously completing a 150-mile route through the Mojave Desert with their car “Stanley” in the fastest time. I get goose bumps every time I watch the video from this event, so when Udacity announced in February that they were putting together an autonomous racing team to compete in the Self-Racing Cars event, I was all over it. Sebastian is in a very different role these days as the Founder and President of Udacity, but he has been inspiring students like myself to follow in his footsteps through the recently-released Self-Driving Car Engineer Nanodegree program. Our Self-Racing Cars team just competed in the Self-Racing Cars event this past weekend (April 1–2). It was an incredible experience that we are honored to share.
On February 15, Udacity selected a group of 18 talented engineers (out of hundreds of applicants) to form the Self-Racing Cars team. Our team was composed of individuals with widely varying backgrounds from all over the world, with the commonalities that we were all enrolled in the Udacity Self-Driving Car Nanodegree program and were extremely passionate about autonomous vehicles. The team was given six weeks to develop the software to drive an autonomous vehicle around the track at Thunderhill Raceway for the Self-Racing Cars event. We were partnered for the event with the awesome team at PolySync, who provided us with a Kia Soul vehicle outfitted with their Open Source Car Controls kit (OSCC). Our team would not have access to the car until 2 days before the event, so the six-week lead-up was all about getting familiar with the PolySync software and building our own autonomous models/system that would communicate with the OSCC to control the vehicle.
The PolySync vehicle had a single forward-facing Point Grey Black Fly camera and a Swift Navigation GPS. The car was also outfitted with Radar and Lidar, but we did not use these systems for the event. Our team decided early on that we wanted to develop an end-to-end deep learning-type autonomous system, and rely on GPS as little as possible. GPS waypoint followers are fairly commonplace, but to develop a vision-based deep learning approach using the single forward-facing camera as the only input is at the cutting edge of autonomous vehicle development.
So six weeks, 18 people who had never met, no one with any formal experience with autonomous vehicles, and a single camera…. Oh yeah, and we needed a team name! Since we were developing an autonomous system for the Kia Soul, “Soulless” seemed like the perfect fit.
At our first meeting we split into four teams: The first to develop a simulator to test our models, another to get the PolySync software up and running, another to develop a “Safe” modeling approach, and the final to develop a “Fast” approach. These teams were very fluid and required a lot of crossover. Besides our weekly hour-long Google Hangout meetings, all of our communication was through Slack. The time differences between US, Europe, and Australia created some challenges (team members in seven different time zones!), but besides a few quiet hours here and there, our team was operating and communicating on a 24-hour basis through Slack, passing the baton back and forth between team members on different continents.
With our internal teams set, we all got to work. The output from each team is described briefly below:
With the help of Udacity and a member of the open source community (acknowledgements at the end), within the first week we were able to put out a fairly realistic simulator in Unity using satellite images and elevation data that Kairos Aerospace had collected at the 2016 Self Racing Cars event. A video of the Kia Soul vehicle driving autonomously in the simulator is shown below. After the initial release, we continuously added features to the simulator throughout the six weeks. This simulator proved to be an incredible testing ground for our models and gave significant insight into what worked and what didn’t. Since we didn’t have the real vehicle until the days leading up to the event, this was a critical step.
PolySync Development Team
Data Collection — The goal of our PolySync team was several-fold. First, we needed to extract data that PolySync had taken from the 2016 race in order to train our deep learning models. This process also set the stage for the data processing pipeline that we created for quickly processing data at the race. Getting the PolySync software up and running took several weeks of attempts, but the team at PolySync was extremely helpful. We extracted around 30 minutes of data from the 2016 event, and a snippet of that data can be seen in the video below.
Once we had the 2016 data collected, we worked to create a data pipeline that would allow us to quickly create .CSVs including all the necessary training data and corresponding image directories. This proved to be one of the most challenging tasks that we faced. With around 10 different team members working on end-to-end models, the format of the training data had to fit with everyone’s approaches. We also found out that much of the data that we collected from 2016, and also at the Thunderhill track, did not have throttle information. Our team quickly worked to create a modified “acceleration” variable that would mimic the vehicle throttling. After working with PolySync, we fixed the throttle issue and our final datasets all contained throttle information.
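To give a feel for what a stand-in "acceleration" variable could look like, here is a minimal sketch that derives it from logged speed using finite differences. This is an assumption for illustration only: the function name, the units, and the idea of differencing speed over timestamps are hypothetical, and the post does not describe how our actual variable was computed.

```python
def acceleration_proxy(timestamps, speeds):
    """Approximate longitudinal acceleration from logged speed.

    Hypothetical helper: timestamps are in seconds, speeds in m/s.
    The first sample has no predecessor, so it is assigned 0.0.
    """
    if len(timestamps) != len(speeds):
        raise ValueError("timestamps and speeds must have the same length")

    accel = [0.0] if speeds else []
    for i in range(1, len(speeds)):
        dt = timestamps[i] - timestamps[i - 1]
        # Guard against duplicate or out-of-order timestamps in the log.
        if dt > 0:
            accel.append((speeds[i] - speeds[i - 1]) / dt)
        else:
            accel.append(0.0)
    return accel
```

Positive values would loosely correspond to throttle and negative values to braking or coasting, which is why a signal like this can serve as a rough training target when the true throttle channel is missing.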
A pleasant surprise during the days at the race was the accuracy of the GPS data that we collected. One of the team members created visualizations of the training runs, and when you zoom in you can actually separate each lap around the track. An image of the GPS overlaid on the track is shown below.
System Architecture — The second goal was to develop an interface between our models and PolySync. Images and other information from the vehicle would be passed through PolySync to our models and then our models could send back steering, throttle, and brake commands through PolySync to the OSCC. The PolySync software is written in C++, but our team was able to create a Python wrapper that allowed communication from our Python-developed models. This created the most flexibility for the modeling team that was developing deep learning models using a combination of Keras and Tensorflow. Our system architecture for communicating with PolySync is shown below:
Safe and Fast Modeling Teams
While these originally started out as separate teams, they quickly merged as we started to grasp the difficulty of getting our models to control all three key vehicle commands using only image inputs. While we tried many things, our developed models could mostly be categorized into three different bins as described below. The general approach was to first train these models on simulator data to the point where they could drive successfully around the simulator track. We then trained, validated, and tested the same models with real-world data from the vehicle, or used combinations of simulator and real-world data for the final models.
Convolutional Neural Networks (CNNs) can be used to autonomously drive vehicles by training the model with image inputs and corresponding steering, throttle, and brake targets. NVIDIA recently released a paper that demonstrated a successful implementation of this methodology for steering a vehicle. Udacity has also made this one of their projects for the Self-Driving Car Nanodegree program, and so many of us on the team had recently implemented this approach to drive a vehicle in Udacity’s Unity simulator. For those interested in the details, I wrote another post about the CNN implementation used for the Udacity project here.
The general idea is that the CNN architecture is able to determine patterns within the image pixels (lane lines, road/offroad) that correlate to certain vehicle controls. One of the key tricks with these models is that you can’t just train the models with the center camera images. If the model only sees training data driving down the center of the road, as soon as you deviate slightly from this trained path, the model will see images that are different from those that it was trained on and will steer off-course. Our team used roughly 10 different types of image augmentations that modify the training images (and corresponding targets) so that our models are trained to recover from these types of situations. Another key trick is that the Thunderhill track data is biased toward left turns, as the track makes a counter-clockwise loop. Flipping the images and corresponding steering targets is a requirement to reduce this bias. The CNNs can also take in other inputs such as GPS data or vehicle speed, and we experimented with many different combinations of inputs and outputs. In general our CNN layer architectures were similar to the NVIDIA approach, as they proved to drive well in the simulator and have acceptable computational latency when running on the real vehicle. Descriptions and the code for many of the methods that we implemented can be found here.
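The flip augmentation is simple enough to sketch in a few lines. The version below is only an illustration, not our actual pipeline: the function names are hypothetical, images are represented as plain nested lists of pixel rows, and it assumes a steering convention that is symmetric about zero (so a mirrored image corresponds to the negated steering value).

```python
def flip_sample(image, steering):
    """Mirror an image left-to-right and negate its steering target.

    image: list of rows, each row a list of pixel values.
    steering: float, assumed symmetric about 0 (left negative, right positive
    or vice versa).
    """
    flipped = [row[::-1] for row in image]
    return flipped, -steering


def augment_with_flips(samples):
    """Return the original samples plus a mirrored copy of each one."""
    augmented = []
    for image, steering in samples:
        augmented.append((image, steering))
        augmented.append(flip_sample(image, steering))
    return augmented
```

Doubling the dataset this way gives the model as many right-hand turns as left-hand turns. Throttle and brake targets, if present, would be left unchanged by the flip.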
The approach for Recurrent Neural Networks (RNNs) is similar to that of CNNs, but instead of taking in single images/targets (single moment in time) for training, the model is trained on sequences of images and targets. For vision, instead of using a 2D convolution over the input images, the model now uses a 3D convolution with an extra depth dimension in time. The output from this 3D convolution is then passed to multiple stateful RNN layers. For our approach we used stateful Long Short-Term Memory (LSTM) and vanilla RNN upper layers. We based our approach on the methods developed by Ilia Edrenkin, a software engineer from Germany, who won the $10k 1st place prize in Udacity’s open source deep learning competition for vehicle steering. Any image augmentations must be implemented differently for the RNN models since they rely on seeing sequences of images.
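To illustrate what "training on sequences" means for the data, here is a minimal sketch of slicing a time-ordered recording into fixed-length windows. The function name, `seq_len`, and `stride` are hypothetical choices made for this example, and the actual windowing in our models may have differed.

```python
def make_sequences(frames, targets, seq_len, stride=1):
    """Slice time-ordered frames and targets into overlapping windows.

    Returns a list of (frame_window, target_window) pairs, each of length
    seq_len. Recordings shorter than seq_len produce no windows.
    """
    if len(frames) != len(targets):
        raise ValueError("frames and targets must have the same length")
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")

    sequences = []
    for start in range(0, len(frames) - seq_len + 1, stride):
        end = start + seq_len
        sequences.append((frames[start:end], targets[start:end]))
    return sequences
```

This also shows why augmentation gets trickier: a flip or brightness change has to be applied identically to every frame in a window, otherwise the sequence no longer looks like a continuous piece of driving.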
After trying the CNN and RNN methods described above in the Kia Soul, we were making the occasional turn on the track, but were struggling to get something more promising. We were able to bounce our approaches off of George Hotz from comma.ai, and one idea that he mentioned during the conversation was a PoseNet-style approach. One team member was able to quickly implement a modified version of the PoseNet concept and all of a sudden we were making it around several turns on the track at ~25mph.
A PoseNet uses CNNs for 6-DOF re-localization. In our approach the six-dimensional GPS was replaced by a one-dimensional value that increases from -1 at the start line to 1 at the finish line. A detailed description of our methodology, including the code, can be found here. The architecture is shown below.
A possible improvement would be to not only predict the steering angle, but allow the network to do path planning.
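As a rough illustration of the one-dimensional track-position target described above, the sketch below maps the points of a single recorded lap to values between -1 and 1 based on distance travelled. This is an assumption for illustration: the function name and the use of planar (x, y) coordinates are hypothetical, and the linked write-up is the reference for how the value was actually computed.

```python
import math


def lap_progress(points):
    """Map each (x, y) point of one lap to a progress value in [-1, 1].

    The first point (start line) maps to -1 and the last point (finish
    line) maps to 1, scaled linearly by cumulative distance travelled.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to define a lap")

    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))

    total = cumulative[-1]
    if total == 0:
        raise ValueError("lap has zero length")
    return [2.0 * d / total - 1.0 for d in cumulative]
```

A network trained to regress a value like this from the camera image is effectively learning where on the track the car is, which is the re-localization idea borrowed from PoseNet.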
Days Leading Up to the Competition
With our software development mostly complete, we were feeling somewhat confident leading up to the competition that we could get one of the models working. One of the coolest parts of the experience was that every member of the team traveled to the race. By Thursday of race week, most of the team had converged on the Baymont Inn in Willows, CA. This was the first time that many of us were meeting in person, but after working together so closely over the past six weeks it was like meeting up with old friends. This was also the first time that we got to meet the car! The PolySync team had driven in from Portland, Oregon late Wednesday night which allowed us to have Thursday and Friday at the track to test our approaches.
On Thursday, we expected to hit the ground running collecting data and testing models. Unfortunately we had to deal with integration issues for a large portion of the morning, as you would expect in any project getting hardware for the first time. We finally verified everything was communicating and were able to collect a significant amount of data from the track for training our models. With the data collected we set to work Thursday night to try to get our models trained and ready for Friday. As mentioned in the data section, we quickly discovered that we were missing throttle input from our data. The PolySync team was able to correct the issue, but could not repair the data we already had. The drive and dedication of this team were incredible! Many of the team members were suffering from various stages of jet lag but still stayed up until 3 am Thursday night to get ready for Friday. A small group used the car in an empty parking lot late into the night to make sure that all the signals were getting transmitted and received properly.
Friday came fast as we were all at the track and ready to start at 7am. There was a ton of excitement the first time that we got to test the models in the vehicle. Pressing a button on your computer and watching software that you developed take over the steering, throttle, and brake is a pretty cool feeling!
Unfortunately, with autonomous vehicles you get both the issues that plague engineers and car enthusiasts! Instead of the vehicle cruising around the track, we quickly realized that something was wrong with our steering as the models would set the steering wheel one way and not adjust back to the center of the track. At the same time we also ran into an issue with the braking system on the vehicle that needed to be troubleshot. Both of these issues lingered throughout Friday as our team worked feverishly to figure out our bugs while the PolySync team tirelessly worked to fix the Kia Soul brake module and replace brake pads.
I have to mention here that the PolySync team was incredible. As hard as we were working to get the software right, they were working into the late hours of the night to make sure that the car was always ready to go.
Finally, using a systematic testing approach for our models with the PolySync interface and software in the loop, we figured out late Friday night that the way that we were reading in the raw image bytes from the camera was creating a huge latency (~0.5s per image). Our team was able to quickly resolve this issue and we were all of a sudden sending vehicle commands at the necessary 20–30 Hz. We had also collected significantly more data on Friday (with accurate throttle this time!) and had developed a processing pipeline so that all team members were training models using the same, verified data. We were ready, maybe …
Two of the team members headed to the track extra early to see if we could get any testing done before the event started. Unfortunately with a lack of light and other early birds, those plans didn’t work out. Whereas we’d mostly had the track to ourselves on Thursday and Friday, Saturday was packed with autonomous vehicle startups and advanced drivers ready to race on the track. Our first attempts on the track proved that everything was configured properly; however, our models were still struggling. Since we were limited to about four 15-minute sessions on the track during the entire day, we had to find a new way to try and verify models. We recorded data on the winding road coming into the track area and started testing and training models on that. Given that we had gathered a very small amount of data, we were surprised that we had models that could actually make some of the turns on it.
The day turned into a long, adrenaline-fueled push to get data recorded, models trained, and track performance tested all within a small window of time. Finding the right balance of coffee, Red Bull, Monster, and pizza was critical throughout the day. One additional perk was that one of the members of the team was a race car driving instructor, so anytime we needed a break he would have us jump in his track-prepared Mazda Miata and take us around the track at a near-race pace of 90mph+! While we had more success with the CNN-based models that looked at more than just the image, at the end of the day no single model was standing out as a leader. We went back to the hotel business center to reconvene and draw up a plan of attack for Sunday.
There we were. Another early morning and one final chance to make it around the track autonomously. The caffeine and models were loaded up for our first lap around. The whole team was watching and hoping as we took off. As we went around the track we were testing model after model with no improvement from the previous day. Just as we ran out of models to test, another one had finished training while we were still on the track. We activated it and waited … It cruised around a couple of turns completely autonomously before it required a takeover! It wasn’t perfect, but it was a huge amount of progress that sent a new surge of motivation through the team! We continued running the model until our time on the track was up, and saw it tackle many of the track’s turns without issue. We kept the same approach to training and testing that we had the previous day and kept trying to do better with our models. By the end of the day we were able to make it around every turn on the track (my favorite being the 180-degree hairpin) but were unable to complete a fully autonomous lap continuously.
While we didn’t hit our ultimate goal of achieving a fully autonomous lap, we all had a complete blast and collectively feel that if we had another couple days of testing with the vehicle we’d have a successful approach. There were a few teams that were able to complete an autonomous lap at the track but they mainly relied on GPS waypoint following.
I can safely say that I’ve never been on a team that worked as hard and as well together as we did for the four days that we had together.
Given our limited time and inherent constraints on the project, we are extremely happy with what we were able to do! We would have loved to get the fastest time on a fully autonomous lap, but doing everything we did gave us a huge sense of accomplishment! We had so much fun diving into a real self-driving car problem and working with each other. We all feel like we’re on the brink of developing a working system and can’t wait to continue. We’ll be open sourcing much of the data we collected, the models we created, and the simulator we modified. An experience like this can’t be just once in a lifetime; it was too exciting, and we are all much closer friends now. This won’t be the last you hear of Team Soulless, so make sure to keep in touch and stay tuned!
Team Leader: Anthony Navarro
Team Members: Vlad Burca, Dean Liu, Chris Gundling, Maruf Maniruzzaman, John Chen, Chandra Sureshkumar, Nadia Yudina, Kiarie Ndegwa, Christy Cui, Harinando Andrianarimanana, Jacob Thalman, Jendrik Jordening, Karol Majek, Rana Khalil, Claudio Salvatore De Mutiis, Jerrick Hoang, Nahid Alam
PolySync Team: Josh Hartung, Lyle Johnson, Lucas Buckland, Daniel Fernandez, David Sosnow
We’d like to say thank you to Udacity and specifically Oliver Cameron and Lisbeth Ortega for putting this team together and supporting us along the way. We also received help with the simulator from Aaron Brown and Tawn Kramer, thanks guys!
Finally, we want to say a huge thank you to the event organizer Joshua Schachter! Joshua was great to our team at the event and we look forward to returning to this incredible event next year.