How we won our company DeepRacer league and qualified for AWS re:Invent

Glenn Horan
Published in LibertyIT
Nov 1, 2019

The story of how a team of Software Developers from Belfast fought through a tough field of 64 to win the top prize in our internal company league: A trip to AWS re:Invent in Las Vegas.

Team Outrun, left to right: Glenn Horan, Darren Broderick, Paul McCartney, David Fyffe and David Kelly

On the 30th of October 2019 our fastest model ran a blistering lap of 7.61 seconds to land us a place at the AWS DeepRacer world final at AWS re:Invent 2019 in Las Vegas. For us, who’d worked so hard on this for the previous 3 months, it was a dream come true. This is the story of our journey along with the lessons we’ve learned along the way and some advice for others on how to succeed at setting good physical lap times in DeepRacer.

The beginning

We work for Liberty Information Technology (LIT), which has the backing of global insurance heavyweight Liberty Mutual. Both companies have a reputation for pushing the boundaries of innovation in their products and encourage their staff to do the same day to day. They are also huge advocates of AWS and are keen to get their hands on any new technologies or services that Amazon add to their already extensive repertoire. As part of a big push in the field of machine learning, Liberty Mutual invited staff to take part in a global company DeepRacer league. The wording of the initial communications suggested that they only expected a small number of entrants, but before long there were around 52 teams entered from the UK, Ireland and around the US, each with five members. This was no doubt helped by the huge prize up for grabs: a trip to Las Vegas for AWS re:Invent 2019 to represent the company on the world stage and a chance to lift the coveted DeepRacer championship cup.

For readers who are unfamiliar with DeepRacer, it’s a gamification of reinforcement learning devised by AWS to get people involved with ML. Users open the DeepRacer console and write a reward function, which the car then uses to train itself on a simulation of the track until it can navigate the track with ease. Users must also be familiar with a range of hyperparameters in order to increase the training efficiency of their model. Once these models are performing satisfactorily, they can be uploaded into actual physical 1/18 scale cars and raced around a track. The best time recorded going around the track wins the prize. For more details, see the AWS DeepRacer site.
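To make the idea of a reward function concrete, here is a minimal sketch of a center-line-following one. It is not the function our team raced with; the thresholds and reward values are illustrative assumptions, and only the `params` keys used here (`track_width`, `distance_from_center`, `all_wheels_on_track`) come from the DeepRacer reward function inputs.

```python
def reward_function(params):
    """Reward the car for staying close to the center line.

    The marker thresholds and reward values below are illustrative
    assumptions, not tuned values from the article.
    """
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    all_wheels_on_track = params["all_wheels_on_track"]

    # Near-zero reward as soon as the car leaves the track.
    if not all_wheels_on_track:
        return 1e-3

    # Three bands that get wider as they move away from the center line.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # probably about to leave the track

    return float(reward)
```

The car never sees this logic directly. It only receives the number the function returns, and training nudges the model towards the actions that earn more of it.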

So we got to work. We started by watching the incredibly useful video tutorials on the AWS DeepRacer site under “Pit stop”. These taught us about the different hyperparameters, the reward function, the action space and of course the league itself. We highly recommend that beginners check this section out: it will save you a tonne of time (and money if you’re funding your own model training) and give you a solid foundation to start making fast models. After that, DeepRacer TV on YouTube is a great place to get a feeling for the hype of live events, as well as getting some tips directly from the winners of different city leagues. The rest we learned with experience. For example, the first big breakthrough we had was that you can change the reward function between clones of a model. At the time it seemed like a game changer, but now it’s fairly standard practice.

The AWS Pit Stop. This should be everyone’s first port of call to pick up the basics of DeepRacer

It’s important to point out that the tournament was being hosted by Liberty Mutual, meaning each round was held in one of the offices in the US. We had a “proxy racer” load and race our model for us while we watched via a webcast put up by one of the event organisers. For those not in the know, AWS re:Invent is seen by many as the holy grail of tech conferences. Thousands of people attend every year and it frequently sells out. The tickets alone cost the guts of $2000, and the opportunity just to go, never mind to be participants in the DeepRacer world final, could be considered once in a lifetime, especially for a small team of developers from Belfast.

Round of 64: Keep it simple and stupid

In the round of 64, we used a model that had a very limited action space: only 1 speed and 5 steering angles. The logic behind this was that the car would very quickly learn how to race around the track and any speed variations on the physical track could be made on the day using the throttle on the tablet. We drew inspiration from the videos of DeepRacer TV Tokyo, watching some of the models run super smooth sub-8-second laps with basic center-line-based models. The key to this strategy working was reliability. We used the data from the AWS DeepRacer console to determine the percentage of laps being completed in training. We then ran the evaluation on our models several times to see how many of the evaluation laps were being completed. When we were satisfied that the model was completing almost 100% of its laps in the evaluations, it was ready for the big day.
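The reliability check described above boils down to simple arithmetic. As a rough sketch, assume you have noted down the progress percentage of each evaluation lap by hand; the function name and the numbers below are made up for illustration and are not our real results.

```python
def completion_rate(lap_progress, threshold=100.0):
    """Return the fraction of laps whose progress reached `threshold` percent."""
    if not lap_progress:
        raise ValueError("no laps recorded")
    completed = sum(1 for progress in lap_progress if progress >= threshold)
    return completed / len(lap_progress)


# Hypothetical progress percentages from two separate evaluation runs.
evaluation_runs = [
    [100.0, 100.0, 100.0, 87.5, 100.0],
    [100.0, 100.0, 100.0, 100.0, 100.0],
]

# Pool every lap from every run before judging the model.
all_laps = [progress for run in evaluation_runs for progress in run]
rate = completion_rate(all_laps)
print(f"{rate:.0%} of evaluation laps completed")
```

Pooling laps across several evaluation runs matters here, because a single run of a handful of laps can easily flatter a model that is only reliable most of the time.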

Our model worked well and got us a respectable lap time of 10.56 seconds. By the end of the day we were in fourth place and comfortably qualifying for the next stage. What we learned, however, was that virtual times were no indicator of how fast the model could go on the day: 10.56 seconds was ~5 seconds quicker than our simulations had suggested. We also knew we had to make some big improvements to get our model into first place.

Round of 32 and 16: Research and logs

It was nearly a month between the round of 64 and the round of 32, which was fortunate as we had a lot of work to do and there was only one day between the round of 32 and the round of 16. It was here that we made our biggest advancements using two key resources: the DeepRacer Slack community (https://deepracing.io) and an article outlining how to use the Jupyter notebooks released by AWS to take a deep dive into the abundance of data produced while training a model (https://codelikeamother.uk/using-jupyter-notebook-for-analysing-deepracer-s-logs).

At the time of writing the DeepRacer community on Slack has around 1000 members, including all the big names from DeepRacer TV and the virtual league. I can’t stress enough how knowledgeable, friendly and helpful the community is. Upon joining I was greeted by the founder, Lyndon Leggate, and I’ve been able to chat with pros about different aspects of training and their experience with physical racing, as well as troubleshooting the Jupyter logs. The Jupyter notebook uploaded to GitHub by the folks at AWS generates incredibly useful visualisations from the training log data (see the article listed above for details). Our favourite outputs are the heat map of where the car is picking up its rewards, the top rewarded iterations and the simulation run analysis, which shows the path the car is taking as well as the speed it’s going at different points on the track.

Heat map of where model picks up rewards during training run. Note the inside lane on the hairpin turn
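For anyone curious what sits behind a reward heat map like this, the core idea is just binning. The sketch below is not the AWS notebook's code: it assumes each training step has already been parsed from the logs into a hypothetical `(x, y, reward)` tuple, and it sums the reward collected in each square cell of the track.

```python
from collections import defaultdict


def reward_heatmap(steps, cell_size=0.5):
    """Sum rewards per grid cell.

    `steps` is an iterable of (x, y, reward) tuples; this layout is an
    assumption for illustration, not the real log format.
    """
    grid = defaultdict(float)
    for x, y, reward in steps:
        # Map the car's position to an integer grid cell.
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell] += reward
    return dict(grid)


# Made-up steps: two land in the same cell, the others in cells of their own.
steps = [(0.1, 0.1, 1.0), (0.3, 0.2, 0.5), (0.7, 0.1, 1.0), (1.2, 0.6, 0.1)]
heatmap = reward_heatmap(steps)

# Print the cells from most to least rewarded.
for cell, total in sorted(heatmap.items(), key=lambda item: -item[1]):
    print(cell, total)
```

Cells with a high total are where the model is collecting most of its reward, which is how a preference such as the inside lane on the hairpin shows up.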

The strategy we used for these two rounds was much more advanced than before thanks to what we’d learned in the Slack community. The key piece of information we learned here was that even though you can’t edit the action space of a model between clones using the console, it is possible to do it by editing a metadata file stored in an S3 bucket. Why is this significant? Well, let’s say we want a model that can do a max speed of 8 m/s with a speed granularity of 2. That means the model can go either 4 m/s or 8 m/s. That’s not particularly great, as the model will be jumping between two speeds that are wildly different from each other. We were able to edit our action spaces to bring the speeds closer to each other, giving a model that could only do 5 m/s in the tight corners, 6 m/s in the shallow corners, then 8 m/s (and only 8 m/s) when it’s going straight. Finally, the console would only allow us to set the max speed to 8 m/s, but using this technique we could make it much higher.
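Here is a rough sketch of the difference between the two approaches. The first function reproduces the 4 m/s and 8 m/s example above by splitting the max speed evenly. The second builds a hand-picked action space in the shape we understand the metadata file to use; the key names (`action_space`, `steering_angle`, `speed`, `index`) and the steering angles are assumptions for illustration, so check them against your own file before editing anything in S3.

```python
import json


def console_speeds(max_speed, granularity):
    """Evenly spaced speeds, as in the 8 m/s, granularity 2 example."""
    return [max_speed * i / granularity for i in range(1, granularity + 1)]


def build_action_space(pairs):
    """Turn (steering_angle, speed) pairs into a list of indexed actions."""
    return [
        {"steering_angle": angle, "speed": speed, "index": index}
        for index, (angle, speed) in enumerate(pairs)
    ]


# Hand-picked actions: slow for tight turns, faster for shallow turns,
# and full speed only when driving straight. Angles are illustrative.
custom_pairs = [
    (-30.0, 5.0),
    (30.0, 5.0),
    (-15.0, 6.0),
    (15.0, 6.0),
    (0.0, 8.0),
]

metadata = {"action_space": build_action_space(custom_pairs)}

print(console_speeds(8.0, 2))  # [4.0, 8.0]
print(json.dumps(metadata, indent=2))
```

Tying each speed to a steering angle like this is what lets the model go flat out only in a straight line, rather than choosing between two wildly different speeds at every angle.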

Again, this worked well and put us in a great position on the leaderboard going into the intimidating round of 8: an impressive 9.26 seconds left us in third place, just over half a second behind first place, who had an incredible 8.74 lap time, above and beyond what we thought would be a winning time for the league as a whole.

Round of 8 and the final 4

It’s at this stage I’d like to thank our company for the great support that was given to the league. When we got to the final 8, it was confirmed that they would fly us over from Belfast, UK to Portsmouth, New Hampshire to “drive” our model ourselves in the final two days of the tournament. We also had the opportunity to practice some models on a physical track before the big day.

To say we were nervous at this stage is a huge understatement. We had put hours of our lives into research, training, log analysis and even giving talks within our company to help promote DeepRacer for future leagues. We had one last strategy to help increase the reliability of our models: track agnosticism. Again, this strategy was inspired by the world record holders in Tokyo and involves making a model that can complete more than one track. The theory is that if a model is trained on multiple tracks, it will learn to make its decisions based on the markings on the road and not the background. If it’s only using the markings on the road then it’ll be less likely to be distracted by differences between the physical track and the simulated environment, as well as any aberrations on the physical track (shadows, beams of light, reflections, imperfections on the track, marks on the walls of the track, spectators etc.).

Once again the Slack community came up trumps. We were able to discuss this theory with even more experienced DeepRacers and were pointed towards a GitHub repo that contained code to visualise what the car was using to make its decisions during its training runs (https://github.com/jochem725/deepracer-viz). This repo takes the video stream of the training run and overlays a heat map on the parts of the image that the algorithm is “focussing on” to make its decisions. As we saw the red areas begin to amalgamate on the dotted line in the center of the track and the two white lines along the outside, we knew that it was using the cues we wanted it to.

Heat map of what the model is using to make its decisions

We had a very nervous and long flight over to Boston followed by a short drive up to Portsmouth and an uneasy sleep before the round of 8. At our last practice, we had set a time that would put us in first place, but we couldn’t guarantee that the car would behave the way we wanted it to on the day and had no control over how well the other teams would perform.

Waiting on our flight to the States

We were first to drive with 7 teams to race after us. It was a disaster. We only slightly improved our time to set a 9.18 lap, but the model just wasn’t performing the way it had previously. We had an agonising wait watching the rest of the teams race. If just 2 teams beat our time then we’d be out. The first two teams went by without improving and the next team up was the team in first place with 8.74. They improved their already amazing time to get 8.47 seconds and, worse still, the next team overtook us, putting us in fourth place with a very real chance of getting knocked out. The next three teams raced for what felt like an eternity, with us breathing a selfish sigh of relief each time a car came off the track before the finish line. After the longest 12 minutes of our lives it was over and we scraped through to the final in fourth place with ~0.5 seconds between us and the big prize.

Having eaten a big slice of humble pie, we returned to our hotel to discuss what had happened. First place had never seemed so far away, and yet we were still in the tournament. The pressure felt huge at this stage after getting flown across the Atlantic to “drive” a small machine learning car around a track, but we were just relieved not to have gone out a day early. Still, we knew our model (or one of the slight variations on it) could set a winning lap time.

The model underperforming affected some of us quite badly…

The day of the final. We were last to race. At home we had all of our loved ones cheering us on and a live stream of the final was being watched by dozens of employees of all levels back in our local office. The 1st place team raced first and didn’t improve on their time. The second team may have been training more models the previous night and taking a big risk trying them on the final day, because they barely completed a lap. While we were relieved, we were also worried that the car might not have been performing as well as it should. These fears were quickly blown away when the third team stood up and set a blistering lap time of 8.18 seconds to take first place. The cheers filled the room and then died down to an encouraging clap from everyone, including the understandably disappointed team who had been knocked off first place. We had still set a better time in practice.

Our driver took the tablet and set the throttle to 70%. It’s here I’ll give my last piece of advice: find a flow on the physical track. Get your model completing consecutive laps comfortably before pushing it to its limits. A rolling start can knock quite a lot of time off your lap and if you’re finishing the track every time, every lap will have that rolling start. It also means the car is making decisions based on images it’s familiar with and therefore will be more confident with each decision. With the throttle at 70% the car went round the first lap, then the second, and we set a time of 8.26 seconds. We knew we could do it. It set another time of 8.45 seconds, a bit slower, but still consistently fast. The third lap was beautiful. The car flew round the hairpin and our driver, knowing the hardest part was over, ramped up the throttle as high as he dared, up to a massive 84%, and prayed it would stay on the track. It completed the penultimate corner, then the last, and we all looked to the timer, which displayed the most beautiful sight I’ve ever seen on a basic LCD screen: 7.61. The room erupted in cheers. I’m told the same thing happened back home in Belfast in a communal area where everyone was watching. Our driver set the tablet down with 2 minutes to go and breathed a huge sigh of relief while the other team members patted him on the back and exchanged fist bumps and high fives. We’d won the tournament and with it, 5 golden passes to re:Invent 2019.

Quick shout out to the rest of the finalist teams in the tournament: Fast and the FuriAWS, 3 Musgineers (Morningstar Inc.) and Slow Down For What, class acts every one of them.
