Self-driving Donkey Car Training

Team members: Karina Rao, Kwasi Akuamoah Boateng, Peep Kolberg, Tõnis Hendrik Hlebnikov, Uroš Petrović

DonkeyCar · Jun 12, 2022

Donkey is a self-driving platform for remote control cars. The car has a front-mounted RGB camera that records as the car drives. While driving, it also records its current throttle and steering angle values. Using the recorded data (images with throttle and steering values), it’s possible to teach the car to traverse any track.

A human must first drive the car around the track; around 15 laps of clean driving are enough. The recorded data can then be used to train a machine learning model (convolutional neural networks work best) to imitate the human's driving. In other words, the car learns to drive through imitation learning.

Donkey is open-source and comes with most of the machine learning pipeline already implemented. It even offers a choice between several tried-and-tested Keras model architectures. Training a model is as easy as recording some data and executing a command, while the configuration files expose options for changing every small detail of the car's driving and the model's learning.
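For readers unfamiliar with the workflow, the sketch below shows roughly what that looks like: the model type and hyperparameters live in myconfig.py, and training is one command pointed at the recorded data ("tub"). The option names follow Donkey 4.x conventions as we remember them and may differ between versions.

```python
# myconfig.py (excerpt) -- a minimal sketch; option names follow Donkey 4.x
# conventions as we remember them and may differ between versions.
DEFAULT_MODEL_TYPE = "linear"   # which Keras architecture to train
BATCH_SIZE = 128
TRAIN_TEST_SPLIT = 0.8          # fraction of records used for training
MAX_EPOCHS = 100
USE_EARLY_STOP = True           # stop once validation loss stops improving

# After recording some laps into a tub, training is a single command, e.g.:
#   donkey train --tub ./data --model ./models/mypilot.h5 --type linear
```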

We chose to train models for two tasks. In the first task, we taught a car to drive on a figure 8 track and give way to another car on the same track according to the Priority to the right rule. For the second task, we took the car outdoors and taught it to follow a footpath.

We planned to train several generations of models but had enough trouble getting the first generation to drive well. Training different generations means that a human first drives the car and a model learns from the human's driving; the model then records its own driving, the next generation is trained on that recording, and so on.

In this blog post, we describe the environments in which the cars drove, and how we collected the data. We also report which models we trained, and show the finished models driving.

1. Figure 8 track

In this task, we wanted two cars to drive on a figure 8 track simultaneously and give way to one another by following the Priority to the right rule. This rule states that “a vehicle is required to give way to vehicles approaching from the right at intersections” (Wikipedia, 2022).

But this didn't work out, because a model trained on one car might not perform well on another car. That is due to random variability in the cars' hardware: the steering and throttle values of one car do not produce the same wheel angle and speed on another car. So if a model learns to control one car, its output values will have a different effect on another car. The battery's charge level also influences a car's driving.

1.1. Track and Data

We built the track by drawing the 8-shape on paper. Since the car cannot turn very sharply, the track needed to be quite big: roughly 5 meters long and 2 meters wide (see Figures 1, 2).

Figure 1. Working hard to build the track.
Figure 2. Size reference.

To test whether the track was suitable for the model to learn, we first aimed to teach a single car to drive the track alone. A human drove the car for 15 laps, and the car recorded around 5,000 images in the process. We set up the training software to apply transformations and augmentations to the images, but we later realized that the built-in methods didn't work.

First, we wanted to crop away about 25% of the pixels from the top of each image. However, we saw that the input size of the trained model was still the entire image. Unfortunately, we discovered this bug too late. We could have added our own cropping function to the source code, but we no longer had time for that. We explain the effect of not cropping the images in the Results chapter. A sketch of such a function is shown below.
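For reference, the cropping step we wanted is only a few lines of NumPy. This is a hypothetical sketch of the kind of function we could have patched into the pipeline, not code we actually ran:

```python
import numpy as np

def crop_top(image: np.ndarray, fraction: float = 0.25) -> np.ndarray:
    """Remove the top `fraction` of rows from an HxWxC camera frame."""
    rows = image.shape[0]
    return image[int(rows * fraction):, :, :]

# Example: a default 120x160 Donkey camera frame becomes 90x160.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
cropped = crop_top(frame)   # shape (90, 160, 3)
```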

Secondly, we wanted to make the model more resistant to the camera's variability. For that, the Donkey software has options to randomly modify the images' brightness and blur during training. Since cropping the images didn't work, we suspect the blur and brightness modifications also didn't work. Luckily, missing those augmentations didn't affect the model's ability to drive.

At first, we wanted to add our own custom image processing pipeline to the car but decided against it since the car seemed to learn well with just the built-in transformations/augmentations. We discovered only later that the built-in methods didn’t actually work.

Our end goal for this task was to get two cars to drive the track simultaneously and learn to give way to one another. For this, we recorded additional data in which we set up different scenarios that the cars would encounter when driving together (see Figure 3 for examples). In each scenario, one car (let's call it car 1) was turned off and not collecting data, while the other car (car 2) recorded data. We let car 2 record for a while, then moved car 1 slightly, recorded more data, moved car 1 again, and so on. This was repeated for every possible situation in which the cars could see each other. In some scenarios, car 2 had the right of way; in others, car 2 had to stop (throttle value of 0). In the end, we also drove the car alone again for some laps. Altogether, we recorded about 15,000 images for simultaneous driving.

Figure 3. Point-Of-View of car 2. The car visible in the images was immobile and a human moved it between frames. The figure shows three scenarios. In the right image, the other car is approaching the intersection which means we should stop and give way. In the center image, the other car is in the middle of the intersection and we must wait for it to pass. In the left image, the other car has crossed the intersection and we are free to go.

1.2. Training the Model

To start, we chose to train the Linear model. It has 5 convolutional layers, followed by 2 dense layers and an output layer with two nodes: one for the throttle value and one for steering (a sketch of this architecture follows Figure 4). The model was trained with the default hyperparameters specified in the configuration file. The loss curve of our first Linear model looks excellent (Figure 4).

Figure 4. The loss curve of our first Linear model looks excellent.
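To make the architecture description concrete, here is a hedged Keras sketch of a Donkey-style Linear pilot. The layer sizes follow the common NVIDIA-style defaults that Donkey uses, to the best of our knowledge, and are an approximation rather than the exact built-in model:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten

def build_linear_pilot(input_shape=(120, 160, 3)):
    """Sketch of a Linear pilot: 5 conv layers, 2 dense layers, and two
    scalar outputs (steering angle and throttle)."""
    img_in = Input(shape=input_shape, name="img_in")
    x = Conv2D(24, (5, 5), strides=(2, 2), activation="relu")(img_in)
    x = Conv2D(32, (5, 5), strides=(2, 2), activation="relu")(x)
    x = Conv2D(64, (5, 5), strides=(2, 2), activation="relu")(x)
    x = Conv2D(64, (3, 3), strides=(1, 1), activation="relu")(x)
    x = Conv2D(64, (3, 3), strides=(1, 1), activation="relu")(x)
    x = Flatten()(x)
    x = Dense(100, activation="relu")(x)
    x = Dropout(0.1)(x)
    x = Dense(50, activation="relu")(x)
    x = Dropout(0.1)(x)
    angle = Dense(1, name="angle_out")(x)        # steering in [-1, 1]
    throttle = Dense(1, name="throttle_out")(x)  # throttle in [0, 1]
    model = Model(inputs=img_in, outputs=[angle, throttle])
    model.compile(optimizer="adam", loss="mse")
    return model
```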

The video below shows the saliency of the model while a single car was driving the track. The green and blue lines show the steering angle of the car (green is the human's driving, blue is what the model predicts). The green line is much more jittery because the human pilot made many small corrections while driving. The pink areas are the most important (high-saliency) areas of the frame.

The video shows why cropping the images was so important, and it's a shame we couldn't use cropped images. On the straights, the model steers mostly according to background objects, but in turns it also recognizes the track as something important. This behavior emerges because, as the car turns, the background moves while the track stays consistently in the frame; on the straights, both the background and the track are consistent. So on the straights the model watched the background, and in the corners, the road. Figure 5 shows the saliency when the other car was also in the frame: the model realized the other car was relevant.

Figure 5. Our model learned that the other car was an important part of a frame (indicated by the pink high-saliency blobs). In the left image, our car was free to continue through the intersection. In the right image, our car had to stop to give way to the car approaching from the right.
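The saliency overlays were produced with Donkey's built-in tooling; the snippet below is only an illustrative sketch of the underlying idea, computing the gradient of the predicted steering angle with respect to the input pixels and highlighting where it is large. The function and variable names are ours, not Donkey's.

```python
import numpy as np
import tensorflow as tf

def steering_saliency(model, image):
    """Absolute gradient of the steering output w.r.t. each input pixel,
    normalized to [0, 1]. Assumes `model` outputs [angle, throttle]."""
    img = tf.convert_to_tensor(image[np.newaxis].astype("float32"))
    with tf.GradientTape() as tape:
        tape.watch(img)
        angle, _throttle = model(img)
    grads = tape.gradient(angle, img)[0]           # H x W x C gradients
    sal = tf.reduce_max(tf.abs(grads), axis=-1)    # collapse color channels
    sal = sal - tf.reduce_min(sal)
    sal = sal / (tf.reduce_max(sal) + 1e-8)
    return sal.numpy()   # overlay this map on the frame as the pink mask
```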

In addition to the Linear model, we also trained the more complex RNN (4 time-distributed conv layers, 2 LSTM layers, 3 dense layers, output layer) and 3D (four 3D conv layers each followed by max pooling, 2 dense layers, output layer) models, but they didn't drive better. Their inference time (a forward pass through the network) was so long that they were not practical. A good time for a model to make a decision about a frame is 20–40 ms; our Linear model already took 60 ms, and the RNN and 3D models took over 400 ms. The complicated models were too slow to react to the curves and drove off the track.
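The timings above came from running the models on the car; the snippet below is a hypothetical benchmark of the single-frame forward pass, not the exact script we used:

```python
import time
import numpy as np

def mean_inference_ms(model, n_runs=50, input_shape=(120, 160, 3)):
    """Average single-frame forward-pass time in milliseconds."""
    frame = np.random.randint(0, 255, size=(1, *input_shape)).astype("float32")
    model.predict(frame)                       # warm-up (graph build, caching)
    start = time.perf_counter()
    for _ in range(n_runs):
        model.predict(frame)
    return (time.perf_counter() - start) / n_runs * 1000.0

# A pilot is practical if this stays roughly within 20-40 ms per frame;
# our Linear model measured around 60 ms, the RNN and 3D models over 400 ms.
```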

Donkeycar also has a Behavioral model, but we didn't train it because we couldn't get the behaviors to work. This was unfortunate, because the Behavioral model is specifically designed for a task like ours: it would have made it easy for the car to learn different behaviors, such as when to stop (give way to the car on the right) and when to go (drive across the intersection). Still, the Linear model learned when to stop and go well enough.

1.3. Results

The video below shows our Linear model driving alone on the track. This model learned to drive well, but as the saliency maps showed, it also used background objects to navigate. At the end of the clip it does drive off the track, but that was actually a rare occurrence.

We also show videos of our car driving in different situations. In those videos, the moving car drove autonomously (using the trained model) while the stationary car was placed in different positions to see how the autopilot reacted.

In the first scenario, the autopilot was approaching the intersection and there was a car on its right. The autopilot had to stop and let the other car pass. We show two examples and the autopilot successfully stopped in both.

In the second scenario, the autopilot was giving way to the other car, which was manually pulled through the intersection. The autopilot was supposed to start moving once the other car had cleared the intersection. At the time of recording, the battery was getting rather low, so the autopilot was slow to start moving or needed a little push.

The third video shows the situation where the autopilot had the right of way. When it got close to the intersection, it saw the other car and slowed down, hesitating even though it had priority. This was not entirely correct behavior: it should have continued through the intersection without slowing down. More importantly, though, it still drove through the intersection. It learned that there is a difference between the other car being on its right and on its left. So our model did learn the Priority to the right rule, even if it hesitated at the sight of the other car.

It was rare to see the autopilot crash into the other car (i.e., fail to stop). The more common type of failure was that the autopilot refused to move even when it had priority. The final video shows one such occasion.

2. In the Wild

The initial idea was to train the car to drive between agricultural beds. Since no suitable beds were found in the city (the community gardens had high boxed beds), we moved to paths in parks. The criterion was a distinct path: a stone, asphalt, or soil track surrounded by grass.

2.1. Track and Data

We tested a couple of parks and several tracks. In the first attempt to record data, we drove the car in the middle of a wide footpath (second image in Figure 6). We realized that the footage contained too little grass on the sides, so we decided to choose a more defined path. Next, we drove the car on two separate paths in another park: a circular trail that the car followed along the inner circle (third image, upper photo), and a triangular soil footpath (third image, lower photo). There was a distinct difference between driving on stone and on soil, as the car was unable to move forward at lower speeds on soil. To solve this, we calibrated a higher maximum throttle (see the calibration sketch after Figure 6). Even then, the car got stuck on grass patches on the soil path. Finally, we found an asphalt-covered rectangular track with grass in the middle (last image in Figure 6).

Figure 6. Outdoor track selection in Tallinn.
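Raising the maximum throttle comes down to the calibration values in myconfig.py, normally found interactively with the donkey calibrate tool. The PWM numbers below are purely hypothetical placeholders; every car (and possibly the Donkeycar S1's own driver board) uses its own values and option names.

```python
# myconfig.py (excerpt) -- hypothetical calibration values; every car needs
# its own numbers, found interactively with `donkey calibrate`.
THROTTLE_FORWARD_PWM = 420    # raise this to get more forward power on soil
THROTTLE_STOPPED_PWM = 370
THROTTLE_REVERSE_PWM = 320
```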

In hindsight, our preliminary idea of driving on agricultural land would not have worked out with the Donkeycar S1, because the surface needs to be quite smooth for this car model to move forward. On an uneven surface, the battery also tends to discharge faster as more power is consumed.

The car ran Donkey 4.2, and on the computer we installed Donkey 4.3. The versions differ in that 4.3 enables image transformations and augmentations. Transformations require version 4.3 both when training (on the computer) and when driving on the car, whereas augmentations are applied only during training and not while driving. Hence, we were able to use augmentations despite the car running version 4.2. It took us quite some time to get augmentation working because we could not enable it when training with the Donkey UI; we later figured out how to enable augmentations when training from the command line.

We drove the rectangular track for 10 laps and recorded around 9,000 images. The battery did not allow us to record more in one go, and during the last laps the car was barely moving forward. Along the path there were different lighting conditions, shadows, and even pedestrians passing and vehicles moving on the nearby road. This variance in the environment helped the model learn the track itself and focus less on the surrounding objects.

Since Donkey 4.2 on the car did not allow us to apply image transformations, we tried a workaround for cropping the images. This was done externally (in a Jupyter notebook) to remove unnecessary detail from the images. Since we expected the car to navigate by the grass, approximately ½ of the pixels were cropped from the left and ¼ of the pixels from the top of each image (see Figure 7; a sketch of the cropping step follows the figure).

Figure 7. Cropping image from top and left.
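The external cropping itself was a straightforward batch job over the recorded images. A minimal sketch is shown below; our notebook code differed in details, and the paths here are hypothetical:

```python
from pathlib import Path
from PIL import Image

def crop_tub_images(src_dir, dst_dir, left_frac=0.5, top_frac=0.25):
    """Crop away the left `left_frac` and top `top_frac` of every image."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.jpg"):
        img = Image.open(img_path)
        w, h = img.size
        # PIL crop box is (left, upper, right, lower)
        img.crop((int(w * left_frac), int(h * top_frac), w, h)).save(dst / img_path.name)

# Example with hypothetical paths:
# crop_tub_images("data/tub_majaka/images", "data/tub_majaka_cropped/images")
```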

2.2. Training the Model

The Donkey UI application allows for easy training of the autopilot. It supports different model types, and our plan was to pit 3 models against each other and see how they perform in outdoor conditions:

  • Linear (smooth steer, robust, performs well in limited compute environment, may fail to learn throttle well)
  • RNN (very smooth steer, can train to a lower loss, long train times, performs worse in low compute environment)
  • 3D_CNN (same as RNN)

However, after discussing with our project supervisor, we decided not to move on with RNN and 3D_CNN, since they require a lot more data and are slower (longer inference time, i.e. fewer decisions per second). Instead, we trained Linear models with and without image augmentations. The augmentations currently supported in Donkey are MULTIPLY and BLUR, which apply random brightness changes and Gaussian blur. We applied both with the default settings of MULTIPLY (0.5, 1.5) and BLUR (0.0, 3.0); the configuration sketch below shows how.
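A minimal sketch of the augmentation settings, with option names as we recall them from Donkey 4.3 (check your version's myconfig.py before copying):

```python
# myconfig.py (excerpt) -- augmentation settings as we recall them from
# Donkey 4.3; verify the names against your version's defaults.
AUGMENTATIONS = ["MULTIPLY", "BLUR"]   # applied only at training time
AUG_MULTIPLY_RANGE = (0.5, 1.5)        # random brightness factor
AUG_BLUR_RANGE = (0.0, 3.0)            # random Gaussian blur strength

# Transformations (e.g. cropping) would go into TRANSFORMATIONS instead, but
# those also run on the car while driving and therefore need Donkey 4.3 there.
```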

Figure 8 shows the loss of the non-augmented model (left) and the augmented model (right) on the rectangular track. The augmented model converges faster, and the non-augmented model shows some overfitting. The training loss of the augmented model was much higher in the first epoch than that of the non-augmented model, but it came down very quickly.

Figure 8. Model loss for non-augmented (left) and augmented (right) linear model.

2.3. Results

Here we present the results obtained on two outdoor tracks: a circular path in Kopli park and a rectangular path in Majaka park, both in Tallinn.

Our attempt to crop the images was rather fruitless. After training with the cropped images, the Majaka model did not seem to improve: while we noticed some improvement in a few turns, performance degraded on the straights. The degradation was especially prominent on the pathway without a curbstone (2 of the 4 pathways on the rectangular track had curbstones). We conclude that the model learned to orient itself by the curbstone, and in its absence it turned into the grass in an attempt to find one.

The autopilot tested in Kopli park did not perform well. There are many reasons for this, but based on the other locations and training runs we deduced the following:

  • The track itself is difficult for the car to drive on, requiring high throttle values to keep moving
  • As the battery level drops, the car slows down and eventually stops moving
  • The steering angles needed during training are very small, almost 0 at times, so the pilot is likely to get confused and learn the wrong angle

In the video for the Kopli model, we can see that the car is having trouble turning.

Even with augmentation the model did not do better, so we conclude that training on this track was unsuccessful.

The second track, tested at Majaka, is a more defined path with sharper turns, and the difference is visible. The first model is Linear without augmentations. The car has difficulty turning, and sometimes it didn't turn at all and continued straight ahead. However, when driving forward into unknown territory, it recognized a turn and turned right onto a side path. This shows that the model was able to generalize when driving in a new environment.

Adding image augmentations during training greatly enhanced the pilot's performance: it was able to move extremely well and make very precise, sharp turns. All in all, augmenting the images was the game-changer!

Summary

Overall, our figure 8 autopilot exhibited the desired behaviors. Alone on the track it drove nearly flawlessly, although it sometimes depended too much on background objects rather than the track. This shows that in an environment with a constant background, cropping the images to just the relevant parts of the track is necessary; otherwise the model can easily focus on irrelevant objects.

The simple Linear model was even able to learn different behaviors with respect to the other car. It learned when it had to stop and when it could go but its behavior wasn’t always perfect. Sometimes it refused to move when it saw the other car even if it had priority. It’s possible one of the more complicated models (RNN or 3D conv) could’ve behaved better but they were unfortunately too slow to use in practice.

Regarding the outdoor models, the desired behavior was achieved on the more defined track. Applying augmentation was necessary to make the car drive successfully, and a working cropping pipeline could further improve performance and generalization.

References

Wikipedia (2022). Priority to the right. As accessed on 26.05.2022 from: https://en.wikipedia.org/wiki/Priority_to_the_right
