Drive faster with machine learning

Maciek Dziubiński
Acta Schola Automata Polonica
10 min read · Jan 3, 2022

Racing drivers hate him for this one simple trick

There’s this autonomous driving challenge called F1tenth that has both a “physical” version (in which you need a physical car, at 1/10th scale, to participate) and a “virtual” one in which anyone can compete. In this blog post we’ll focus on the virtual race, which comes with a gym-like environment called f1tenth_gym that looks like this:

Yea, I know — you can almost smell the burning rubber :)
But actually, the simplicity of this environment (and most gym environments, I suppose) is its greatest strength. You can prototype quickly and test out more ideas. This is the reason I switched to this environment from the CARLA simulator which, although prettier, is sometimes too cumbersome.

The controller that’s driving the car in the short clip above is called Pure Pursuit and the best lap time I got was 52.47s after testing out more than 1000 different parameter combinations. The topic of this blog post is: how I got that lap time down to 44.35s using the same data I collected while searching for optimal controller parameters.

Introduction

The car has two actuators: the steering angle, and the desired speed (and not acceleration, as one might expect from a “normal” car).

I used a Pure Pursuit controller, which essentially aims the steering angle at a point that lies some distance further down the road on a pre-defined path, thus giving a pretty easy but quite robust way of choosing the first actuator. (By the way, if you’d like to know more about the Pure Pursuit controller, look no further than this lecture.) The distance to that point is called the lookahead distance; normally it depends on the speed of the vehicle, but in my implementation it’s frozen for the entirety of the race, just to make things even simpler.
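To make that concrete, here’s a minimal sketch of the steering rule in vehicle-frame coordinates, assuming a bicycle model; the function and parameter names are mine, not the repo’s:

```python
import numpy as np

def pure_pursuit_steering(path_xy, lookahead, wheelbase):
    """Steer towards the first waypoint at least `lookahead` meters away.

    path_xy: (N, 2) waypoints in the vehicle frame (car at the origin,
    +X pointing along the heading).
    """
    dists = np.linalg.norm(path_xy, axis=1)
    ahead = np.nonzero(dists >= lookahead)[0]
    target = path_xy[ahead[0]] if len(ahead) else path_xy[-1]
    # Angle between the car's heading (+X) and the target point.
    alpha = np.arctan2(target[1], target[0])
    # Classic pure-pursuit law for a bicycle model.
    return np.arctan(2.0 * wheelbase * np.sin(alpha) / np.linalg.norm(target))
```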

You still need to come up with the second actuator, the speed. The simplest approach: stick to a pre-defined speed unless the centrifugal force exceeds a tire force threshold, and if it does, reduce the speed to a value right below that point. Obviously, this simplification is bound to fail, both because the vehicle’s dynamics are way more sophisticated, and because although we ask for a particular speed, the car can’t just instantaneously “make it so”. For example: say we’re going 10 m/s and our procedure determines that the speed should actually be 5 m/s tops. Asking for a particular speed is not the same as magically setting the speed of the vehicle to that value; it takes a while to get there.
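As a sketch, the heuristic might look like this; the vehicle mass and the path-curvature estimate are my assumptions, not values from the post:

```python
import numpy as np

def speed_command(setpoint, tire_force_max, mass, curvature):
    """Stick to the setpoint unless m * v^2 * |kappa| would exceed the
    tire force threshold; if it would, ask for the speed just below it."""
    if abs(curvature) < 1e-6:  # effectively a straight, no cap needed
        return setpoint
    v_max = np.sqrt(tire_force_max / (mass * abs(curvature)))
    return min(setpoint, v_max)
```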

We thus arrive at a pretty simplistic controller whose behavior (and overall performance) depends on three parameters:
• lookahead distance: 4.24 m
• speed setpoint: 10.62 m/s
• tire force max: 13.42 N
where the values provided are those for which I got the best lap time, just to give you an idea about their magnitudes.

Motivation

The problem with this controller is that you might find parameters that give you OK-ish results for one set of conditions (a particular corner, let’s say) but not for the whole race track. In particular, the track I was interested in has a lot of straights:

Our simple controller can either drive well through the corners or drive fast on those straights, not both. One would need to do path optimization to go through the corners faster, or add an “if” statement to the code to drive fast on the straights and slow through the corners, or some other trick to make it more swanky. This would increase the overall number of parameters and, honestly, would feel a bit clunky. But I also had a bigger picture in mind, namely: I wanted a procedure that can be generalized to racing against opponents. (That’s for the next blog post, though.)

The idea was to train a model that emulates the behavior of our weak controller for a given set of parameters and then use that model to continuously determine the best parameters. This requires two things: 1) a method of scoring the parameters that allows us to determine which are “best”, and 2) a procedure for determining the best parameters in a given situation without evaluating all possible combinations of parameter values in real time, during the race.

Methods

Model

The inputs to the model are three vectors: the state of the vehicle, the path we’re supposed to follow, and the parameters of the controller. There are two output vectors: the future trajectory, and the future actuators needed for that trajectory to come true.

The colorful blocks represent modules that contain one or more fully connected layers.

INPUTS:
The state of the vehicle comprises 5 numbers: the longitudinal and lateral speeds, the angular speed (around the Z-axis), and the current actuators (steering angle and speed). Note that there’s no position or yaw. That’s because we’re already operating in a coordinate system centered on the car (its position is always (0, 0)), with the X-axis pointing forward in the direction the vehicle is facing (so yaw is always 0).
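For illustration, a tiny helper (the naming is mine) that maps world-frame points into this car-centered frame:

```python
import numpy as np

def to_vehicle_frame(points_xy, car_pos, car_yaw):
    """Shift and rotate world-frame points so the car sits at (0, 0)
    with the X-axis along its heading (i.e. yaw becomes 0)."""
    c, s = np.cos(-car_yaw), np.sin(-car_yaw)
    rot = np.array([[c, -s], [s, c]])
    return (np.asarray(points_xy) - np.asarray(car_pos)) @ rot.T
```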

The cent input stands for centerline, which is the path we’re following. As the name suggests, this is the path that lies exactly in the middle between the race track’s bounds. (Clearly, this path is not the fastest, but we’ll worry about that more in the next blog post.)

And finally, the params are the controller parameters (lookahead distance, speed setpoint, and tire force max) that were set while the data was recorded. The idea is to look for better params while driving, via an online gradient ascent procedure, so that we constantly adapt to the changing conditions.
_________

OUTPUTS:
The trajectory consists of 150 2D points representing future positions of the vehicle, and act is a vector of future actuators: 9 steering angles and 9 speeds.

We need the trajectory to compute the score later on, whereas the actuators are, well, needed to drive the vehicle. Although I should say: not all of them; I only take the first steering angle and speed and throw away the rest (as you would normally do in a controller like Model Predictive Control, which I used in a previous blog post).
_________


During inference, the state and centerline are given; we can’t change them. But we can probe the model with alternative controller parameters that might yield a more promising trajectory.
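To make the shapes concrete, here’s a minimal stand-in with the same inputs and outputs, built from fully connected layers as in the figure; the layer sizes and names are my assumptions, not the repo’s:

```python
import torch
import torch.nn as nn

class LapModel(nn.Module):
    """Stand-in with the post's input/output shapes; layer sizes are a guess."""
    def __init__(self, hidden=256):
        super().__init__()
        self.state_enc = nn.Linear(5, hidden)     # speeds, yaw rate, current actuators
        self.cent_enc = nn.Linear(600, hidden)    # 300 flattened 2D centerline waypoints
        self.params_enc = nn.Linear(3, hidden)    # lookahead, setpoint, tire force max
        self.trunk = nn.Sequential(nn.ReLU(), nn.Linear(3 * hidden, hidden), nn.ReLU())
        self.traj_head = nn.Linear(hidden, 300)   # 150 future 2D positions
        self.act_head = nn.Linear(hidden, 18)     # 9 steering angles + 9 speeds

    def forward(self, state, cent, params):
        h = torch.cat([self.state_enc(state),
                       self.cent_enc(cent),
                       self.params_enc(params)], dim=-1)
        h = self.trunk(h)
        return self.traj_head(h), self.act_head(h)
```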

Score

Initially I used the term “return”, but then I realized it might confuse reinforcement-learning-savvy readers, so I settled for “score”. Remember, we’d like to be able to say how good the current params are (those used as input to the neural network), i.e. we want to assign a score to the controller parameters. I used the following formula to compute the score:

where progress is a function of the trajectory and the centerline (simply the total distance travelled along the centerline), whereas the penalty is a function of the distances between the points of the trajectory and the points of the nearby bounds of the racetrack. See the code for more details.
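For illustration, here’s a sketch in that spirit, assuming the score is simply progress minus a weighted penalty and using inverse distance to the bounds as the penalty term; the exact formulas are in the repo:

```python
import torch

def score_fn(trajectory, centerline, bounds, penalty_weight=1.0):
    """Sketch of score = progress - weighted penalty (exact formula: see repo).

    trajectory: (150, 2) predicted positions; centerline: (300, 2) waypoints;
    bounds: (M, 2) points on the nearby track bounds, all in the vehicle frame.
    """
    # progress: arc length along the centerline up to the waypoint
    # closest to the trajectory's endpoint.
    idx = int(torch.cdist(trajectory[-1:], centerline).argmin())
    progress = (centerline[1:idx + 1] - centerline[:idx]).norm(dim=-1).sum()
    # penalty: one plausible choice, the mean inverse distance from each
    # predicted position to the nearest bound point.
    nearest = torch.cdist(trajectory, bounds).min(dim=-1).values
    penalty = (1.0 / (nearest + 1e-3)).mean()
    return progress - penalty_weight * penalty
```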

During inference, while driving the vehicle, I estimate the gradient of the score with respect to the controller parameters. I do that by feeding the model a batch of size 25 in which all state and centerline inputs are the same but the controller parameters differ: the current param values plus 24 neighboring values needed for estimating the gradient.

I continuously update the controller parameters as you would normally do in a gradient ascent loop:
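A sketch of one such step, assuming the textbook update params ← params + lr · ∇score, with random perturbations standing in for the 24 neighboring values (the repo’s exact scheme may differ) and reusing score_fn from above:

```python
import torch

def update_step(model, state, cent, params, centerline, bounds, lr=0.05, eps=0.1):
    """One online gradient-ascent step on the score (illustrative only)."""
    # Batch of 25: the current params plus 24 perturbed neighbors.
    perturb = torch.cat([torch.zeros(1, 3), eps * torch.randn(24, 3)])
    with torch.no_grad():
        traj, _ = model(state.expand(25, -1), cent.expand(25, -1), params + perturb)
    scores = torch.stack([score_fn(t.view(150, 2), centerline, bounds) for t in traj])
    # Finite differences: a least-squares fit of score changes vs. perturbations
    # approximates d(score)/d(params) at the current parameter values.
    grad = torch.linalg.lstsq(perturb[1:], (scores[1:] - scores[0]).unsqueeze(1)).solution
    return params + lr * grad.reshape(1, 3)
```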

This online optimization probably never reaches the optimum, partly because the score landscape constantly changes, and partly because this is a pretty crude approach, but the hope was that it at least adapts quickly enough to new conditions.

Results

This is the graphical representation I used to determine whether the model drives as intended (see the full video here):

The upper row depicts the model’s predictions (going from the left): the future trajectory (150 steps into the future), the 9 future steering angles, and the 9 future speeds. The red lines and dots represent the best prediction in a batch of size 25, blue represents the results for the current controller parameters, and the grey lines correspond to suboptimal input params.

The bottom row contains a minimap plus the values of progress and penalty calculated for the predicted trajectories. In both plots the observation indices are sorted according to their progress values, so that we can quickly see how big the penalty is for the trajectory that goes the furthest.

Data

I’d like to be able to test this approach on an actual car (1/10th scale, no need to panic), so I needed to consider two things: safety, and how much data is needed.

Safety is ensured as long as the walls are far from the path the car is following and the localization algorithm works correctly. If both hold, I can sample controller parameter values from a fairly wide range and run data collection without additional supervision.

The best lap time I got (44.35s) came from a model trained on data from 1000 races, each race lasting 110s of simulation time (maybe 7s of wall-clock time), so that’s 1000 × 110s ≈ 30.6h, way too long. But I also ran experiments in which only 32 races were conducted (roughly 1h worth of simulation time), and the best lap time then was 45.49s, which is “good enough”. The real world is a bit more complicated, so that 1h of simulation time might translate to just about anything, but at least it gives me some hope that it will be attainable.

Inference time

Inference on my workstation (GTX 1080Ti), along with the progress and penalty calculations (also done on the GPU), takes ~1.5ms. That’s for a batch of size 25; for a batch of size 512 it takes ~3.1ms. I don’t know yet how fast it will be on a Jetson but, again, this looks promising, even before any additional speedups (I’m thinking of TensorRT, the one I know best).

Discussion & future work

It’s not lost on me that the lap time reduction wouldn’t be so profound if I had used a more sophisticated controller. In particular, an MPC that optimizes the distance travelled along the centerline would probably lead to a model that wouldn’t deliver such a significant lap time reduction (unless, for example, there’s something very wrong with the dynamical model used by the controller). But speeding up the Pure Pursuit controller wasn’t really the point here; I just wanted to know whether I was heading in the right direction, and whether there were any critical errors in my code.

The fact that the lap time went down is a good sign, and now I can move on to the next step, which is the optimization of the “cent” input used by the model. Currently that’s not feasible because this input has a size of 600 (300 2D waypoints), but it can be substantially condensed. Once that’s done, and once I figure out how to effectively sample the space of possible waypoints, I should not only be able to follow a faster path but also, I hope, avoid collisions with obstacles.

The next step after that is avoiding collisions with opponents. For that I would need to at least roughly forecast their future positions. But wait a second… the current model can do just that: all I need to do is make the batch a bit larger, and it will still be blazingly fast. (Assuming I know the state of my opponents, but that’s “doable”.) Then I can calculate the penalty using not only the bounds but also the predicted trajectories of the opponents, and optimize the score as I do now.

That’s where I am right now, and where I’d like to go from here. The project is split into three repos:
* a fork of the f1tenth_gym project
* this repo for training the model
* and this overarching repo (a Python package) containing code used in both of the above projects.

If you enjoyed this post, please hit the clap button below and follow our publication for more interesting articles about ML & AI.
