Autonomous RC car — Alternative tracking method
First of all: is an RC car still “RC” if it is driving itself? I guess not, but at least this was an RC car. This post is about how it turned into an autonomous vehicle using a Raspberry Pi, Python, and round cookies on the floor.
This is probably one of the simplest autonomous vehicles. It observes only the direction it should be heading next, using a camera and some simple computer vision algorithms. A 2D driving simulator was used to train a “driver agent”, just enough to bring the mysterious AI into the equation. But first, let’s have a look at the components.
Hardware
- Kyosho Electric on-road RC car
- Raspberry Pi model 4
- Pi camera
- Arduino Nano
- Rotary encoder
- Xbox 360 controller and a USB dongle
The RC car was originally all-wheel drive, but removing the universal joint shaft turned it into RWD. This made room for mounting the rotary encoder on the front shaft. The RasPi 4 is just a slightly more powerful version of the Pi 3, so they should be interchangeable. However, since all the computations happen on the Pi, there is no such thing as too much processing power. The Arduino measures the speed and the car battery voltage. The Xbox controller is just a bonus that allows human driving, e.g. for data-collection purposes.
Software
- Computer Vision for extracting features from the camera images
- Drive control (RasPi PWM for motor and servo control)
- Driver agent selection
- Arduino code for speed detection and battery voltage measurement
- MySQL database logging for data collection
- Web server and dashboard UI (Real-time camera stream, Metrics plotting, Parameter controlling)
- Neural Network for real-time output calculations
- Simulator-trained Neural Network agent
Despite the simple goal, the code aims to be scalable toward more advanced driving capabilities. Let’s dive deeper into the important modules.
The simulation. Training an agent to drive a car in a 2D simulation environment was the first step toward autonomous driving. The simulator itself was the OpenAI Car Racing simulator. By default, the simulator doesn’t expose any features that could later be easily extracted from the real world, and the trainer also needs to be built. This repository adds some simple agent state features, like the direction to be headed, which is the one used for the RC car. There is also an Evolution Strategy trainer, which is a somewhat brute-force way to find optimal parameters for the Neural Network. However, the task and the network are quite simple, so the method worked out well.
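To give a rough idea of what such a trainer does, here is a minimal sketch of that kind of evolution strategy loop, assuming a fitness function that runs one simulator episode with a given flat weight vector and returns the total reward. The function and parameter names are illustrative, not the repository’s actual API:

```python
import numpy as np

def evolution_strategy(fitness, n_params, iterations=300,
                       population=50, sigma=0.1, lr=0.03):
    """Perturb the current weights, score each perturbation in the simulator,
    and step along the reward-weighted average of the noise."""
    theta = np.zeros(n_params)                       # flat Neural Network weights
    for _ in range(iterations):
        noise = np.random.randn(population, n_params)
        rewards = np.array([fitness(theta + sigma * eps) for eps in noise])
        # Normalize rewards so the update step is scale-invariant
        advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        theta += lr / (population * sigma) * noise.T @ advantage
    return theta
```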
The direction, in this case, is the relative angle between the track focus point (purple line) and the car. Here is a sample of the trained agent that later drove the RC car:
Image processing. Now that we have an agent that knows how to drive in the simulation, we need to generate the same features from the real world. The first thing was to decide how to identify the road. Maybe the most common way is to use a camera and lane detection, but for this project we used circular objects, “cookies”. Dropping cookies on the floor is an easy way to build a track, and the track is easy to modify if needed. Symmetric circle shapes are also not that common in nature, which leads to fewer false detections when using computer vision.
There are probably a hundred different ways to locate circles in an image, but we are looking for a light solution. OpenCV gives us a wide range of computer vision algorithms, and our choice is Hough Circles. It gives the locations of all the circles found in the image within the given parameters. However, there is a fundamental fault in our setup: circles on the floor only look like circles from a top-view perspective.
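Setting the perspective issue aside for a moment, the detection itself boils down to a single OpenCV call. The parameter values below are only illustrative and would need tuning for the actual camera, lighting, and cookie size:

```python
import cv2

def detect_cookies(frame):
    """Find circular 'cookies' in a camera frame.
    Returns an array of (x, y, radius) detections, or None if nothing was found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                   # smooth noise before the Hough transform
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, dp=1.5,
        minDist=20,                                  # minimum distance between circle centers
        param1=100,                                  # Canny edge detector high threshold
        param2=30,                                   # accumulator threshold: lower = more (and more false) detections
        minRadius=5, maxRadius=40)
    return None if circles is None else circles[0]
```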
The camera in the car is filming from 25 cm above the floor at a 30° angle. Changing the perspective mechanically to a top view would be hard, but luckily, OpenCV offers a virtual perspective transformation using calibration points. Drawing a rectangle on the floor and using its known dimensions to warp the camera image gives us the top-view perspective.
This also gets rid of most of the parts of the image we are not interested in, and it makes the perspective the same as in the simulator, which simplifies the feature extraction. Now, finding the circles is only a question of finding the right parameters and the balance between true and false-positive detections.
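The warp itself is a one-time calibration plus one call per frame. The corner coordinates below are placeholders for wherever the calibration rectangle lands in the camera image:

```python
import cv2
import numpy as np

# Pixel corners of the calibration rectangle as seen by the tilted camera
# (placeholder values, measured once from a calibration image).
src = np.float32([[95, 160], [225, 160], [310, 230], [10, 230]])
# Where those corners should land in the top-view image.
dst = np.float32([[0, 0], [320, 0], [320, 240], [0, 240]])

M = cv2.getPerspectiveTransform(src, dst)            # computed once at startup

def top_view(frame):
    """Warp a camera frame into the bird's-eye view used for circle detection."""
    return cv2.warpPerspective(frame, M, (320, 240))
```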
Calculating the direction requires the coordinates of the next target. The target is the weighted average of the xy-coordinates of all detected cookies: points where the Hough Circle detection is more certain are weighted more, so false positives should have less effect. The direction angle is then calculated between this average coordinate and the car’s mid-point.
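In code, this is roughly a weighted average plus an arctangent. How the per-detection certainty comes out of the detector varies, so here the weights are simply passed in; this is a sketch, not the project’s actual function:

```python
import numpy as np

def direction_angle(centers, weights, car_xy):
    """Weighted average of detected cookie centers -> relative heading angle in degrees.

    centers: (x, y) circle centers in the top-view image
    weights: one confidence value per detection (more certain = larger weight)
    car_xy:  the car's mid-point in the same image coordinates
    """
    target = np.average(np.asarray(centers, dtype=float), axis=0, weights=weights)
    dx = target[0] - car_xy[0]
    dy = car_xy[1] - target[1]                 # image y grows downward, so flip it
    return np.degrees(np.arctan2(dx, dy))      # 0 = straight ahead, sign = turn direction
```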
Drive control. The drive control module maps the driving instructions to the hardware and lets us choose who’s driving. Replacing the remote controller with Pi GPIO control was straightforward since the car’s MCU and servo work with PWM. The pigpio library’s hardware PWM turned out to be the most reliable option for control.
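A minimal sketch of that control layer, assuming the motor controller and steering servo expect standard 50 Hz RC pulses (1000–2000 µs) and are wired to hardware-PWM-capable pins; the pin numbers are placeholders:

```python
import pigpio

THROTTLE_PIN = 18        # hardware-PWM capable GPIO (placeholder)
STEERING_PIN = 19        # hardware-PWM capable GPIO (placeholder)
PWM_FREQ = 50            # standard RC frame rate -> 20 ms period

pi = pigpio.pi()

def _pulse_to_duty(pulse_us):
    # pigpio's hardware_PWM duty cycle is expressed as 0..1_000_000 of the full period
    return int(pulse_us / 20_000 * 1_000_000)

def set_outputs(throttle, steering):
    """Map normalized commands (-1..1) to 1000-2000 us RC pulses."""
    throttle_us = 1500 + 500 * max(-1.0, min(1.0, throttle))
    steering_us = 1500 + 500 * max(-1.0, min(1.0, steering))
    pi.hardware_PWM(THROTTLE_PIN, PWM_FREQ, _pulse_to_duty(throttle_us))
    pi.hardware_PWM(STEERING_PIN, PWM_FREQ, _pulse_to_duty(steering_us))
```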
The module allows the creation of new driver classes that can be used for driving. For example, this project had four drivers: Idle for staying still during debugging, GamePad for data collection and hardware testing, DumDum for reference performance, and AI for the simulator-trained agent. Every class has a set_actions method, which sets the values for throttle and steering. This way there can be multiple different AIs and other drivers, each with its own way of calculating the next actions. The only thing that changes in the main module is which driver class is put in the driver’s seat.
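The class names above come from the project, but the exact interface below is my guess based on the description:

```python
class Driver:
    """Base class: every driver fills in throttle and steering on request."""
    def __init__(self):
        self.throttle = 0.0
        self.steering = 0.0

    def set_actions(self, state):
        raise NotImplementedError


class DumDum(Driver):
    """Reference driver: constant throttle, steering follows the direction angle."""
    def __init__(self, throttle=0.02):
        super().__init__()
        self.constant_throttle = throttle

    def set_actions(self, state):
        self.throttle = self.constant_throttle
        self.steering = state["direction"]       # direction angle mapped straight to steering


class AI(Driver):
    """Simulator-trained network calculates both actions from the extracted features."""
    def __init__(self, network):
        super().__init__()
        self.network = network

    def set_actions(self, state):
        self.throttle, self.steering = self.network.predict(state)


# The main loop only talks to whoever sits in the driver's seat:
#   driver = DumDum()
#   driver.set_actions(state)
#   set_outputs(driver.throttle, driver.steering)
```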
Web server and dashboard. Debugging and controlling can be done over SSH, but since that already requires a network connection, why not use a local web server? The Flask server takes some resources from the Pi, but according to the performance testing, nothing remarkable. The web server works as a UI for changing parameters, following performance and runtime metrics, and watching a live stream from the camera. The dashboard is a simple HTML template, and the plotting is done on the client side with JS.
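A stripped-down sketch of the idea: one endpoint the dashboard’s JS polls for metrics, one for changing parameters, and an MJPEG stream for the camera. The endpoint names, the shared state dictionary, and the latest_frame_jpeg helper are all illustrative, not the project’s actual code:

```python
from flask import Flask, Response, jsonify, request

app = Flask(__name__)
state = {"fps": 0.0, "speed": 0.0, "battery": 0.0,
         "params": {"throttle_correction": 1.0, "steering_correction": 1.0}}

@app.route("/metrics")
def metrics():
    return jsonify(state)                            # polled by the client-side plots

@app.route("/params", methods=["POST"])
def params():
    state["params"].update(request.get_json())       # tweak parameters while driving
    return jsonify(state["params"])

@app.route("/stream")
def stream():
    def frames():
        while True:
            jpg = latest_frame_jpeg()                # hypothetical helper: newest JPEG-encoded frame
            yield b"--frame\r\nContent-Type: image/jpeg\r\n\r\n" + jpg + b"\r\n"
    return Response(frames(), mimetype="multipart/x-mixed-replace; boundary=frame")
```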
Logger. For future use and improvements, collecting data from the test cases can be useful. Saving the state of the environment over time makes it possible to recreate the test cases, analyze the performance, and even train new AIs on the data. The data is saved in a MySQL database, also running on the Pi.
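A sketch of what the logging could look like with mysql-connector-python; the table and column names are made up for illustration:

```python
import time
import mysql.connector

db = mysql.connector.connect(host="localhost", user="car",
                             password="...", database="rc_car")
cursor = db.cursor()

def log_state(speed, battery, direction, throttle, steering):
    """Append one timestamped snapshot of the environment and driver state."""
    cursor.execute(
        "INSERT INTO drive_log (ts, speed, battery, direction, throttle, steering) "
        "VALUES (%s, %s, %s, %s, %s, %s)",
        (time.time(), speed, battery, direction, throttle, steering),
    )
    db.commit()
```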
It’s alive!
Before going to the results, let’s see who the AI is competing against. A driver class called DumDum was mentioned before; the name comes from the fact that it tries to be the least intelligent driver possible. It has a pre-defined constant throttle output, and its steering is simply: direction angle = steering angle. The AI, in turn, has some “hyperparameters” that are adjusted manually. Since the simulator does not represent the real world exactly, there are parameters such as speed correction to match the speed input, throttle correction to match the acceleration, and steering correction to match the maneuvering. These are all adjusted by trial and error. And now, here is how they performed (AI on the top row, Dummy on the bottom):
So, quite a clear win for the dummy driver. There can be many reasons for this, but in general it makes sense that a simple solution works better for a simple task. However, one noticeable difference in the AI’s behavior is its sparse use of the throttle. This became even clearer when the throttle correction was increased: in some situations the vehicle actually drove backward. This is probably due to the speed and curve-steepness detection, as the agent learned to brake (= negative throttle) before steep corners in the simulator. It is also worth noting that the Dummy only used a constant 2% of the total throttle capacity. This leaves very little resolution for the AI to control the “optimal” throttle (assuming 2% is about the highest this setup can manage).
A Neural Network might be overkill in this case, but the game changes if the environment becomes more complex. For example, when extracting more features from the environment (e.g. tire slipping, yaw rate, friction, curve steepness, feature deltas, etc.) and using them as inputs for calculating the actions, traditional programming for every case becomes impossible. Continuing to tune the correction parameters would probably lead to Dummy-level performance, but at a higher pace the limiting factor would be the computer vision capabilities (= false negatives and positives). Next, let’s have a look at potential improvements.
How to make it better
Going through the edge cases reveals that the vehicle departs the track when there is only one cookie in the camera view. Since the position is already at the edge of the track, any false positive will lead the vehicle off the track. Getting rid of the false positives means a better camera, which would allow us to increase the detection threshold. A wide-angle lens would also allow steeper corners without losing the track.
The resolution was already decreased to 320x280 to speed up the image processing. Better optics would probably help without increasing the resolution. A higher resolution would also help, but it requires more processing power. Currently the pipeline runs at ~27 FPS, and the limiting factor is the image processing. The camera itself only films at 30 FPS, so there is not much to improve there. One could probably squeeze more performance out of the image processing code, but without a more powerful computer/controller the gains would remain very limited. Moving the processing to the cloud or to a separate server would easily give tons more processing power, but it creates a network latency problem.
30 FPS would probably go a long way as long as the computer vision side is solid. However, increasing the FPS would help at a higher pace. This would require a faster camera (60 FPS) and, of course, more processing power, as mentioned before. Higher speeds and lower-friction surfaces would need faster reaction times, where the camera FPS really becomes the crucial factor.
How about SLAM? Simultaneous Localization And Mapping is widely used in robotics to give the agent a better understanding of its surroundings. In our case, it would mean that once the vehicle has completed one full lap, it would know what is coming on the next lap and could optimize its driving based on it. And not only that: building a map and locating itself on it could help avoid false positives and stay on the track even during the first lap.
We’ll leave the SLAM implementation to Part 2, but let’s have a look at what we already have for it. There is only one encoder, measuring the rotations of the front differential gear train. This gives us the distance the vehicle has traveled. We do not know the real angle of the wheels, but we know the position we commanded them to steer to. Here is a sample of reconstructing the route from one of the test cases:
As expected, there is high uncertainty in localization based on the encoder and the steering angle alone. This is why SLAM algorithms prefer to combine multiple localization sources, such as encoders, CV, and LIDAR.
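For reference, that kind of reconstruction is plain dead reckoning with a kinematic bicycle model. A rough sketch, where the wheelbase and maximum steering angle are placeholders rather than the car’s measured values:

```python
import math

WHEELBASE = 0.26                     # meters, placeholder for the real axle distance
MAX_STEER = math.radians(25)         # assumed steering angle at full lock

def dead_reckon(samples):
    """Integrate encoder distance and commanded steering into an (x, y) path.

    samples: (distance_delta_m, steering_command) pairs, where steering_command
             is the normalized -1..1 value that was sent to the servo.
    """
    x, y, heading = 0.0, 0.0, 0.0
    path = [(x, y)]
    for dist, steer_cmd in samples:
        steer = steer_cmd * MAX_STEER                    # commanded, not measured, wheel angle
        heading += dist / WHEELBASE * math.tan(steer)    # kinematic bicycle model
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
        path.append((x, y))
    return path
```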
This is where we have gotten so far. I hope this gives you some new ideas for your own projects; leave a comment if it does!
Here is the code repository: