At Cruise, we are aiming to deploy self-driving cars at scale by creating level 4 self-driving cars. One important difference between level 4 and 5 self-driving cars is level 4’s reliance on pre-defined maps.
In this article, we discuss one particular map type and the solution that we developed to make the most out of it: tiled pixel map and lock-free producer-consumer shared map cache.
What is a Tiled Raster Map?
A Tiled Map is a 2-dimensional map where each pixel is mapped to a certain geolocation and stores a certain value of that area.
While the format of these tiles could be anything from portable network graphics (PNG) to an uncompressed binary large object (BLOB), the typical internal representation of these tiles are flattened 2D arrays. This makes this type of map extremely useful for requests such as “what is at (x, y) location,” which can be used for preliminary filtration of input data from sensors or for quick fact-checks like, “Is that object on the sidewalk or on a driveable area?”
However, this usefulness does not come for free: the main disadvantage of tiled maps is their excessive usage of RAM and Disk I/O latency to load tiles.
Lazy and Individual Map Cache
Our initial implementation consisted of a Least Recently Used (LRU) Cache to maintain a set of recently used tiles, as well as background threads for loading tiles that will likely be used in the future based on current requests.
It’s important to note that this approach is based on knowledge of map usage pattern in the Autonomous Vehicle (AV) Stack. Typically, AV Stack queries map locations that our self-driving cars see from its sensors.
This implementation works reasonably well as long as the number of users per API instance is fairly limited to avoid synchronization bottleneck.
Initially, this implementation worked fine — until one day, it didn’t. Our AV software kept growing, becoming more sophisticated, and the number of micro-services that required access to the map data kept increasing.
There were two expected outcomes of this hyper-growth with our current implementation:
- Run out of memory
- Overload the system with duplicated work to maintain the individual map cache for each node
Neither outcome was preferred, so we had to come up with a different solution.
Active and Shared Map Cache
We came up with a new idea after thinking of a solution to a known problem: trying to make every node maintain its own map cache.
To solve this problem, we decided to share the map cache via shared memory. Unfortunately, shared memory came with a whole spectrum of new technical challenges, such as lack of support in C++ STL and inability to use pointers. It’s like programming in C but without any pointers.
The standard implementation of LRU Cache uses doubly linked list and hash tables. While it’s possible to implement both of them to work with shared memory (or find already existing implementations), we knew there had to be a better way to store map cache.
The solution to storing map cache: Virtual Tile
We found that solution in “Virtual Tile”. Virtual Tile is a set of tiles in a square region that can be treated as a regular tile, but covers all the area that can be queried by AV Stack at a given time.
With a little bit of tile stitching manipulation in runtime, Virtual Tile can be stored as a simple 2D array, which is a very efficient data structure for storing in raw memory. It also provides a very efficient way to access map values for given (X, Y) coordinates.
In order to maintain the actuality of map data in Virtual Tile, Virtual Tile needs to be moved along with the car position. The most straightforward solution to do this would be to guard Virtual Tile by reader/writer lock and block all readers while Virtual Tile is being updated. However, this is not an acceptable solution because reloading Virtual Tile is a very time consuming operation.
The typical solution in this situation is double-buffering. We need to keep two Virtual Tiles: one that stays exposed to readers and the other that prepares for the future.
The beauty of this solution is that it works perfectly fine with shared memory. Mutexes, regular ones and reader/writer, can be used with shared memory, and the data layout is pretty straightforward without needing to use any pointers or containers. Virtual Tile could be defined as:
The downside of this approach is needing to use two times more RAM and completing the wasteful work required to set up the map area in the next buffer (that overlaps with the current active buffer).
Optimizing with Circular Buffers
The desired optimization came in the form of circular buffers.
According to Wikipedia, “A circular buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. The useful property of a circular buffer is that it does not need to have its elements shuffled around when one is consumed.”
We take the concept of fully loaded circular buffers (where a new element can be added either to the tail or the head), and we get a rolling array or sliding window that can be moved left or right. We then add a buffer element to enable the loading of new elements without affecting current elements.
We apply this idea to a 2D array and we get a sophisticated 2D circular buffer with a very simple data layout to store in raw memory.
This is a much more memory-wise and cpu-wise solution, which is easy to pair with shared memory. However, this solution still has two weak aspects:.
- Performance. This solution (utilization of shared memory) introduces a shared resource that will be used in a concurrent environment. We can reduce the synchronization bottleneck by using Reader/Writer lock, but the reader’s API can be easily abused by single point queries with extra overhead for synchronization or slow updates which will most probably lead to problems on the reader’s side.
- Safety. A memory corruption event or crash in one node must not have a direct impact on other nodes. While shared memory allows specifying access rights and reader’s API can be restricted to have read-only access, the current solution requires write access to work with shared mutex; as a result, any of the readers can corrupt map cache.
A better synchronization mechanism: Sequential Lock
With these two weak aspects in mind, we searched for a better synchronization mechanism. The idea was flying somewhere in the air. The perfect solution would be able to:
- Allow us to write to the buffer area concurrently while readers use the rest of the data
- Permit us to use double-buffering for metadata, and then atomic switch between the buffers
- Enable us to avoid the problem of memory ordering
The last detail about memory ordering was the trickiest part. That feature made finding a solution so much more complicated than just having an atomic switch between two metadata objects, which we can already do thanks to the current and next state of Virtual Tile.
Then we discovered the synchronization mechanism called Sequential Lock. For those who are interested to learn more about this topic, I would strongly suggest starting with Linux Kernel Memory Barriers¹ and Can Seqlocks Get Along With Programming Language Memory Models² . These two articles provide the most comprehensive analysis of the problem and explain the technique known as SeqLock.
In the final state, the shared map cache looks something like this:
In this implementation, Map Cache updates can be performed without explicitly blocking readers. Readers cannot impact or slow down map updates and they can be scaled pretty much indefinitely with only a small overhead to deal with a few atomic operations per request. Additionally, Readers do not need to have write access to a map cache, which makes it much easier to describe the safety of this solution. Moreover, this system can be recovered (even if it crashed during the update operation), which is much harder or probably impossible with mutexes.
Can you imagine a shared mutex being left in a read- or write-locked state because the process that owned it has died?
Join the team
If you are interested in finding other solutions to challenging technical problems, check out our open roles. You can find my team under the department Engineering — Autonomous Vehicle Software.
¹ David Howells, Paul E. McKenney, Will Deacon, Peter Zijlstra. Linux Kernel Memory Barriers.
² Hans-J. Boehm. Can Seqlocks Get Along With Programming Language Memory Models?