Decoding comma.ai/openpilot: the driving model

Chengyao Shen
7 min read · Nov 11, 2019


Months ago, right after Burning Man, I ordered a comma.ai EON Devkit with a grey Panda and a Giraffe (they have since integrated these into one car harness and charge $140 more: $199 + $60 -> $399).

After mounting it on my car (a 2016 Honda Civic Touring), I was excited to get it running with openpilot 0.6.4 (after reading documentation scattered online).

EON Devkit mounted on my car

Having worked as a CV & DL researcher and engineer in the self-driving industry for years, I tested the algorithms in openpilot under different scenarios and found the behavior of the lane detection algorithm interesting:

Lane detection results for a daytime highway scenario. Openpilot can still detect lanes on faded lane markings, but the detection result does not always align with the actual lane on curves. The “TAKE CONTROL” warning is due to the steering torque limit in the Honda EPS firmware.
Lane detection results for a local road at night. Openpilot still draws a virtual lane line even when there is no lane marking at the crossroad: the model is predicting the positions of the lines rather than just detecting them in a bottom-up way. The visualization is from the openpilot/kegman branch.

These behaviors are not typical of a common feedforward network, and they aroused my interest in the lane detection model used in openpilot.

I looked into the source code of openpilot, found that all three deep learning models in openpilot are in the Qualcomm Deep Learning Container (DLC) format, and easily located the place where the lane detection model is called:

openpilot/selfdrive/visiond/models/driving.cc

The code inside the “TEMPORAL” block clearly indicates that some recurrent structure is implemented.

Dive into the addRecurrent function:

openpilot/selfdrive/visiond/runners/snpemodel.cc

From the code, it seems that a sub-vector of the network’s output is fed back into one of its input tensors. This vector has size 1x512 (TEMPORAL_SIZE).
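To make this concrete, here is a minimal Python sketch of that feedback loop. The real implementation is C++ in snpemodel.cc and uses SNPE buffers; RecurrentModelRunner and run_network are names I made up for illustration, and treating the recurrent state as the final slice of the output is my reading of the code, not something documented.

```python
import numpy as np

TEMPORAL_SIZE = 512  # size of the recurrent state, from driving.cc

class RecurrentModelRunner:
    """Sketch of the feedback loop: a slice of the previous output is
    fed back as the recurrent-state input on the next frame."""

    def __init__(self, run_network):
        self.run_network = run_network  # stands in for the SNPE execution call
        self.rnn_state = np.zeros(TEMPORAL_SIZE, dtype=np.float32)

    def eval_frame(self, frame_tensor):
        # Run one inference with the current frame and the saved state.
        output = self.run_network(frame_tensor, self.rnn_state)
        # Assumption: the last TEMPORAL_SIZE values become the next state.
        self.rnn_state = output[-TEMPORAL_SIZE:].copy()
        return output
```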

Besides the clue in the source code, I also found a recent tweet from comma.ai indicating that they actually implement Gated Recurrent Units (GRUs) in their driving model:

https://twitter.com/comma_ai/status/1145798551298965504

But how the GRUs are implemented in the model remained a question to me. Is it a convGRU (like a convLSTM), or a CNN+RNN structure? How are the outputs of the network defined and organized?

I searched through everything Googleable on the Internet and the comma.ai Discord community, but found no articles, publications or discussions on how the model works in detail. It seemed that the only resources I had were the source code (and the DLC model files in it).

I assumed the information in the DLC file was most probably encrypted (which later proved to be wrong), so I started from the source code.

Source Code Analysis

Digging into driving.cc, it is easy to see from the code and the comments that the model outputs the probabilities and standard deviations of the drivable path, the left lane and the right lane, as well as various information about a lead car:

Source code from selfdrive/visiond/models/driving.cc and selfdrive/common/modeldata.h

All of this code is in the function model_eval_frame, which is called in selfdrive/visiond/visiond.cc:

From the code in visiond.cc, it can be seen that the output of the driving model is published over ZMQ on port 8009.
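In openpilot 0.6.x this inter-process messaging is plain ZMQ publish/subscribe with Cap’n Proto payloads. Stripped down to a pyzmq sketch, the pattern looks like the block below (the port number comes from the code above; the message here is a placeholder, not a real Cap’n Proto event, and in openpilot the publisher and subscribers run in separate processes):

```python
import zmq

ctx = zmq.Context()

# Publisher side (what visiond does with the model output):
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:8009")
pub.send(b"<serialized model output>")

# Subscriber side (what ui, radard, the planners, ... do):
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:8009")
sub.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything on this port
msg = sub.recv()
```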

So I searched for ‘8009’ in the code and found all the subscribers:

Source code for the subscribers of port 8009

The first subscriber is in ui.c, which is mainly used to display the detection results on the EON Devkit.

The second subscriber is defined in service_list.yaml, so I needed to find the underlying functions that use this file. Fortunately, after several rounds of cross-searching (thanks to VS Code), I found all the subscribers for the output of the driving model:

Modules where lead car information and drivable path information from the driving_model are used
Modules where all the lane information from the driving_model is used

These subscribers are all modules written in Python in the controls folder. The radar module receives the lead car information from the driving model and fuses it with radar data for more accurate lead detection. The planner module receives the drivable path information and implements Model Predictive Control (MPC) for the driving speed. The lane_planner module receives the drivable path, left lane and right lane information and passes it on to the path_planner module.
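Conceptually, the lane_planner turns those three outputs into a single path to follow, weighting the lane-based estimate by the model’s confidence in the lane lines. The sketch below illustrates the idea only; it is my own simplification with made-up names, not openpilot’s actual formula.

```python
import numpy as np

def blend_path(path_y, left_y, right_y, l_prob, r_prob, lane_width=3.7):
    """Conceptual blend of the model outputs into one path.

    path_y, left_y, right_y: lateral offsets (m) sampled along the road ahead.
    l_prob, r_prob: the model's probabilities for the left/right lane lines.
    """
    # Shift each detected lane line toward the lane center by half a lane width.
    left_center = left_y - lane_width / 2.0
    right_center = right_y + lane_width / 2.0

    # Lane-based estimate: probability-weighted average of the two centers.
    denom = max(l_prob + r_prob, 1e-6)
    lane_path = (l_prob * left_center + r_prob * right_center) / denom

    # Trust the lane-based path when both lines are confident; otherwise
    # fall back to the model's direct path prediction.
    lane_conf = l_prob * r_prob
    return lane_conf * lane_path + (1.0 - lane_conf) * path_y
```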

I also found that service_list.yaml lists all the ZMQ pubs/subs and the communication channels between them:

Combining all this information, I drew a draft diagram to show the general interfaces of the driving model:

Visualization

However, code analysis did not provide much useful information for inferring the architecture of the deep neural network. So I planned to analyze driving_model.dlc by building an isolated testing environment with the Qualcomm Snapdragon Neural Processing Engine (SNPE) SDK.

Coincidentally, while exploring the SNPE SDK reference guide, I found that there is actually a visualization tool for DLC files: snpe-dlc-viewer.

All the tools in the SNPE SDK only run in an Ubuntu environment, so I quickly spun up an Ubuntu Docker container on my MacBook Pro, installed the necessary Python libraries, set the PYTHONPATH, and ran the snpe-dlc-viewer command to convert driving_model.dlc to an HTML file (I was so excited when this moment came). The HTML provides a fantastic interface for the model visualization:

From the visualization, it is easy to see that the feature extraction CNN has a ResNet-like structure with 4 stacked stages (conv2 to conv5):

After the CNN, the 8x16x4 feature map is reshaped into a 1x512 vector, which is fed into an RNN-like structure:

This structure is obviously a modified version of GRUs:

GRU visualization from “What is a Recurrent Neural Networks (RNNS) and Gated Recurrent Unit (GRUS)”
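For reference, the standard GRU gate equations can be written as the numpy sketch below; in the driving model both the input and the hidden state are 1x512 vectors, with the input coming from the flattened 8x16x4 CNN feature map. The weight names are generic placeholders, and the actual graph implements a modified version of these operations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One standard GRU update; the driving model realizes a modified
    version of these gates on 1x512 vectors."""
    z = sigmoid(x @ Wz + h_prev @ Uz + bz)             # update gate
    r = sigmoid(x @ Wr + h_prev @ Ur + br)             # reset gate
    h_cand = np.tanh(x @ Wh + (r * h_prev) @ Uh + bh)  # candidate state
    return (1.0 - z) * h_prev + z * h_cand             # new hidden state

# The 8x16x4 CNN feature map is flattened into the 1x512 input vector:
feature_map = np.zeros((8, 16, 4), dtype=np.float32)
x = feature_map.reshape(1, 512)
```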

After the RNN stage, the 1x512 output is forked into 5 channels. The first 4 channels are connected to 4-layer MLPs that finally output the path (1x384), left lane (1x385), right lane (1x385) and lead (1x58) information:

The 5th channel is directly concatenated to the output and connected back to the GRU input (rnn_state:0) in the code. For more details on the model, you can visualize it yourself with snpe-dlc-viewer, or simply download the HTML file below and open it in your browser:
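Putting the sizes together, the flat output vector can be sliced roughly as in the sketch below. The head sizes come from the graph above; the ordering and the assumption that the recurrent state is the final slice are my reading of driving.cc, not something documented.

```python
import numpy as np

# Output head sizes read from the model graph (openpilot 0.6.4):
PATH_SIZE = 384       # drivable path
LANE_SIZE = 385       # left / right lane (the extra value looks like a probability)
LEAD_SIZE = 58        # lead car
RNN_STATE_SIZE = 512  # recurrent state fed back into the GRU

def split_model_output(output):
    """Slice the flat network output into its heads.
    The ordering and interpretation here are assumptions based on driving.cc."""
    out = {}
    i = 0
    for name, size in [("path", PATH_SIZE), ("left_lane", LANE_SIZE),
                       ("right_lane", LANE_SIZE), ("lead", LEAD_SIZE),
                       ("rnn_state", RNN_STATE_SIZE)]:
        out[name] = np.asarray(output[i:i + size])
        i += size
    assert i == len(output), "unexpected output length"
    return out
```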

Another noteworthy thing is that, starting from openpilot 0.6.5, comma.ai changed the stage between the CNN and the GRUs from a simple 1x1 convolution to an inception-like structure:

Visualization of the stage between CNN and GRUs in 0.6.4 driving_model
Visualization of the stage between CNN and GRUs in 0.6.6 driving_model

I believe this change is the key to the improvements in path prediction and lead detection mentioned in RELEASES.md.
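For readers unfamiliar with the term, “inception-like” means several parallel convolution branches with different receptive fields whose outputs are concatenated along the channel axis. The generic PyTorch sketch below illustrates the pattern only; the channel counts and branch layout are illustrative and do not match the actual 0.6.6 graph.

```python
import torch
import torch.nn as nn

class InceptionLikeBlock(nn.Module):
    """Generic inception-style block: parallel conv branches with different
    receptive fields, concatenated along the channel dimension."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 8, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 8, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 8, kernel_size=5, padding=2),
        )

    def forward(self, x):
        # Concatenate the branch outputs along the channel axis.
        return torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)
```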

Discussion

So far, I have explained the interface and the structure of the driving model. However, there is still one question remaining:

From the model visualization, it can be seen that the 1x8 DESIRE vector is concatenated with the CNN outputs and fed into the GRUs:

From the code, we can see that the DESIRE vector is initialized to all zeros and is not used during inference:

Hence, we can conclude that the DESIRE input is not used for now; I believe it is reserved for future use.

But what could that future use be? I searched for the keyword ‘desire’ throughout the openpilot source code and found an enum with 7 states defined:

I am not sure whether this ‘Desire’ is related to the ‘DESIRE’ in driving.cc. If it is, it will be very exciting to see a control vector fed back into the driving model to directly influence the perception results.
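If that enum is indeed what feeds the 1x8 DESIRE input, the most natural encoding would be a one-hot vector over the desire states, as in the speculative sketch below. The mapping from enum values to the 8 slots is purely my guess; today the input simply stays all zeros.

```python
import numpy as np

DESIRE_SIZE = 8  # size of the DESIRE input tensor in the model

def encode_desire(desire_index):
    """Speculative sketch: one-hot encode a desire enum value into the
    1x8 DESIRE input. In openpilot 0.6.x this input is simply all zeros."""
    vec = np.zeros((1, DESIRE_SIZE), dtype=np.float32)
    if desire_index is not None:
        vec[0, desire_index] = 1.0
    return vec

# Current behavior during inference: the vector stays all zeros.
desire_input = encode_desire(None)
```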

This article records my journey of decoding the driving model in openpilot. I would like to thank comma.ai for open-sourcing their code and models.

I hope this article serves as a good complement to the scattered documentation of openpilot online, helps you better understand how openpilot works and which tools you can use to figure that out, and inspires you to train your own models that could run on openpilot.

Please let me know if you have any questions, or if anything in this article is incorrect or inaccurate.


Chengyao Shen

Deep Learning | Computer Vision | Computational Neuroscience