Running Autoware-based Mapping in the Cloud

PIX Team
PIX Moving
Published Jun 17, 2019

This post was originally contributed by Rohan Rao during his internship at PIX Moving in June 2019. This article is an edited version of the original post, published with the author's consent.

So, what’s Autoware? Well, it’s an amazing open-source software suite for autonomous vehicles, built by Tier IV and Nagoya University in Japan. It supports a huge number of self-driving modules for sensing, computing, and actuation, and is relatively easy to set up and use once you know what you’re trying to do. Check out the GitHub page here. Autoware runs on Linux-based systems like Ubuntu, and uses the ROS middleware framework for all of its supported functionality.

Now, the one unfortunate thing about Autoware is its minimum system requirements. It needs 8 processor cores, 32 GB of RAM, and, for the GPU-based algorithms, a very powerful GPU with plenty of memory (8 GB is sometimes not sufficient, so goodbye GTX 1080 or RTX 2080). A slightly lower spec is usually fine for running the autonomous driving functionality, but building the map that the car uses for localization requires a lot of processing power.
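Before committing to an install on a given machine, you can quickly check it against these requirements from the shell (the `nvidia-smi` query only works once NVIDIA drivers are installed):

```shell
# Number of CPU cores (Autoware recommends >= 8)
nproc

# Total RAM (recommended >= 32 GB)
free -h | awk '/^Mem:/ {print $2}'

# GPU model and memory, if NVIDIA drivers are present
nvidia-smi --query-gpu=name,memory.total --format=csv 2>/dev/null \
    || echo "no NVIDIA GPU driver found"
```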

There are two mapping methods supported in Autoware, both based on the same underlying algorithm: the Normal Distributions Transform, or NDT. NDT models the points inside each voxel of 3D space as a normal distribution, and uses these distributions to align and register successive point clouds, building up the map of the environment as new scans arrive. The two mapping methods in Autoware are NDT_Mapping and Approximate_NDT_Mapping. The difference is that NDT Mapping keeps the whole map in a single ROS message until the very end of the procedure, which makes it accurate, since the latest scans are compared against everything mapped so far. Approximate NDT Mapping instead creates multiple sub-maps of a given size (in meters). This lets it run without much memory, since it never stores the entire map, and it also requires less computation; however, the resulting map may not be as accurate. Some additional useful details are in this discussion: Autoware issue 533 on GitHub.

Localization of a vehicle using Autoware’s ndt_matching in a given map environment

If you are trying to build a small map, on the order of a few hundred meters, it is fine to run full-scale NDT (Normal Distributions Transform) Mapping on a GTX 1080 with 8 GB of memory. But once you start mapping regions of a few square kilometers, the algorithm runs out of GPU memory at some point, and the ROS message /ndt_map also grows past 1 GB, at which point ROS treats it as an error (as per the ROS specification) and loses message synchronization. A discussion from Autoware issue 629 on GitHub is quoted below.

Due to the ROS specification, if a message over 1GB is published, the error you mentioned occurs. The ndt_mapping node publishes all of the point cloud built so far, so once the map grows past a certain size, the message can no longer be published. If you want to create a wide-area map, you can use approximate_ndt_mapping instead of ndt_mapping. That node loads and publishes only a part of the map, so the error above does not occur.

Now, I wanted to try running Autoware on a more powerful machine, like an instance I could spin up using the Amazon Web Services (AWS) Elastic Compute Cloud (EC2). I could easily provision a p2.xlarge Tesla K80 instance with 12GB GPU memory, or even a p3.2xlarge Volta V100 instance with 16GB GPU memory. This would let me create the full-scale maps easily.
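Provisioning such an instance can also be done from the AWS CLI instead of the console. A minimal sketch is below; the AMI ID, key pair, and security group are placeholders (substitute the Deep Learning AMI ID for your region):

```shell
# Launch a p3.2xlarge (V100, 16 GB GPU memory).
# ami-0123456789abcdef0, my-keypair, and sg-0123456789abcdef0 are
# placeholders -- fill in your region's Deep Learning AMI and your
# own key pair / security group.
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type p3.2xlarge \
    --key-name my-keypair \
    --security-group-ids sg-0123456789abcdef0 \
    --count 1
```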

I used US AWS instances in Virginia and Oregon which are relatively faster than AWS China. AWS has Deep Learning AMIs which already have NVIDIA drivers and CUDA installed out of the box. Initially I tried to use the latest AWS Deep Learning AMI (23.0) but it has a virtualenv set up even for the main user, creating problems with ROS. So I switched to an older version (18.0) and then it was relatively smooth. The set of steps I followed is below:

1. Install ROS Kinetic for Ubuntu 16.04 and set up the catkin_ws. Thanks to great internet speeds in US AWS instances, this takes about 2 minutes.
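Step 1 boils down to the standard ROS Kinetic install for Ubuntu 16.04. The commands below follow the ROS wiki instructions from that time (the apt keyserver and signing key may have changed since):

```shell
# Add the ROS package repository and its signing key (Ubuntu 16.04 / Kinetic)
sudo sh -c 'echo "deb http://packages.ros.org/ros/ubuntu xenial main" > /etc/apt/sources.list.d/ros-latest.list'
sudo apt-key adv --keyserver hkp://ha.pool.sks-keyservers.net:80 \
    --recv-key 421C365BD9FF1F717815A3895523BAEEB01FA116
sudo apt-get update
sudo apt-get install -y ros-kinetic-desktop-full

# Initialize rosdep and source the environment
sudo rosdep init
rosdep update
echo "source /opt/ros/kinetic/setup.bash" >> ~/.bashrc
source /opt/ros/kinetic/setup.bash

# Create and build the catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws && catkin_make
```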

2. Git clone Autoware and build it with CUDA and GPU support. I used a slightly older version (1.10) because I had some issues with the latest one (1.12).
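As a rough sketch, step 2 looked like the following. The exact tag name and the catkin_make_release build script are assumptions based on the 1.10-era repository layout, so verify them against your checkout:

```shell
# Clone Autoware at the 1.10.0 tag
# (the repository location may have moved since 2019)
cd ~
git clone https://github.com/autowarefoundation/autoware.git Autoware
cd Autoware
git checkout 1.10.0

# Build from the ros/ directory; on 1.10-era releases CUDA support is
# picked up when nvcc is found on the PATH
cd ros
./catkin_make_release
```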

3. Once complete, run the command:

source ~/Autoware/ros/devel/setup.bash

4. For launching ndt_mapping, you need to initialize the tf values for x, y, z and roll, pitch, yaw. So SSH into the instance in multiple terminals, launch roscore, then run the following command for each of the parameters:

rosparam set tf_x 0.0
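Spelled out, the full set of transform parameters looks like this (identity values shown for illustration; substitute your vehicle's actual base_link-to-LiDAR extrinsics):

```shell
# Translation (meters) and rotation (radians) from base_link to the LiDAR
rosparam set tf_x 0.0
rosparam set tf_y 0.0
rosparam set tf_z 0.0
rosparam set tf_roll 0.0
rosparam set tf_pitch 0.0
rosparam set tf_yaw 0.0
```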

5. Run the ndt_mapping node with the following command. I obtained it by looking at the console output when clicking the corresponding checkbox in the Autoware GUI. Setting method_type to 2 runs it on the GPU.

roslaunch lidar_localizer ndt_mapping.launch method_type:=2 incremental_voxel_update:=False use_odom:=False use_imu:=False imu_upside_down:=False imu_topic:=/imu_raw

6. Play the ROS bag which has the data for your NDT Mapping. Make sure that it publishes the point clouds on the topic /points_raw, or else the NDT Mapping node will not use them at all.
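For example, if the bag recorded the LiDAR on a different topic (here /velodyne_points, an assumption; check yours with rosbag info), you can remap it at playback time:

```shell
# Inspect the recorded topics first
rosbag info my_drive.bag

# Play the recording, remapping the recorded LiDAR topic to /points_raw
# so the NDT Mapping node can consume it
rosbag play my_drive.bag /velodyne_points:=/points_raw
```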

7. Finally, when the mapping is complete, save the final output to a PCD file, just like the Autoware GUI does. I learned how to do this from Autoware issue 1207 on GitHub.
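A hedged sketch of that save step: it publishes a one-shot config message to the mapping node. The topic and message names below are my reading of the 1.10-era lidar_localizer, and the filename and filter resolution are placeholders, so verify them against your checkout:

```shell
# Trigger the PCD export by publishing the output config message once.
# filter_res downsamples the saved cloud; 0.0 would save it at full
# resolution.
rostopic pub /config/ndt_mapping_output autoware_msgs/ConfigNdtMappingOutput \
    "{filename: '/home/ubuntu/map.pcd', filter_res: 0.2}" --once
```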

Now while the above steps should ideally work on AWS, there were a few issues I faced.

  • Autoware fails to compile on AWS p2.xlarge Tesla K80 instances. The K80 uses the sm_37 (Kepler) architecture, and while Autoware nominally supports all architectures, I was getting multiple CUDA errors when trying to build it and was unable to set the NVCC flag to correct the architecture-based compilation. It didn’t help that p2.xlarge instances have only 4-core CPUs, making each build a 25–30 minute task. So I switched to the p3.2xlarge Volta V100 instances with 8-core CPUs.
  • I wasn’t able to get the full scale NDT Mapping output beyond 120 messages. It would just abruptly stop doing the mapping even though the ROSbag continued to play. Very weird.
  • The last command failed to run because autoware_msgs was not compiled for some reason. A similar issue is described here, but the solution is specific to that user’s case.
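For the architecture problem in the first bullet, one thing worth trying is pinning the NVCC target explicitly (sm_37 is the Tesla K80). Whether the flag is actually picked up depends on the Autoware version, so treat this as a starting point rather than a guaranteed fix:

```shell
# Force NVCC to target the Tesla K80's sm_37 architecture during the
# catkin build; CUDA_NVCC_FLAGS plumbing varies between releases.
cd ~/Autoware/ros
catkin_make -DCUDA_NVCC_FLAGS="-gencode arch=compute_37,code=sm_37"
```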

So, because of all these problems, I tried approximate NDT mapping instead. By its very nature, it doesn’t really require cloud processing power at all; it can run even on the GTX 1080 without breaking a sweat. But I was able to run it successfully on the cloud, and used the sub-maps to obtain a quick visualization in RViz.
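Launching it is analogous to step 5. The launch-file name and parameters below assume the approximate_ndt_mapping node mirrors ndt_mapping's interface in the 1.10-era lidar_localizer package, so check them against your build:

```shell
# Launch approximate NDT mapping instead of full NDT mapping;
# method_type:=2 again selects the GPU implementation.
roslaunch lidar_localizer approximate_ndt_mapping.launch method_type:=2 \
    use_odom:=False use_imu:=False imu_upside_down:=False imu_topic:=/imu_raw
```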

The first cloud-based mapping output

Now, the output isn’t perfect, but I was quite sure that it could be improved with more careful data capture. So I went out again, drove the Honda Civic all around the industrial park, and started the same procedure, but on the in-car computer with the GTX 1080 and approximate NDT mapping. The result was something like this:

The output above is also incomplete, covering only about 50% of the final map. The result could be improved by using an IMU, multiple LiDARs, and perhaps even GPS.

The full park map

The original post can be found here, published by Rohan Rao on June 16th, 2019. A brief introduction of the author:

Rohan Rao is an Electrical Engineering student at the Indian Institute of Technology (IIT) Madras, India, and has recently been admitted to the Masters in Robotic Systems Development (MRSD) program at Carnegie Mellon University (CMU). He is skilled in perception and localization for autonomous driving systems, has project experience in hardware development and electronics, is proficient in ADAS, V2X, and system-on-chip work, and has patented wireless communication inventions, among other things. Rohan is currently doing an internship at PIX Moving.

PIX engineers will keep contributing and will share more results regularly. Follow us and stay tuned!
