“It’s autumn in New York…
Glittering crowds and shimmering clouds,
In canyons of steel,
They’re making me feel I’m home.”
— Vernon Duke, “Autumn in New York”
For any enthusiast of jazz from the golden days of yore, high-rise skyscrapers and jam-packed streets are among the best illustrations of chaotic, ever-exciting urban life. For any mapping or localization researcher in the age of autonomous driving, however, calling these urban icons nightmares is an understatement. The reason is simple: cities are just not friendly to localization. Positioning satellites are blocked, cameras and LIDARs are occluded, and even inertial sensors drift over time! What else is left for our poor researchers to use?
Luckily, there is still hope: the data. Learning-based approaches are increasingly helping to solve the mapping and localization problem. Apple is trying to learn corrections to GPS signals from its driving data, and the latest object detection algorithms are eliminating dynamic objects from urban scenes.
To fix this missing link between urban localization and algorithm development, we are sharing our UrbanLoco Dataset with the community. Working jointly with the Intelligent Positioning and Navigation Lab at Hong Kong Polytechnic University, we (the MSC Lab at UC Berkeley) collected 14 kilometers of urban driving data in San Francisco and Hong Kong with a full sensor suite: single-point GNSS, cameras, LIDAR, and IMU. We had planned to launch UrbanLoco at ICRA 2020, but the conference was sadly cancelled due to the worldwide COVID-19 pandemic. Instead, we would like to share this valuable dataset with you here on Medium. You can easily access the UrbanLoco dataset from our website and arXiv.
What’s in the dataset?
A picture is worth a thousand words, and here are lots of them. In UrbanLoco, you will find images, LIDAR point clouds, GNSS solutions, and inertial measurements collected on some of the busiest streets in San Francisco and Hong Kong.
We toured around some of the busiest blocks in San Francisco: Market Street (before its closure to private vehicles), Chinatown, and Union Square. We also drove through some routes that are challenging even for human drivers: the sinuous Lombard Street and the hilly roads around Coit Tower. More interestingly, we traveled across two Bay Area landmarks: the Golden Gate Bridge and the Bay Bridge. On the other side of the Pacific Ocean, our Hong Kong team collected data in Hung Hom, around their beautiful campus.
For each of the aforementioned city sections in San Francisco, we gathered 360-degree images, point clouds, coarse GNSS localization solutions, and inertial sensor readings. To protect our community's privacy, we blurred pedestrians' faces and vehicle registration plates with anonymizer, an open-source detection-and-blurring tool. In the Hong Kong data, instead of horizontal cameras, a fish-eye camera points toward the sky for GPS-related research such as the multipath problem.
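The anonymization step boils down to detecting sensitive regions and blurring them beyond recognition. Here is a minimal sketch of the blurring half, using a crude mean filter over detector-supplied bounding boxes; the actual anonymizer tool uses its own detector and blur, so treat the function and its parameters as illustrative only.

```python
import numpy as np

def blur_regions(image, boxes, k=15):
    """Blur axis-aligned boxes (x0, y0, x1, y1) in an H x W x C image.

    A simple k x k mean filter stands in for the Gaussian blur a real
    anonymization pipeline would apply; `boxes` are assumed to come
    from a face/plate detector.
    """
    out = image.copy().astype(float)
    for x0, y0, x1, y1 in boxes:
        patch = out[y0:y1, x0:x1]
        pad = k // 2
        # Pad with edge values so the box borders stay well-defined.
        padded = np.pad(patch, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
        blurred = np.zeros_like(patch)
        for dy in range(k):
            for dx in range(k):
                blurred += padded[dy:dy + patch.shape[0], dx:dx + patch.shape[1]]
        out[y0:y1, x0:x1] = blurred / (k * k)
    return out.astype(image.dtype)
```

In practice one would pick a kernel large relative to the face or plate size, so that no detail survives the averaging.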
How’s the data collected?
In early 2019, we started constructing a mobile data collection platform based on a Toyota Prius (very Californian indeed!). Here is a picture of our car and an illustration of the sensor configuration.
We equipped our Prius with the following sensors:
- One 3D LIDAR (RS-R32, Robosense, 360° HFOV, +15° to −25° VFOV, 200 m range, 10 Hz)
- Six Fixed-focus Cameras (FLIR Grasshopper 3, 70 degrees HFOV, 10 Hz, synchronized)
- One Inertial Measurement Unit (Xsens MTI 10, 100 Hz)
- One Single Point GNSS Receiver (GPS/GLONASS, 10Hz)
- One GNSS Inertial Navigation System (NovAtel SPAN CPT 7, RTK, 10 Hz, GPS/GLONASS)
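To compare the single-point GNSS fixes against the RTK ground truth from the SPAN CPT 7, both have to live in a common local frame. A minimal flat-earth sketch of that conversion is below; it uses a spherical Earth approximation, which is fine over a few kilometers of driving, whereas a production pipeline would use a proper WGS-84 ENU conversion. The function name and reference-point convention are my own, not the dataset's tooling.

```python
import math

# Mean Earth radius in meters (spherical approximation of WGS-84).
R_EARTH = 6371000.0

def lla_to_local_en(lat, lon, lat0, lon0):
    """East/north offsets in meters of (lat, lon) from a reference
    point (lat0, lon0), both in degrees.

    Flat-earth approximation: adequate over the few-kilometer span of
    a single driving route, but not for continental distances.
    """
    d_lat = math.radians(lat - lat0)
    d_lon = math.radians(lon - lon0)
    east = R_EARTH * d_lon * math.cos(math.radians(lat0))
    north = R_EARTH * d_lat
    return east, north
```

With every trajectory expressed as east/north meters from a shared origin, per-pose deviations become simple vector differences.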
We carefully calibrated the intrinsic matrices of the cameras with Kalibr, an open-source camera calibration toolbox from ETH Zurich. The extrinsics between the LIDAR and all six cameras were calibrated with the Autoware calibration toolkit. A sequential software trigger fires the six cameras in step with the rotation of the LIDAR, and we calibrated the time delay before data collection. The NovAtel SPAN CPT 7 provides the ground truth for our dataset; its RTK correction signal comes from the California Real Time Network (CRTN) and the Bay Area Regional Deformation Network (BARD).
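These calibrated matrices are what lets you fuse the sensors, for example by projecting LIDAR points into a camera image. The sketch below shows the standard pinhole projection using an intrinsic matrix K and a LIDAR-to-camera extrinsic transform; the matrix values in the test are placeholders, not our actual calibration results.

```python
import numpy as np

def project_lidar_to_image(points_lidar, K, T_cam_lidar):
    """Project 3-D LIDAR points into pixel coordinates.

    points_lidar : (N, 3) points in the LIDAR frame
    K            : (3, 3) camera intrinsic matrix (e.g. from Kalibr)
    T_cam_lidar  : (4, 4) homogeneous extrinsic transform, LIDAR -> camera
                   (e.g. from the Autoware calibration toolkit)
    Returns (N, 2) pixel coordinates and a boolean mask of points that
    lie in front of the camera.
    """
    n = points_lidar.shape[0]
    homog = np.hstack([points_lidar, np.ones((n, 1))])  # (N, 4) homogeneous
    cam = (T_cam_lidar @ homog.T).T[:, :3]              # points in camera frame
    in_front = cam[:, 2] > 0                            # keep only z > 0
    pix = (K @ cam.T).T                                 # apply intrinsics
    pix = pix[:, :2] / pix[:, 2:3]                      # perspective divide
    return pix, in_front
```

Overlaying the projected points on the image is also a quick sanity check of the extrinsic calibration itself: misaligned edges are immediately visible.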
Why are urban scenes challenging, and how challenging is UrbanLoco?
Urban mapping is hard because of a city's geographic and demographic properties. High-rise structures easily block or reflect signals traveling from satellites to receivers. Commonly known as the multipath problem, these reflections off building facades add time delays to the GNSS signal, resulting in noisy pose estimates. Visual and laser-based solutions, meanwhile, rest on the strong assumption that the scene is static (or mostly static), which is definitely not the case in urban scenes. When I was checking our LIDAR data from Market Street, I noticed frames in which more than 50% of the LIDAR points fell on dynamic objects. While IMUs are immune to moving objects and tall buildings, they suffer from drift over time, since the error in each estimate accumulates. Localizing purely with a reasonably priced IMU, you would find yourself in the middle of nowhere after a few minutes.
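The IMU drift claim is easy to make concrete: a constant accelerometer bias, double-integrated, grows quadratically in time as 0.5 · bias · t². The sketch below dead-reckons position from a biased accelerometer; the bias magnitude is an illustrative assumption, not a measurement of our Xsens unit.

```python
import numpy as np

def dead_reckon_position_error(bias, duration, dt=0.01):
    """Position error (m) from double-integrating a constant
    accelerometer bias (m/s^2) over `duration` seconds at step `dt`.

    This is the dominant drift term for a standalone IMU; the closed
    form is 0.5 * bias * t^2, which the discrete integration tracks.
    """
    t = np.arange(0.0, duration, dt)
    vel = np.cumsum(np.full_like(t, bias) * dt)  # integrate acceleration
    pos = np.cumsum(vel * dt)                    # integrate velocity
    return pos[-1]
```

Even a tiny 0.01 m/s² bias yields roughly 0.5 × 0.01 × 300² = 450 meters of error after five minutes, which is exactly why a consumer-grade IMU alone leaves you "in the middle of nowhere."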
Despite all these challenges in urban scenes, you will rarely find a dataset that addresses them. The popular KITTI and RobotCar datasets were both collected in far less urbanized areas, and the more recent nuScenes, ArgoVerse, Waymo, and Lyft datasets do not provide enough sensor information for mapping- or localization-related tasks.
To see how challenging San Francisco and Hong Kong are for urban localization, we ran several mapping/localization algorithms on our dataset: the Visual-Inertial Navigation System (VINS), HDL-SLAM, and LIDAR Odometry and Mapping (LOAM). Shown here are plots of their performance on our collected trajectories; you can clearly see the large deviations from our ground-truth data.
Quantitatively, we found that the horizontal localization result could deviate by more than 150 meters, and the average yaw angle could deviate by up to 20 degrees. More interestingly, none of the aforementioned algorithms performed well in height estimation. If you are interested in a more detailed analysis of each scene, please check our published paper.
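For readers who want to reproduce this kind of evaluation on their own runs, the two headline numbers reduce to simple per-pose comparisons against time-aligned ground truth. A minimal sketch, assuming the estimated and ground-truth poses have already been time-synchronized and expressed in the same local frame (our paper's evaluation pipeline is more involved):

```python
import numpy as np

def horizontal_and_yaw_error(est_xy, gt_xy, est_yaw, gt_yaw):
    """Max horizontal deviation (m) and mean absolute yaw error (deg)
    between time-aligned estimated and ground-truth trajectories.

    est_xy, gt_xy   : (N, 2) planar positions in a shared local frame
    est_yaw, gt_yaw : (N,) headings in radians
    """
    horiz = np.linalg.norm(est_xy - gt_xy, axis=1)
    dyaw = np.degrees(est_yaw - gt_yaw)
    dyaw = (dyaw + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return horiz.max(), np.abs(dyaw).mean()
```

The yaw wrap-around step matters: without it, an estimate of 359° against a truth of 1° would register as a 358° error instead of 2°.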
It was never easy to build a dataset from scratch, all the way from purchasing the vehicle to constructing the website. However, we believe such data is truly valuable for enthusiasts, students, and researchers in the mapping/localization community. If your company is interested in using our data for your business, you can contact me through LinkedIn.
Lastly, we hereby thank Robosense for their generous donation of the RS-R32 LIDAR, and we appreciate all the support from our professors, colleagues, and friends along the way. During this epidemic and economic hardship, let us all pray together: may the endless night come to an end, and may the driverless car indeed be without a driver.