Making Self-Driving “Smarter” — Part III: SLAM baseline

Love Robots to Death
6 min read · Apr 9, 2023


So I have finally finished the preparations for my future experiments with SLAM algorithms.
I have built a baseline SLAM setup so that I will be able to do two things:

  • To compare algorithms;
  • To gather training / testing data.

Not all of the data I am using will go into SLAM, but all of it will be used for gathering the dataset. So let’s see what I have here.

General stack description.

Let me remind you of this story, where I experimented with different SLAM algorithms in the Gazebo environment.

I chose gmapping because it is both fast enough and precise enough. Gmapping mostly uses lidar data to build the map. The module needs to know the base frame, the odometry frame and also the frame for the map output.

Frames are not the same as rostopics: by “frames” people usually mean tf frames. I will talk about this later in more detail, but if you have no idea what tf and frames are, I recommend going through these tutorials. Basically, if you have several devices that each work with orientation and coordinates in some way, you should connect them into one coordinate system so that your SLAM can work.
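To make the difference concrete, here is a minimal sketch (ROS 1, Python) that asks tf where one frame sits relative to another, which is exactly the information SLAM needs. The frame names base_footprint and laser are assumptions for illustration, so replace them with whatever your devices actually publish.

```python
#!/usr/bin/env python
# Minimal tf check: where is the lidar frame relative to the robot's base?
# "base_footprint" and "laser" are example frame names, not fixed ROS names.
import rospy
import tf2_ros

rospy.init_node("frame_check")
buffer = tf2_ros.Buffer()
listener = tf2_ros.TransformListener(buffer)

rate = rospy.Rate(1.0)
while not rospy.is_shutdown():
    try:
        t = buffer.lookup_transform("base_footprint", "laser", rospy.Time(0))
        rospy.loginfo("laser in base_footprint: x=%.2f y=%.2f z=%.2f",
                      t.transform.translation.x,
                      t.transform.translation.y,
                      t.transform.translation.z)
    except (tf2_ros.LookupException, tf2_ros.ConnectivityException,
            tf2_ros.ExtrapolationException):
        rospy.logwarn("transform not available yet")
    rate.sleep()
```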

I have installed the lidar on my robot’s platform (I talked about it here), so I did not have problems with the lidar data or with setting the lidar’s frame relative to the base frame.

But getting odometry was a tricky task. I have an IMU sensor installed, and I also have realsense cameras (T265 and D435). The T265 has its own IMU sensor and also has odometry publishers: both visual and IMU-based.

At first I planned to get odometry from the IMU sensor, and I even compared the realsense IMU with my old one. But I found out that although the gyro captures turns quite well, the accelerometer is a very poor device for analyzing movement because its data is very noisy. The noise was high both in my old IMU and in the realsense’s IMU, so we cannot rely on IMU-based odometry.

Then I found out that the T265 also provides visual odometry, and it works quite well. You can visualize the v-odometry in rviz and check it yourself.

Image 1. T265 odometry visualisation

In Image 1 you can see my standard motor test (move forward, move backward, turn left and right).

In Image 2 you can see how the odometry changes while I control the robot’s movement from the keyboard. I just moved forward and backward.

Since I have a rough floor, the robot’s movement is not smooth and there is a lot of extra roll, which you can see in the visualization.

Image 2: Odometry visualisation.
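If you want to check the odometry without rviz, a tiny subscriber that prints the incoming poses is enough. Below is a sketch; the topic name /camera/odom/sample is the realsense-ros default on my setup and is an assumption here, so verify it with rostopic list.

```python
#!/usr/bin/env python
# Print the T265 odometry as it arrives (position only, for a quick check).
# The topic name is assumed; adjust it to whatever your realsense node publishes.
import rospy
from nav_msgs.msg import Odometry

def on_odom(msg):
    p = msg.pose.pose.position
    rospy.loginfo("x=%.3f y=%.3f z=%.3f", p.x, p.y, p.z)

rospy.init_node("t265_odom_echo")
rospy.Subscriber("/camera/odom/sample", Odometry, on_odom, queue_size=10)
rospy.spin()
```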

Realsense data.

So I have two realsense devices on my robot. Here is a description of the data you can get from them.

The T265 has two fish-eye cameras (see Image 1), visual odometry and IMU odometry / IMU data. The D435 has a color image (from 2 cameras) and a depth map (see Image 2).

The visual odometry is going to be part of the SLAM stack, while the depth maps and color images are going to be gathered for the test dataset.
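Here is a rough sketch of the gathering side: dump the D435 color and depth frames to disk while the robot drives around. The topic names are the realsense-ros defaults and the output layout is made up, so treat both as assumptions and adjust them to your launch files.

```python
#!/usr/bin/env python
# Save D435 color and depth frames to disk, stamped with the message time.
# Topic names and the output folder are placeholders for illustration.
import os
import rospy
import cv2
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

OUT_DIR = "dataset"
os.makedirs(OUT_DIR, exist_ok=True)
bridge = CvBridge()

def save(msg, prefix):
    stamp = msg.header.stamp.to_nsec()
    encoding = "passthrough" if prefix == "depth" else "bgr8"
    img = bridge.imgmsg_to_cv2(msg, desired_encoding=encoding)
    cv2.imwrite(os.path.join(OUT_DIR, "%s_%d.png" % (prefix, stamp)), img)

rospy.init_node("dataset_recorder")
rospy.Subscriber("/camera/color/image_raw", Image,
                 lambda m: save(m, "color"), queue_size=5)
rospy.Subscriber("/camera/aligned_depth_to_color/image_raw", Image,
                 lambda m: save(m, "depth"), queue_size=5)
rospy.spin()
```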

If we go through this documentation, the algorithm for getting v-odometry has three steps:

  • Recognizing key-point features. There are many kinds of features and feature combinations, but to get an idea of what a key point is, you can go through this SIFT tutorial from OpenCV;
  • Getting a 3D image and coordinates from two cameras (in the case of the T265, the two fisheyes). This is quite a standard stereo-vision task;
  • Matching the key-points in 3D space and deriving the change in odometry from them; there is a toy sketch of these steps right after this list.
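Here is a toy OpenCV sketch of those steps on two consecutive frames. It is not the T265’s actual pipeline: it uses ORB instead of SIFT, a made-up camera matrix and placeholder file names, but it shows where the “change in odometry” comes from.

```python
# Toy monocular version of the three steps; with a real stereo pair (step 2)
# you would triangulate the matched points and recover metric scale as well.
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # placeholder files
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Step 1: detect key-point features (ORB here; SIFT works the same way).
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(prev, None)
kp2, des2 = orb.detectAndCompute(curr, None)

# Step 3 (simplified): match key-points between the frames...
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:300]
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# ...and recover the relative camera motion (rotation R, translation t).
K = np.array([[285.0, 0, 320.0], [0, 285.0, 240.0], [0, 0, 1.0]])  # made up
E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
print("rotation:\n", R, "\ntranslation direction:", t.ravel())
```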

During my tests I noticed that v-odometry is more precise than the IMU odometry. But it has disadvantages; let us take a look at the comparison table from the realsense documentation.

Image 3: Comparing IMU and V-odometry, source: https://dev.intelrealsense.com/docs/intel-realsensetm-visual-slam-and-the-t265-tracking-camera

V-odometry is quite CPU-consuming and it is scene dependent. The scene dependency is not a big problem for indoor robots: it is unlikely that your indoor scene changes unexpectedly. It is also stable over time, which is good.

Another big disadvantage that is not mentioned above: v-odometry may stop working at high speed or during acceleration. I am not sure what the reason is, but I suppose that either the wire gets unplugged or the v-odometry algorithm simply cannot keep up. So v-odometry is not for high speed or rapid acceleration.

Another reason I am reluctant to rely on v-odometry is that my further aim is to get odometry changes from different visual devices by using neural networks. For example, I mentioned above that the T265 SDK uses key-point feature recognition. Such approaches are easily replaced by a CNN. The lidar data can be passed to CNN layers as well, and then these layers may be fused or stacked, with the change in odometry as the final output.
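To give an idea of what I mean, here is a very rough PyTorch sketch of such a fused network. All layer sizes, input shapes and the (dx, dy, dyaw) output are invented for illustration; it is not a trained or validated model.

```python
# Sketch: one branch for stacked camera frames, one for a lidar scan,
# fused into a head that regresses the change in odometry (dx, dy, dyaw).
import torch
import torch.nn as nn

class OdomNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Image branch: previous + current grayscale frame stacked as 2 channels.
        self.img = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Lidar branch: a 1D scan of 360 range readings.
        self.scan = nn.Sequential(
            nn.Conv1d(1, 16, 7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        # Fusion head -> (dx, dy, dyaw).
        self.head = nn.Sequential(nn.Linear(32 + 16, 64), nn.ReLU(),
                                  nn.Linear(64, 3))

    def forward(self, frames, scan):
        return self.head(torch.cat([self.img(frames), self.scan(scan)], dim=1))

net = OdomNet()
delta = net(torch.randn(1, 2, 120, 160), torch.randn(1, 1, 360))
print(delta.shape)  # torch.Size([1, 3])
```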

So this is a great baseline for my further experiments, and I suppose such an approach should be both more precise and faster, especially since we have the possibility to optimize CNN-based architectures even for CPU usage.

TF tree and how to build it.

Every device used for the SLAM task has its own coordinate frame. For example, in Image 4 you can see the coordinate frames of the devices built into the T265.

Image 4. Coordinate frame for T265, source: https://dev.intelrealsense.com/docs/intel-realsensetm-visual-slam-and-the-t265-tracking-camera

If we are using both the v-odometry and the IMU odometry in our SLAM stack, we should let gmapping know how these frames are connected.

If the devices are mounted in fixed positions, so these parts are not supposed to move on their own, the ROS tf static_transform_publisher can be used. You only need to set the x, y, z offsets and the roll, pitch, yaw values that describe where one frame sits relative to another. Put more formally, you create the transform matrix from one frame to the other.

For SLAM tasks you should also have a “base frame” or “base_footprint”. Usually it is the robot’s coordinate center.
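A static transform can be published with the command-line static_transform_publisher tool or from a small node. Here is a sketch of the latter for the base_footprint to t265_pose_frame link; the offsets are placeholders, so measure your own.

```python
#!/usr/bin/env python
# Publish a fixed transform between the robot's base frame and the T265 frame.
# The x, y, z and roll, pitch, yaw values below are placeholders.
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped
from tf.transformations import quaternion_from_euler

rospy.init_node("static_frames")
broadcaster = tf2_ros.StaticTransformBroadcaster()

t = TransformStamped()
t.header.stamp = rospy.Time.now()
t.header.frame_id = "base_footprint"      # parent frame
t.child_frame_id = "t265_pose_frame"      # child frame
t.transform.translation.x = 0.10          # offsets in meters (example values)
t.transform.translation.y = 0.00
t.transform.translation.z = 0.15
q = quaternion_from_euler(0.0, 0.0, 0.0)  # roll, pitch, yaw in radians
t.transform.rotation.x, t.transform.rotation.y = q[0], q[1]
t.transform.rotation.z, t.transform.rotation.w = q[2], q[3]

broadcaster.sendTransform(t)
rospy.spin()
```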

In the end you should get a tf tree like this:

Image 5: robot’s tf-tree

Here I have a base_footprint which is connected to the t265_pose_frame (the realsense T265 camera) and to the lidar. The t265_pose_frame is in turn connected to the other camera frames: the fisheyes and the frames of the D435.

Gmapping.

If you have built your tf tree correctly and all your devices are available, the last thing to do is to launch gmapping with arguments that set the odom frame, the map frame and the base frame.
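In practice this usually goes into a launch file, but the equivalent invocation looks roughly like the sketch below. The frame names match my tree above; the odom frame name is an assumption, since the realsense node may publish its odometry under a different frame.

```python
#!/usr/bin/env python
# Start slam_gmapping with the scan topic remapped and the frames set
# via private parameters (the leading underscore marks a private param).
import subprocess

subprocess.run([
    "rosrun", "gmapping", "slam_gmapping",
    "scan:=/scan",                  # remap to your lidar topic
    "_base_frame:=base_footprint",  # robot base frame
    "_odom_frame:=odom",            # frame of the odometry source (assumed name)
    "_map_frame:=map",              # frame of the output map
], check=True)
```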

So here is the result I got:

Image 6: the built map

This map was built during a series of movements, as you can see in the rviz screenshot.

Next steps.

The first aim is to stabilize this setup, because sometimes I have problems with plugging in devices and with power, and I also have not yet debugged my PWM control well enough to use it easily.

The second aim is to gather test or validation data to try some CNN-based approaches, as I mentioned in the paragraph about the realsense data. I am not going to train anything from scratch: first I am going to look through pretrained models and fine-tune them.

The third aim is to gather more data, or to think about how I can get training data.

That’s all for today, I hope it was interesting for you. See you in the next chapters!
