Using SLAM to Autonomously Navigate in Unknown Environments

Panagiotis Liampas
15 min read · Oct 15, 2023

Robots are becoming an increasingly integral part of our lives: household cleaning robots, autonomous vehicles, drones, and food-delivery vehicles are all built to handle tasks so that humans can focus on more important ones. Engineers are also developing increasingly complex autonomous systems for scenarios humans cannot handle themselves: rescuing people from a burning building or one that has collapsed in an earthquake, exploring other planets to determine whether humans could live there, and much more. All these robots can move around and perform tasks on their own, without any human guidance. But how do they do that?

Autonomous Navigation

Let’s think about how humans perform tasks in an unknown environment. When you go into a new, big place, you often want a map, right? Or you just wander around trying to find where you want to go, but that’s inefficient and you will probably get lost. Let’s consider the first scenario. You then want to know where you are. If the map doesn’t tell you that, you need to find cues: distinctive characteristics of the place that reveal where you are on that map.

Even humans struggle with that. Moreover, robots don’t even have that map; they have to move around and build it on their own. And their sensory inputs are very limited compared to those of humans. So you can see how difficult this task is for a robotics engineer or a computer scientist to solve.

A robot consists of various parts, primarily motors and sensors, integrated into a chassis. The robot only gets information about its surroundings from its sensors, and it needs to use that information to successfully complete the task it is given. To do that, it needs to understand two basic things:

  • Where it is at the moment, and what its orientation is (Localization)
  • What the objects around it are, where they are, and how they affect its actions (Mapping)

Most of the time, the robot needs to work out that information on its own, as it is deployed in an unknown environment. For that purpose, it needs to combine the information given by its sensors and determine its next move. It also needs to do that fast, even at a pace of 100–1000 times per second, and accurately, as an error in one step accumulates into the next. This is where sophisticated computer algorithms come in.

Simultaneous Localization And Mapping (SLAM)

The go-to solution to the above problem is SLAM: an algorithmic approach that enables an autonomous robot to build a map of the surrounding area using its sensors and, at the same time, use that map to navigate the area.

SLAM essentially implements the same process that humans follow, using the robot’s sensors and processor. It is a “chicken-or-egg” problem: to determine its location, the robot needs a map, and to build the map, it needs the locations of the objects it observes.

To really understand what this process is doing, we need to dive deeper into some technical aspects.

Sensors

The most common types of sensors used for SLAM are LIDAR (Light Detection And Ranging) and cameras. Let’s have a look at how each of these works:

LIDAR: A 360° rotating head emits laser pulses that reflect off surrounding objects and return to a sensor. By measuring the time each pulse takes to return (its time of flight) at every angle and elevation, we get the distance to the reflection point, forming a 3D cloud of dots at the points the beams reflected off.
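
To make the output of such a scan more tangible, here is a minimal sketch (in Python, with illustrative names) of how a list of range readings and their beam angles can be turned into a 2D point cloud; a real 3D LIDAR adds an elevation angle per beam.

```python
import numpy as np

def scan_to_points(ranges, angles):
    """Convert a 2D LIDAR scan (one range per beam angle, in radians)
    into Cartesian (x, y) points in the sensor frame."""
    ranges = np.asarray(ranges, dtype=float)
    angles = np.asarray(angles, dtype=float)
    return np.column_stack((ranges * np.cos(angles),
                            ranges * np.sin(angles)))

# Example: a 360-beam scan with 1° resolution, every beam hitting a wall 5 m away
angles = np.deg2rad(np.arange(360))
points = scan_to_points(np.full(360, 5.0), angles)
```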

Camera: Cameras are everywhere, from our smartphones to security systems. The robot gets a high-resolution image, which is an array of pixel RGB or grayscale values that (usually) range from 0 to 255.

Odometry

When a robot knows, at each moment, the speed and rotation angle of its motors, it can integrate these data to estimate its position and orientation in the environment.

However, due to factors such as sensor drift, wheel slip, approximation errors, non-flat terrain, and so on, this estimate is often inaccurate, so it needs to be combined with the other parts of the SLAM process, which I describe below.
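
As a minimal sketch of the idea (assuming a differential-drive robot and illustrative variable names), dead reckoning simply integrates the measured forward speed and turn rate over time, which is exactly why small errors in those measurements accumulate into a growing drift:

```python
import math

def update_pose(x, y, theta, v, omega, dt):
    """Dead-reckon the robot pose from odometry.
    v: forward speed (m/s), omega: turn rate (rad/s), dt: time step (s)."""
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

# Each update adds a little error (wheel slip, quantization, ...),
# so after many steps the estimated pose drifts from the true one.
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = update_pose(*pose, v=0.5, omega=0.05, dt=0.1)
```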

Landmarks

Landmarks are easily distinguishable features that the robot can use to work out where it is after having seen them once before. A landmark could be a tree or some other familiar object; it could also be a distinctive edge or texture. To be usable, landmarks should satisfy a few criteria, which I list below.

Be re-observable. We want the robot to recognize a landmark it has seen before in order to re-locate itself. It will often see the landmark again from a different angle or position, so it needs to be able to detect it in those conditions too.

Be unique. It should be easy to distinguish from other, similar landmarks seen later, and not be confused with them, as a false match can cause large errors in the computation.

Be plentiful. We want to minimize the time the robot spends without seeing any landmarks. During that time, the robot relies only on odometry data, which are often inaccurate, so the robot can get lost.

Be stationary. This is pretty straightforward: the robot needs to use the landmark as a reference point whose position it knows, and it cannot predict the movement of a non-stationary object such as a person.

But, how do we actually extract these landmarks?

Landmark Extraction Algorithms

Depending on the sensor used (LIDAR or Camera), there are various algorithms for determining such distinctive features.

Let’s start with the LIDAR algorithms:

Spikes algorithm

Spikes. We identify the extrema of the values returned by the LIDAR, i.e. readings that are noticeably larger or smaller than both of their neighbors and therefore probably correspond to an object or an edge. Its limitation is that it only works when there are distinct extrema, so it fails in smooth environments.
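
A minimal sketch of the spike idea (the threshold is an assumed tuning parameter, not a standard value): a reading is flagged when it sticks out or dips relative to both of its neighbors.

```python
import numpy as np

def find_spikes(ranges, threshold=0.5):
    """Return indices of readings that differ from BOTH neighbors by more
    than `threshold` meters, i.e. likely edges of objects in the scan."""
    ranges = np.asarray(ranges, dtype=float)
    spikes = []
    for i in range(1, len(ranges) - 1):
        d_prev = ranges[i] - ranges[i - 1]
        d_next = ranges[i] - ranges[i + 1]
        if (d_prev > threshold and d_next > threshold) or \
           (d_prev < -threshold and d_next < -threshold):
            spikes.append(i)
    return spikes
```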

RANSAC algorithm

RANSAC (Random Sample Consensus). The goal of this algorithm is to detect lines. It does that by taking a sample of readings in a range and computing a line of best fit through them (using a least-squares approximation). Then it counts how many readings lie within a threshold distance of this line, and if they exceed a certain number (called the consensus), it registers the line.
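
Here is a minimal sketch of that consensus test (parameter values are illustrative; this version samples two points per iteration instead of fitting a least-squares line through a small window, but the counting logic is the same):

```python
import random
import numpy as np

def ransac_line(points, iterations=100, distance_threshold=0.05, consensus=30):
    """Look for one line supported by at least `consensus` readings.
    points: Nx2 array of (x, y) LIDAR points. Returns (a, b, c) with
    ax + by + c = 0, or None if no line reaches the consensus."""
    best_line, best_support = None, 0
    for _ in range(iterations):
        i, j = random.sample(range(len(points)), 2)
        p1, p2 = points[i], points[j]
        a, b = p2[1] - p1[1], p1[0] - p2[0]          # line through p1 and p2
        c = -(a * p1[0] + b * p1[1])
        norm = np.hypot(a, b)
        if norm == 0:
            continue
        # Distance of every reading to the candidate line
        distances = np.abs(points @ np.array([a, b]) + c) / norm
        support = int(np.count_nonzero(distances < distance_threshold))
        if support >= consensus and support > best_support:
            best_line, best_support = (a, b, c), support
    return best_line
```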

Note: these features are usually represented by the range and bearing of a point, i.e. its distance and angle as measured from the robot.

For cameras, we use the following algorithms:

SIFT algorithm

SIFT (Scale Invariant Feature Transform). This algorithm detects specific features of an image independently of scale and rotation. It finds keypoints in a way that does not depend on the image size (scale invariance) and assigns each one an orientation based on the dominant gradient directions around it. This way, keypoints from different images can be matched with each other.
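
For instance, with OpenCV (assuming a build that includes SIFT, which is in the main package since version 4.4), detecting SIFT keypoints looks roughly like this; the file name is just a placeholder:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder image path

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each keypoint carries a position, scale and orientation; each descriptor is a
# 128-dimensional vector that can be matched against descriptors from other images.
```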

ORB algorithm

ORB (Oriented FAST and Rotated BRIEF). It consists of the combination of two algorithms, as described below:

  • FAST (Features from Accelerated Segment Test): It detects keypoints (ORB adds scale invariance by running it on an image pyramid) and computes an orientation for each one based on the intensity centroid of its neighborhood.
  • BRIEF (Binary Robust Independent Elementary Features): It converts the keypoints computed before into rotation-aware binary descriptors that together represent an object.
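
As a rough illustration, OpenCV exposes ORB directly; a sketch of detecting and matching ORB features between two frames might look like this (file names and parameter values are placeholders):

```python
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# ORB descriptors are binary strings, so Hamming distance is the natural metric.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
```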

Data Association

As the robot wanders around its environment, it often detects many landmarks, both ones it has previously seen and new ones. However, in order to use them to locate itself, it needs to match landmarks observed at different moments in time. There are some issues with this:

  • It might have previously seen a landmark but not recognize it every time it sees it.
  • It observes a landmark but never observes it again.
  • It falsely matches a new landmark with a previous one.

The first two issues are addressed by the criteria we set above, which require landmarks to be easily re-observable. For the third, we need an algorithm.

Nearest neighbor approach. We keep a database of landmarks we have already observed, and we only consider the ones that have been seen at least N times. Specifically, we perform the following steps:

  1. Do a laser or camera scan and extract landmarks
  2. Associate each extracted landmark with the closest one in the database that has been seen at least N times
  3. Filter these pairs of associations through a validation gate, described below:
    a. If a pair passes validation, it is a re-observed landmark, so we increment the number of times we’ve seen it.
    b. If it fails, it is a new landmark, so we set its count to 1.

Validation gate. The SLAM implementation also tells us how uncertain each landmark observation is, so we check whether a landmark’s uncertainty lies within a certain threshold; if it does, the landmark is considered to be in the database (a re-observed landmark).
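
Putting the two together, a minimal sketch of nearest-neighbor association with a validation gate could look like the following. For simplicity it gates on plain Euclidean distance; a real implementation would gate on the Mahalanobis distance computed from the filter’s uncertainty estimates, and all names and thresholds here are illustrative.

```python
import numpy as np

# Each database entry: estimated position and how many times it has been seen.
database = []   # list of dicts: {"pos": np.ndarray of shape (2,), "seen": int}

def associate(observation, min_seen=3, gate=1.0):
    """Match an extracted landmark (x, y) to the database, or return None."""
    candidates = [lm for lm in database if lm["seen"] >= min_seen]
    if not candidates:
        return None
    nearest = min(candidates,
                  key=lambda lm: np.linalg.norm(lm["pos"] - observation))
    # Validation gate: accept the association only if it is close enough.
    if np.linalg.norm(nearest["pos"] - observation) < gate:
        nearest["seen"] += 1        # re-observed landmark
        return nearest
    return None

def process_observation(observation):
    """Steps 2-3 of the procedure above for a single extracted landmark."""
    obs = np.asarray(observation, dtype=float)
    if associate(obs) is None:
        database.append({"pos": obs, "seen": 1})   # new landmark
```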

Extended Kalman Filter

The heart of SLAM is the Extended Kalman Filter (EKF), an algorithm that combines multiple noisy, uncertain measurements of some variables to produce a more accurate estimate of their actual values.

You might ask: why not just take the average of the values? Well, if we have a set of values like 5, 6, 4, 10, 5, the actual value is 5, but we would get an average of 6 because of the outlier 10. The Kalman filter gives more weight to the measurements that are more accurate and less weight to the inaccurate ones.
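
To make the weighting idea concrete, here is the one-dimensional (scalar) version of the Kalman update, a minimal sketch with illustrative numbers: the gain decides how far the estimate moves toward a new measurement based on how uncertain each of them is.

```python
def fuse(estimate, est_var, measurement, meas_var):
    """Blend a prior estimate with a new measurement, each weighted by its
    certainty; this is the 1-D form of the Kalman update."""
    gain = est_var / (est_var + meas_var)            # Kalman gain in 1-D
    new_estimate = estimate + gain * (measurement - estimate)
    new_var = (1.0 - gain) * est_var
    return new_estimate, new_var

# An outlier reading (10) reported with a large variance barely moves the
# estimate, unlike a plain average.
x, var = 5.0, 0.5
x, var = fuse(x, var, measurement=10.0, meas_var=25.0)   # x stays close to 5
```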

But how does the full filter do that?

The algorithm maintains matrices representing the system state (robot coordinates, orientation angle, and landmark coordinates), the prediction equations that estimate the next sensor measurements and the robot’s state from their current values, the relations between variables, their accuracy, and their noise/uncertainty.

A quick note: the Extended Kalman Filter differs from the standard one in that it can handle non-linear prediction models, which it linearizes with first-order (Jacobian) approximations stored in additional matrices.

It first initializes all the matrices to their initial values. Then, it repeatedly performs the following steps.

Update using odometry & predict. In this step, we use the control inputs given to the robot (thrust, steering, etc.) to predict the new position of the robot.

Update using re-observed landmarks. Since the prediction from the previous step is probably inaccurate (odometry errors), we use the associated (re-observed) landmarks, matched as described above, to measure the robot’s deviation from its predicted position and correct it.

Add new landmarks to the state. Finally, we add the newly observed landmarks (those not matched with existing ones) to the algorithm’s matrices and update them.
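
To give a flavor of the mathematics behind these steps, here is a minimal sketch of the predict and update equations for the robot pose alone (state [x, y, θ], odometry inputs v and ω, and a range-bearing measurement of a landmark whose position is already known). In full EKF-SLAM the landmark coordinates are part of the state vector as well, so the Jacobians gain extra columns and new landmarks are appended to both the state and the covariance; everything here is illustrative rather than a complete implementation.

```python
import numpy as np

def ekf_predict(mu, Sigma, v, omega, dt, R):
    """Prediction step: propagate the pose mu = [x, y, theta] with odometry.
    R is the motion-noise covariance."""
    x, y, theta = mu
    mu_pred = np.array([x + v * np.cos(theta) * dt,
                        y + v * np.sin(theta) * dt,
                        theta + omega * dt])
    # Jacobian of the motion model (the first-order approximation mentioned above)
    G = np.array([[1.0, 0.0, -v * np.sin(theta) * dt],
                  [0.0, 1.0,  v * np.cos(theta) * dt],
                  [0.0, 0.0,  1.0]])
    return mu_pred, G @ Sigma @ G.T + R

def ekf_update(mu, Sigma, z, landmark, Q):
    """Update step: correct the pose with a range-bearing measurement z = [r, phi]
    of a landmark at known position [lx, ly]. Q is the measurement-noise covariance."""
    dx, dy = landmark[0] - mu[0], landmark[1] - mu[1]
    q = dx**2 + dy**2
    z_pred = np.array([np.sqrt(q), np.arctan2(dy, dx) - mu[2]])
    # Jacobian of the measurement model with respect to the robot pose
    H = np.array([[-dx / np.sqrt(q), -dy / np.sqrt(q),  0.0],
                  [ dy / q,          -dx / q,          -1.0]])
    S = H @ Sigma @ H.T + Q                 # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)      # Kalman gain
    innovation = z - z_pred
    innovation[1] = np.arctan2(np.sin(innovation[1]), np.cos(innovation[1]))  # wrap angle
    return mu + K @ innovation, (np.eye(3) - K @ H) @ Sigma
```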

Loop Closure

When a robot revisits a previously mapped area after a long exploration of the environment, the state (e.g. the robot and landmark positions) becomes much more certain, and any ambiguities are likely resolved. Loop closure is similar to data association, but it corrects the existing estimates far more strongly and removes accumulated uncertainty.

We’ve now explained the basic parts of how SLAM works to a great extent, but there is something that we haven’t talked about.

Why should I care about all this stuff?

Why is it important?

This algorithm makes things that were previously science fiction a reality. For example, ask someone 20 years ago whether they would like a robot to serve them in a restaurant or even in their own home; they would probably call you crazy. Now we are seeing applications of this technology everywhere. Robots can move around and perform tasks without any human input, without even being given the structure of their environment, because they learn it on their own.

Take autonomous vehicles, for example. There is an enormous number of situations a car can end up in, and it must use the information it has about its surroundings to act quickly and safely. Or rescuing people from hard-to-reach locations: there are robots that can do that, and as the technology advances they will become more widely used. Industrial robots can now be deployed in a new environment and, given just a demonstration of the required task, perform it accurately every time, in coordination with the humans and other moving objects in the factory. SLAM is one of the algorithms that have revolutionized this field and brought it into so many important aspects of our lives.

Previously, we talked about the types of sensors that are used in autonomous robots. But, which one is better?

LIDAR & Visual SLAM

LIDAR and cameras are two very powerful types of sensors that have evolved over the years, improving in accuracy, speed, and capability with the advancement of computer algorithms and the artificial-intelligence revolution. But each has its own strong and weak points.

LIDAR

It is usually more accurate at measuring distances, as it works with laser beams instead of pixels; a camera does not know the exact scale of objects (a distant large object can look like a nearby small one). It does not depend on external light, since it emits its own, so lighting conditions (e.g. low light) are not a problem. It is also less affected by environmental conditions such as dust or fog, as the laser beams can often pass through them.

On the other hand, LIDAR is much more expensive and complex, both in terms of the sensor itself and of the processing required to analyze the huge number of data points it outputs. It also has a limited field of view (mostly vertically), which prevents it from seeing over objects or around corners. Finally, a major problem for LIDAR is reflective surfaces, which can deflect the beam so that it never returns to the sensor.

It is most commonly used in autonomous driving vehicles, industrial robots, and high-resolution mapping of areas.

Camera

Cameras are very common, much more advanced nowadays, and offer extremely high resolutions. They are far less expensive than LIDAR, broadly available, and produce less data to process. They also have a wider field of view, so they can see features a LIDAR might miss, and they can detect visual features such as color and texture in much higher detail than LIDAR (given a well-lit environment).

However, lighting conditions are a major problem for cameras, as are light from different sources and reflections, which can confuse them. They are also much less accurate at measuring distances, as well as in other aspects of SLAM.

Cameras are frequently used in indoor autonomous navigation, AR headsets, and other robotic applications with a lower need for safety than industrial ones.

Issues & Limitations

In short, SLAM predicts the location of specific features in an unknown environment using measurements from the LIDAR or camera and from the motors themselves. There are a few problems with that.

First of all, it relies on approximations, both in interpreting the robot’s movement and in the landmark positions (recall the first-order approximations above). There is also a lot of noise in the data, and there are cases where the environment changes so much that it becomes unrecognizable (even a human would be confused in a space filled with 100 people). These factors reduce SLAM’s accuracy, making it unsuitable for applications that demand it (for example, industrial environments with non-optimal conditions for autonomous robots). There are three specific aspects of this problem.

Sensors. No single type of sensor is accurate, efficient, and robust in all conditions, e.g. poor lighting, featureless regions, reflective surfaces, indoor spaces, etc. We discussed this above. However, sensors are becoming more accurate and more robust to diverse conditions, and sophisticated ones are becoming more widely accessible as their prices fall over time.

Algorithms. The features the robot stores are not an exact representation of the environment, just some distinctive parts of it, which can cause it to miss important details. The algorithms also demand a lot of processing power, which limits reaction time and makes it hard to embed such a system on a moving vehicle. Another problem is robustness in unknown scenarios where safety is of utmost importance: a moving human that the robot ignores can cause an accident. Finally, loop closure is difficult and computationally expensive, and it can be fooled by similar-looking places that the robot thinks are identical, destroying its estimates. However, as processors become more capable, especially for AI workloads, increasingly complex tasks become feasible and the performance of SLAM keeps improving.

Data. The amount of data stored for the algorithm is huge. Imagine, as a human, having to remember every single detail of a building. That is what the robot does, and it requires a lot of data, which in turn increases the processing power and storage needed. There are also errors, such as noise and drift, which can accumulate into larger errors and a false belief about where the robot is. On the other hand, as sensor accuracy increases along with the capability of processors and storage devices, we will be able to handle larger amounts of data with higher accuracy.

What does the future look like?

This fascinating technology has already shown much of its capability, but the future is even more exciting.

The popularity of SLAM will grow even more with the rise of indoor mobile robotics, and the technology will become much more widely available. SLAM also offers an appealing alternative to user-built maps, allowing robots to operate without one.

Also, as processing moves from on-board devices to the cloud, the capability of such algorithms can grow considerably, allowing the field to thrive in many more areas and in more complex environments. It will also enable robots to map much larger environments, moving from a single building to a whole city, and to perform tasks that have traditionally required humans, such as delivery (some companies are already working on this).

Additionally, the exponential evolution of technologies such as AI and IoT, combined with the increased capabilities of networks, like 5G, can reveal a vast range of opportunities for interaction between different systems, even robots communicating with each other to complete a task collaboratively.

References

Burgard, W., Stachniss, C., Arras, K., & Bennewitz, M. (n.d.). SLAM: Simultaneous Localization and Mapping. http://ais.informatik.uni-freiburg.de/teaching/ss12/robotics/slides/12-slam.pdf

Content, S. (2021, March 23). Why field-of-view matters. The Robot Report. https://www.therobotreport.com/why-field-of-view-matters/

HOOKII. (n.d.). Lidar slam vs visual slam: Which is better? HOOKII. Retrieved October 8, 2023, from https://hookii.com/blogs/robot-lawn-mowers/laser-slam-vs-visual-slam-which-is-better

Lidar SLAM: The Ultimate Guide to Simultaneous Localization and Mapping. (n.d.). Retrieved October 8, 2023, from https://www.wevolver.com/article/lidar-slam-the-ultimate-guide-to-simultaneous-localization-and-mapping

Riisgaard, S., & Rufus Blas, M. (n.d.). SLAM for Dummies: A Tutorial Approach to Simultaneous Localization and Mapping. https://dspace.mit.edu/bitstream/handle/1721.1/119149/16-412j-spring-2005/contents/projects/1aslam_blas_repo.pdf

SA, F. (n.d.). What is simultaneous localization and mapping (Slam)? Retrieved October 8, 2023, from https://www.flyability.com/simultaneous-localization-and-mapping

Thomson, C. (n.d.). What is SLAM? (Simultaneous localisation and mapping). Retrieved October 8, 2023, from https://info.vercator.com/blog/what-is-slam

Tyagi, D. (2020a, April 7). Introduction to ORB (Oriented FAST and Rotated BRIEF). Data Breach. https://medium.com/data-breach/introduction-to-orb-oriented-fast-and-rotated-brief-4220e8ec40cf

Tyagi, D. (2020b, April 7). Introduction to SIFT (Scale Invariant Feature Transform). Data Breach. https://medium.com/data-breach/introduction-to-sift-scale-invariant-feature-transform-65d7f3a72d40

Zanj, S. (2018, April 28). Extended kalman filter. Medium. https://medium.com/@siddheshzanj/extended-kalman-filter-94fe07fd5c79
