Implement SLAM from scratch

Luis Bermudez · Published in machinevision · Apr 23, 2024

There are many ways to implement a solution for SLAM (Simultaneous Localization and Mapping), but one of the simplest algorithms to implement is Graph SLAM.

This is the trajectory of the robot over three time steps. Nearby, there’s a landmark (e.g., tree).

Graph SLAM is a technique used in robotics to simultaneously estimate a robot’s trajectory over time and the positions of landmarks in the environment by representing both as nodes and constraints in a graph. The graph consists of nodes representing robot poses at different time steps and landmark positions, and edges representing the constraints between them, such as initial location, relative motion, and relative measurement constraints. By optimizing the graph, Graph SLAM aims to find the most likely trajectory and landmark positions that best explain the sensor measurements.

Example

Before we dive into Graph SLAM, I’d like to illustrate an example that will help us with the discussion as we continue to explore how to implement Graph SLAM. In this example, a robot traverses a one-dimensional world. The robot’s first pose, at time step t0, is x=2. From this pose, the robot sees a landmark L0 (e.g., a tree) 9 units away. Then, the robot moves forward by 5 units. At this point the robot should be at x=7 and the landmark should be 4 units away; however, the robot does not see/measure the distance to the landmark at time step t1. The lack of a landmark measurement at time step t1 could be due to sensor error, occlusion, or something else. Finally, the robot moves 3 units forward and sees the landmark 1 unit away. At this point, the robot should be at x=10 and the landmark should be at x=11.

This is a graph representation of robot poses and landmarks in a one-dimensional space. First, the robot sees a landmark 9 units away. Second, the robot moves forward 5 units. Finally, the robot moves 3 units forward and sees the same landmark 1 unit away.
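
To make the walkthrough concrete, here is this example captured as data in a minimal Python sketch. The node names (t0, t1, t2, L0) and the variable names start_pose and edges are illustrative conventions that the later sketches in this article will reuse:

# Node order used throughout: [t0, t1, t2, L0]
start_pose = 2  # initial location constraint: the robot starts at x = 2

# Each edge is (src, dst, measured offset from src to dst)
edges = [
    ("t0", "t1", 5),  # relative motion: robot moves forward 5 units
    ("t1", "t2", 3),  # relative motion: robot moves forward 3 units
    ("t0", "L0", 9),  # relative measurement: landmark seen 9 units away
    ("t2", "L0", 1),  # relative measurement: landmark seen 1 unit away
]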

Constraints

In Graph SLAM, there are three important types of constraints. Let’s dive deeper into each of these constraints:

  1. Initial Location Constraint: The initial location constraint represents our knowledge about the robot’s initial position in the environment. It provides a fixed reference point for the robot’s trajectory estimation. This constraint can be represented as <x,y,θ> to capture the initial position and orientation estimate. By incorporating this constraint into the graph, we can anchor the trajectory estimation and provide a starting point for further constraint optimization.
  2. Relative Motion Constraints: Relative motion constraints capture the information about how the robot’s pose changes between consecutive time steps. These constraints are typically obtained from odometry sensors, such as wheel encoders or IMUs. Odometry sensors provide estimates of the robot’s motion, such as the change in position and orientation. By comparing the odometry measurements between consecutive time steps, we can derive relative motion constraints that describe the robot’s movement. These constraints are typically represented as Gaussian distributions that capture the uncertainty in the motion estimates (see the sketch after this list).
  3. Relative Measurement Constraints: Relative measurement constraints capture the information about the relative positions or distances between different landmarks or features in the environment. These constraints are obtained from sensor measurements, such as laser range finders or cameras. For example, if the robot observes a landmark and measures the distance to the landmark, this information can be used as a relative measurement constraint. These constraints help in estimating the positions of landmarks relative to the robot’s trajectory.
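
As a concrete sketch of how a single constraint might be stored (a hypothetical record layout, not any particular library’s API): in the information form, a Gaussian constraint contributes with weight 1/sigma² when it is added to the matrix, and the worked example in this article implicitly assumes sigma = 1 for every constraint.

from dataclasses import dataclass

@dataclass
class Constraint:
    src: str            # node the constraint leaves from, e.g. "t0"
    dst: str            # node it points to, e.g. "t1" or landmark "L0"
    mean: float         # expected relative offset, e.g. +5 units of motion
    sigma: float = 1.0  # standard deviation; information weight = 1 / sigma**2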

In Graph SLAM, all of these constraints are used together to build a graph representation of the environment and the robot’s trajectory. The graph consists of nodes representing robot poses at different time steps and landmark positions, and edges representing the constraints between them.

This is the pure graph representation. The relative motion constraints are defined by the t0-t1 and t1-t2 edges. The relative measurement constraints are defined by the t0-L0 and t2-L0 edges.

In our example, we have 5 total constraints: 1 initial location constraint, 2 relative motion constraints, and 2 relative measurement constraints. The 4 relative constraints are represented by the edges in the graph.

Graph Representation

In Graph SLAM, the environment and the robot’s trajectory are represented using a graph structure. The graph consists of nodes and edges, where nodes represent the robot’s poses (positions and orientations) at different points in time, and edges represent the constraints or measurements between these poses.

In addition to representing the robot’s poses, the graph also includes nodes that represent the landmarks or features in the environment. These landmarks can be objects, points of interest, or any other distinctive features that the robot can perceive and use for localization and mapping.

The graph representation connects the robot’s poses and landmarks through edges, which represent the measurements or constraints obtained from sensors. These measurements can include range measurements, bearing measurements, or any other type of sensor data that provides information about the relative positions and orientations of the robot and the landmarks.

For example, let’s say the robot observes a landmark from its current pose. This observation creates an edge between the robot’s pose node and the landmark node in the graph. The edge represents the measurement obtained from the sensor, which provides information about the relative position and orientation of the robot with respect to the landmark.

By incorporating these measurements into the graph, the SLAM algorithm can estimate the most likely trajectory of the robot and the map of the environment that best satisfies the constraints imposed by the measurements. The graph optimization process involves adjusting the poses and landmark positions in the graph to minimize the errors between the predicted measurements and the actual measurements obtained from the sensors.
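
To illustrate what “minimizing the errors” means in this one-dimensional setting, here is a hypothetical helper (mu is the current state estimate, and idx maps node names to positions in mu, as in the later sketches). For an edge, the predicted measurement is the difference between the two node estimates, and optimization drives the residual toward zero:

def residual(mu, idx, src, dst, measured):
    """Error between what an edge predicts and what the sensor measured."""
    predicted = mu[idx[dst]] - mu[idx[src]]
    return predicted - measured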

Matrix and Vector Representation

In Graph SLAM, we use matrix and vector representations to model the relationships between robot poses and landmarks in a map. These representations help us solve the SLAM problem.

Let’s start with the matrix representation. In Graph SLAM, we create a matrix called the information matrix, denoted as Ω (Omega). This matrix represents the constraints or relationships between different variables in the SLAM problem. Each variable corresponds to a robot pose or a landmark in the map.

The information matrix is a square matrix, and its size depends on the number of variables in the SLAM problem: a Graph SLAM problem with n nodes in the graph has an n x n information matrix. The elements of this matrix encode the relationships between the variables. For example, whenever two variables are directly linked by a constraint, the corresponding off-diagonal elements become non-zero, and stronger or repeated constraints produce entries with larger magnitudes.

Now, let’s move on to the vector representation. In Graph SLAM, we also create a vector called the information vector, denoted as ξ (Xi). This vector represents the measurements or observations we have made in the SLAM problem. Each element of the vector corresponds to a specific measurement or observation. A Graph SLAM problem with n nodes in the graph would have an n x 1 information vector.

The information vector contains the information about the measurements and their relationships with the variables in the SLAM problem. It helps us incorporate the measurements into the SLAM problem and update our estimates of the robot poses and landmarks.

In our example, we would initialize a 4 x 4 matrix, since there are 4 nodes in the graph. This is the information matrix:

// 4x4 matrix filled with zeroes
+----+----+----+----+----+
|    | t0 | t1 | t2 | L0 |
+----+----+----+----+----+
| t0 |  0 |  0 |  0 |  0 |
| t1 |  0 |  0 |  0 |  0 |
| t2 |  0 |  0 |  0 |  0 |
| L0 |  0 |  0 |  0 |  0 |
+----+----+----+----+----+

In our example, we would also initialize a 4 x 1 vector, since there are 4 nodes in the graph. This is the information vector:

// 4x1 vector filled with zeroes
+---+
| 0 |
| 0 |
| 0 |
| 0 |
+---+
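
In Python with NumPy, initializing both containers for the example might look like this (the idx map is an illustrative convention reused by the later sketches):

import numpy as np

nodes = ["t0", "t1", "t2", "L0"]
idx = {name: i for i, name in enumerate(nodes)}

Omega = np.zeros((len(nodes), len(nodes)))  # 4x4 information matrix
Xi = np.zeros(len(nodes))                   # 4x1 information vector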

Graph SLAM Algorithm

Once we have declared our information matrix and vector, we need to apply the initial location constraint to them. For example, to update the information matrix with an initial location of 2, we would create a simple linear equation:

t0 = 2

Now, let’s take our simple linear equation and its coefficients <1,0,0,0;2>, and add it to the row corresponding to t0:

// Omega matrix (result)
+----+----+----+----+----+
|    | t0 | t1 | t2 | L0 |
+----+----+----+----+----+
| t0 |  1 |  0 |  0 |  0 |
| t1 |  0 |  0 |  0 |  0 |
| t2 |  0 |  0 |  0 |  0 |
| L0 |  0 |  0 |  0 |  0 |
+----+----+----+----+----+
// Xi vector (result)
+---+
| 2 |
| 0 |
| 0 |
| 0 |
+---+
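
With the NumPy containers from the previous sketch, applying the initial location constraint comes down to two in-place updates:

# Initial location constraint: t0 = 2
Omega[idx["t0"], idx["t0"]] += 1
Xi[idx["t0"]] += start_pose  # start_pose = 2 in our example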

To generalize this, here’s some initial pseudo-code:

GraphSLAM(G, startPose) {
  // Declare Omega and Xi (n = number of nodes in G: poses + landmarks)
  n = G.nodeCount()
  Omega = new Matrix(n, n)  // filled with zeroes
  Xi = new Vector(n)        // filled with zeroes

  // Initial Location Constraint
  Omega['t0','t0'] = 1
  Xi['t0'] = startPose

  // Graph Optimization
  Mu = GraphOptimization(Omega, Xi, G)
  return Mu
}

From here, we need to discuss Graph Optimization.

Graph Optimization

Once we have the initial matrix and vector representations of the graph, we need to perform graph optimization. Graph optimization in Graph SLAM is the process of refining the estimates of robot poses and landmark positions by iteratively updating the graph based on sensor measurements. It involves two main steps: the measurement update and the state update.

  1. Measurement Update: In the measurement update step, we iterate over the edges of the graph and add constraints into the information matrix. These constraints represent the measurements obtained from sensors, such as range measurements or bearing measurements.
  2. State Update: In the state update step, we solve a system of linear equations to estimate the optimal robot poses and landmark positions that minimize the error in the graph. This is done by taking the inverse of the information matrix and multiplying it by the information vector.

Here is some high-level pseudo-code for Graph Optimization:

GraphOptimization(Omega, Xi, G) {
  Omega, Xi = MeasurementUpdate(Omega, Xi, G)
  Mu = StateUpdate(Omega, Xi)
  return Mu
}

Let’s go into more detail for the measurement and state updates.

Measurement Update

The Measurement Update uses the graph data to fill in the information matrix and vector. Omega is the information matrix that holds the coefficients of the linear equations, Xi is the information vector that holds the constant terms of those equations, and G is the graph whose edge weights represent the measurements (e.g., distance).

Each of the edges defines a pair of linear equations. For example, the robot moved forward 5 units between t0 and t1, so we expect t1 = t0 + 5. Rearranged, this gives two linear equations for the edge t0-t1 (one per matrix row to update):

Lin Eqn 1: t0 - t1 = -5

Lin Eqn 2: t1 - t0 = 5

Now, let’s take the first linear equation and its coefficients <1,-1,0,0;-5>, and add it to the row corresponding to t0. Note that the new coefficients accumulate on top of the initial location constraint, which already placed a 1 at (t0, t0) and a 2 in the first element of Xi:

// Omega matrix (result)
+----+----+----+----+----+
|    | t0 | t1 | t2 | L0 |
+----+----+----+----+----+
| t0 |  2 | -1 |  0 |  0 |
| t1 |  0 |  0 |  0 |  0 |
| t2 |  0 |  0 |  0 |  0 |
| L0 |  0 |  0 |  0 |  0 |
+----+----+----+----+----+
// Xi vector (result)
+---+
|-3 |
| 0 |
| 0 |
| 0 |
+---+

Now let’s take the coefficients for the second linear equation <-1,1,0,0;5> and add it to the row corresponding to t1:

// Omega matrix (result)
+----+----+----+----+----+
|    | t0 | t1 | t2 | L0 |
+----+----+----+----+----+
| t0 |  2 | -1 |  0 |  0 |
| t1 | -1 |  1 |  0 |  0 |
| t2 |  0 |  0 |  0 |  0 |
| L0 |  0 |  0 |  0 |  0 |
+----+----+----+----+----+
// Xi vector (result)
+---+
|-3 |
| 5 |
| 0 |
| 0 |
+---+

To see how this generalizes, here is the pseudo-code:

MeasurementUpdate(Omega, Xi, G) {
  for each edge in G.edges {
    // Lin Eqn 1: src - dst = -weight
    Omega[edge.src, edge.src] += 1
    Omega[edge.src, edge.dst] += -1
    Xi[edge.src] += -edge.weight

    // Lin Eqn 2: dst - src = weight
    Omega[edge.dst, edge.dst] += 1
    Omega[edge.dst, edge.src] += -1
    Xi[edge.dst] += edge.weight
  }
  return Omega, Xi
}

As you can see, the Measurement Update process involves iterating over each edge in the graph G. For each edge, the code performs two steps. First, it updates the Omega and Xi values for the row corresponding to the source node of the edge. Second, it updates the Omega and Xi values for the row corresponding to the destination node of the edge. In both steps, the edge weight represents the measurement between both nodes (e.g., distance).

These steps ensure that the constraints imposed by the measurements are correctly represented in Omega and Xi. After iterating over all the edges, the function returns the updated Omega matrix and Xi vector.
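
Translating that pseudo-code into runnable Python, continuing the edges and idx conventions from the earlier sketches:

def measurement_update(Omega, Xi, edges, idx):
    """Fold each edge's pair of linear equations into Omega and Xi."""
    for src, dst, weight in edges:
        i, j = idx[src], idx[dst]
        # Lin Eqn 1: src - dst = -weight
        Omega[i, i] += 1
        Omega[i, j] -= 1
        Xi[i] -= weight
        # Lin Eqn 2: dst - src = weight
        Omega[j, j] += 1
        Omega[j, i] -= 1
        Xi[j] += weight
    return Omega, Xi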

State Update

At this point, we have fully defined Omega and Xi. So we just need to solve a system of linear equations for Mu:

Omega * Mu = Xi,

where Mu represents the updated state estimate. To solve for Mu, we merely need to invert Omega:

Mu = Omega^{-1} * Xi

StateUpdate(Omega, Xi) {
  Mu = Omega.invert() * Xi
  return Mu
}
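
In NumPy this is a one-liner. Note that numpy.linalg.solve solves the linear system directly, which is generally preferred over forming the explicit inverse; for this small example both give the same Mu:

def state_update(Omega, Xi):
    """Solve Omega * Mu = Xi for the most likely poses and landmarks."""
    return np.linalg.solve(Omega, Xi)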

The provided pseudo-code performs the state update by multiplying the inverse of the information matrix (Omega) with the information vector (Xi). The result, Mu, represents the state estimate of the robot’s poses and landmark positions. It is a vector that contains the values of all the variables that define the state of the system. In our example, this is the expected value of Mu:

// Mu vector (result)
+----+
|  2 |
|  7 |
| 10 |
| 11 |
+----+

In this example, the first element is the location of the robot at t0, the second element is the location of the robot at t1, the third element is the location of the robot at t2, and the fourth element is the location of the landmark (L0).
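
Putting the earlier sketches together reproduces this result end to end:

Omega = np.zeros((4, 4))
Xi = np.zeros(4)

# Initial location constraint
Omega[idx["t0"], idx["t0"]] += 1
Xi[idx["t0"]] += start_pose

# Measurement update, then state update
Omega, Xi = measurement_update(Omega, Xi, edges, idx)
Mu = state_update(Omega, Xi)
print(Mu)  # [ 2.  7. 10. 11.]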

Discussion

Note that the Graph SLAM algorithm might not give you exactly the correct answer, but it will be close to it. The result of the SLAM algorithm depends on the quality of measurements that you feed the algorithm.

Also note that this was just a simple one-dimensional example. The same approach extends to two- and three-dimensional space, and the state can also include orientation dimensions (to capture which direction the robot is facing).

If any SLAM jargon did not make sense, please review my Overview of SLAM.
