The Art of Robotic Mapping: Localization, Occupancy Maps, and Navigation

Bhajneet Singh Bedi
Thoughtworks: e4r™ Tech Blogs
6 min read · Feb 21, 2024

This blog is a continuation of the main blog (Robots at Work: A story of Decentralized Collaboration), which you can read here.

Photo by Martin Wilner on Unsplash

by Bhajneet Singh Bedi, Divye Singh and Antariksh Ray

Imagine that you are placed in a house blindfolded and now need to find your way out. All you can do is talk to your friend and ask for directions. It would be naive to assume that you won't bump into the objects around you just because a friend is giving the directions.

This is a well-known problem area in the world of robotics: a robot needs to move through its environment without bumping into things and damaging itself. The solution is a combination of techniques such as localization and occupancy maps, which we discuss in detail in this article.

Let's imagine a situation in which a robot is placed in an arena containing small colored boxes that need to be picked up and placed in a warehouse. Our robot is blindfolded; our friend here is a top-head camera that tells the bots their coordinates and helps them navigate.

What are Localization and Occupancy Maps?

Localization refers to the process of determining the precise position and orientation of a robot or object within its environment. It involves integrating data from sensors such as GPS, LiDAR, cameras, and inertial measurement units (IMUs) to estimate the robot’s position relative to its surroundings.

Occupancy maps serve as a static representation of the environment and are typically generated from sensor data such as laser scans, depth images, or point clouds. By discretizing the environment into occupied and free cells, occupancy maps enable robots to perceive and reason about obstacles in their surroundings.

For a better understanding of these concepts, let's dive into the project and explore them step by step.

Step 1: Capturing the Arena

The initial step involves capturing the arena using a top-head camera within the Ignition Gazebo simulation environment. Ignition Gazebo is physics simulation software that enables the replication of real-life scenarios. It provides straightforward access to a variety of sensors via plugins and APIs, allowing data to be published at the required frame rate. The camera frames are published on the topic /top_camera/image_raw.

Employing a top-head camera as part of the global localization approach offers a cost-effective and dependable alternative to an onboard stereo/depth camera. The raw image frames captured by the camera must then be processed to derive the coordinates of all entities within the arena, including boxes, robots, walls, and warehouses.

Step 2: Processing Frames

We developed a Python node that subscribes to the image_raw topic to acquire image frames. To focus solely on the arena, we used cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE) to identify contours and then isolated the arena by cropping away the surrounding area.
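For concreteness, here is a minimal sketch of what such a node could look like, assuming ROS 2 (rclpy), cv_bridge, and OpenCV; the threshold value and the assumption that the largest contour bounds the arena are illustrative rather than the exact production logic.

import cv2
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

class ArenaCropper(Node):
    def __init__(self):
        super().__init__('arena_cropper')
        self.bridge = CvBridge()
        self.sub = self.create_subscription(
            Image, '/top_camera/image_raw', self.on_frame, 10)

    def on_frame(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Illustrative binary threshold to separate the arena from the background.
        _, mask = cv2.threshold(gray, 50, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return
        # Assume the largest contour bounds the arena and crop to it.
        arena = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(arena)
        cropped = frame[y:y + h, x:x + w]
        # ...entity detection continues on `cropped`...

if __name__ == '__main__':
    rclpy.init()
    rclpy.spin(ArenaCropper())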

Next, the objective is to identify and categorize the entities within the arena. Contours of the cropped image are re-evaluated, and entities such as warehouses, walls, boxes, and bots are classified based on their contour areas. Position coordinates are determined using the cv2.moments() function, which computes the spatial moments of each contour. From the computed moments, the center coordinates are obtained through the formulas:

center_x = int(M['m10']/M['m00'])
center_y = int(M['m01']/M['m00'])

Here, M['m10'] and M['m01'] represent the first-order moments, while M['m00'] is the total blob area. These center coordinates are then published on the /map/json topic in JSON format, with each entity type and its respective colors binned under specific keys.
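A hedged sketch of this classification-and-publish step is shown below; the contour-area thresholds, the entity labels, and the way the publisher is wired to /map/json are assumptions based on the description above, not the exact project code.

import json
import cv2
from std_msgs.msg import String

def classify(area):
    # Illustrative area bins for entities detected in the cropped frame.
    if area > 5000:
        return 'warehouse'
    if area > 1500:
        return 'wall'
    if area > 400:
        return 'bot'
    return 'box'

def publish_entities(publisher, contours):
    # `publisher` would be created with node.create_publisher(String, '/map/json', 10)
    entities = {'warehouse': [], 'wall': [], 'bot': [], 'box': []}
    for c in contours:
        M = cv2.moments(c)
        if M['m00'] == 0:          # skip degenerate contours
            continue
        center_x = int(M['m10'] / M['m00'])
        center_y = int(M['m01'] / M['m00'])
        entities[classify(cv2.contourArea(c))].append([center_x, center_y])
    msg = String()
    msg.data = json.dumps(entities)
    publisher.publish(msg)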

Having acquired information about the entities within the arena, the subsequent step involves localization. Each bot must ascertain its position relative to the positions of other bots from the JSON data.

RViz visualization of the arena

LOCALIZATION

Localization is the process of an entity knowing its position in its environment. There are two types of localization:

  1. Local: Local localization requires the robot's initial position to be approximately known. This method uses on-board sensors to learn about the surroundings and localize. Examples: an on-board camera (stereo/mono/depth/RGB-D), LiDAR, IMU, wheel encoders.
  2. Global: Global techniques can determine a robot's location even without prior positional information. They are more powerful than local methods and can handle scenarios where a robot faces substantial positioning errors. Globally localizing a robot involves tools such as GPS, beacons, radars, or off-board cameras.

We chose global localization because it is more robust and more reliable when data about a robot's position is lost. To localize, a bot rotates in one direction for 2 seconds, and its initial and final positions are noted. Each bot then checks the change in its position; if a bot finds a difference in its position, that bot's coordinate is removed from the /map/json data, and a filtered/map/json topic is created for each bot under its namespace. For example: bot-1 rotates, so bot-1's coordinates are removed from the JSON data, and the topic /filtered/map/json/bot-1 is created for further processing.
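The sketch below illustrates this rotate-and-compare idea in a simplified, blocking form; the cmd_vel publisher, the angular speed, the pixel threshold, and the get_positions helper (which would return the latest parsed /map/json snapshot) are illustrative assumptions.

import json
import math
import time
from geometry_msgs.msg import Twist
from std_msgs.msg import String

def identify_self(cmd_pub, get_positions, filtered_pub, threshold=5.0):
    before = get_positions()           # snapshot of /map/json before rotating
    spin = Twist()
    spin.angular.z = 0.5               # rotate in place for ~2 seconds
    end = time.time() + 2.0
    while time.time() < end:
        cmd_pub.publish(spin)
        time.sleep(0.1)
    cmd_pub.publish(Twist())           # stop
    after = get_positions()            # snapshot after rotating

    # The coordinate that changed by more than `threshold` pixels is our own
    # (this assumes the detection order stays stable between the two snapshots).
    others = [[ax, ay]
              for (bx, by), (ax, ay) in zip(before['bot'], after['bot'])
              if math.hypot(ax - bx, ay - by) < threshold]
    msg = String()
    msg.data = json.dumps({**after, 'bot': others})
    filtered_pub.publish(msg)          # e.g. on /filtered/map/json/bot-1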

The same can be seen in the GIF below, which shows 3 robots in an arena and 3 separate RViz windows, one for each of them. Each RViz window shows that robot's costmap, which will be explained later.

Bots localizing in the arena

After localization, each bot continuously updates its current position by selecting the coordinate nearest to its last known position from the list of positions in the JSON data.
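This nearest-position update can be as simple as the following sketch; the coordinates are made up for illustration.

import math

def update_position(last_position, detected_bots):
    # Pick the detection closest to the previously known position.
    return min(
        detected_bots,
        key=lambda p: math.hypot(p[0] - last_position[0], p[1] - last_position[1]),
    )

# Example: the bot was last seen at (120, 80); the closest new detection wins.
current = update_position((120, 80), [[40, 200], [118, 83], [300, 95]])
# current -> [118, 83]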

OCCUPANCY MAP

Given the coordinates of the bots and boxes, we need to represent the arena in a format that can be processed by the navigation stack. An occupancy map is a representation of the environment that indicates whether each cell of a grid is occupied or free. The map also helps us visualize the arena, get feedback, and tune the navigation algorithms. We plotted the pixels ourselves, approximating the shapes of the entities in the arena.

PLOTTING THE OCCUPANCY MAP

We create an OpenCV image in which each pixel has an intensity of 0 (free) or 100 (occupied). The coordinates of all the entities are then used to draw the map: a circle represents a robot and a square represents a box, with dimensions matched to those in the Gazebo simulator.

Occupancy map representation in RViz

This matrix is converted to a vector<int8_t> to match the nav_msgs::msg::OccupancyGrid message type and then published on the /map topic.
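The following sketch, written in Python for consistency with the earlier snippets rather than in the project's C++, shows how such a grid could be drawn with OpenCV and published as a nav_msgs/msg/OccupancyGrid; the grid size, resolution, and entity footprints are illustrative assumptions.

import numpy as np
import cv2
from nav_msgs.msg import OccupancyGrid

GRID_W, GRID_H = 200, 200              # cells (illustrative)
RESOLUTION = 0.05                      # metres per cell (illustrative)

def build_grid(bots, boxes):
    grid = np.zeros((GRID_H, GRID_W), dtype=np.uint8)        # 0 = free
    for (x, y) in bots:
        cv2.circle(grid, (x, y), 4, 100, thickness=-1)        # robot footprint, 100 = occupied
    for (x, y) in boxes:
        cv2.rectangle(grid, (x - 3, y - 3), (x + 3, y + 3), 100, thickness=-1)
    return grid

def publish_grid(node, publisher, grid):
    msg = OccupancyGrid()
    msg.header.stamp = node.get_clock().now().to_msg()
    msg.header.frame_id = 'map'
    msg.info.resolution = RESOLUTION
    msg.info.width = grid.shape[1]
    msg.info.height = grid.shape[0]
    msg.data = grid.astype(np.int8).flatten().tolist()        # row-major int8 cells
    publisher.publish(msg)                                     # e.g. a publisher on /map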

The occupancy grid is updated at a high frequency to account for the moving entities in the arena, which helps the navigation stack avoid obstacles.

After the occupancy map is created, the next step is navigating across the arena while avoiding obstacles and other moving robots. That part can be done using the Nav2 stack.
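As a pointer in that direction, here is a minimal, hedged example of handing a goal pose to Nav2 through the Nav2 Simple Commander API; the frame, coordinates, and control flow are illustrative and not the exact code used in the project.

import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator

rclpy.init()
navigator = BasicNavigator()
navigator.waitUntilNav2Active()        # wait for the Nav2 servers to come up

goal = PoseStamped()
goal.header.frame_id = 'map'
goal.header.stamp = navigator.get_clock().now().to_msg()
goal.pose.position.x = 1.5             # illustrative target near a box
goal.pose.position.y = 0.5
goal.pose.orientation.w = 1.0

navigator.goToPose(goal)
while not navigator.isTaskComplete():
    pass                               # feedback could be read via navigator.getFeedback()

print(navigator.getResult())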

CONCLUSION

The process of identifying the bots happens only once; after that, the coordinates keep updating based on the nearest detected coordinate. In short, localization and occupancy maps together form the foundation of robot navigation and can be fused with other sensors such as LiDARs and stereo cameras for better decision-making and autonomy. Fine-tuning our localization algorithm and occupancy map, removing noise, and accounting for other obstacles can further increase the robot's accuracy, allowing it to handle tougher conflicting situations and plan paths through even tighter spaces.

Disclaimer: The statements and opinions expressed in this blog are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
