
What is SLAM?

Colin Wang

Consider the Roomba


Imagine you are a basic home robot vacuum. You are switched on and start vacuuming forward, but autonomous navigation is hard. With no knowledge of your surroundings, you can only turn randomly and continue on your path after colliding with an obstacle. Without understanding the layout of the room or where you are within it, all you can do is arbitrarily chug ahead, doing the best job you can at catching all of those dust bunnies.


There is clearly an advantage to knowing your location: not only can you clean the entire room more thoroughly, but you also save time and battery life on each clean. To estimate your location, all you need is a map of your environment to use as a reference. But to generate that map, you need to know your own location. There you are, a poor Roomba, stuck in a Catch-22, a game of the chicken or the egg.

Chicken or the Egg Problem

SLAM Demystified

An Introduction

Simultaneous Localization and Mapping, or SLAM, is a field of algorithms developed in the early 1990s that aims to solve this dilemma. As a robot vacuum, SLAM lets you use a camera or other sensors to build a map of the landscape you are cleaning—aptly called mapping. At the same time, SLAM uses information gathered from the wheels and the imaging sensors to determine how far you have moved, and in what direction—a process called localization. By repeating these steps continuously, you can build out a rough map of the room while also arriving at an estimate of your current position. Essentially, SLAM lets you rapidly piece together visual data, sensor data, and other data points to simultaneously pinpoint your own location and map out the broader environment.
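To make that loop concrete, here is a toy Python sketch (all names and numbers are illustrative, not from any real SLAM library): each step updates the pose estimate from odometry (localization), then uses that pose to record a sensed obstacle in a crude grid map (mapping).

```python
import math

pose = [0.0, 0.0, 0.0]          # x, y, heading (radians)
grid = set()                     # occupied map cells discovered so far

def step(distance, turn, obstacle_range=None):
    """Advance the pose by odometry, then mark any sensed obstacle on the map."""
    pose[2] += turn
    pose[0] += distance * math.cos(pose[2])
    pose[1] += distance * math.sin(pose[2])
    if obstacle_range is not None:
        # project the sensed obstacle into world coordinates using the pose
        ox = pose[0] + obstacle_range * math.cos(pose[2])
        oy = pose[1] + obstacle_range * math.sin(pose[2])
        grid.add((round(ox), round(oy)))   # coarse map cell

step(1.0, 0.0, obstacle_range=2.0)   # drive forward, see a wall 2 m ahead
step(1.0, math.pi / 2)               # turn left and continue

print(pose)   # estimated location
print(grid)   # map built so far: {(3, 0)}
```

Notice the circular dependency the article describes: the map cell is only correct because the pose is, and a real system would in turn use the map to correct the pose.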

Ok, you can stop pretending to be a Roomba.

How does it work?

So, how does SLAM work?

SLAM fundamentally relies on both front-end, sensor-dependent processing and back-end, sensor-agnostic processing to map out and understand the surroundings. As an overview, the front end takes in data from the attached sensors and transforms it into an intermediate representation, such as constraints in an optimization problem or a probability distribution over the current location. The back end then takes that intermediate representation and solves the underlying state estimation or optimization problem to fully understand the surroundings.
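The division of labor can be sketched in a few lines of Python (a deliberately simplified illustration; the function names and the scan format are hypothetical): the front end reduces raw, sensor-specific readings to generic constraints, and the back end solves for the trajectory that best satisfies them.

```python
def front_end(raw_scans):
    """Sensor-dependent: turn raw readings into relative-motion constraints.
    Here each 'scan' is assumed to yield a displacement estimate directly."""
    return [scan["displacement"] for scan in raw_scans]

def back_end(constraints):
    """Sensor-agnostic: estimate poses that satisfy the constraints.
    Here we simply chain the relative motions into a trajectory."""
    poses = [0.0]
    for d in constraints:
        poses.append(poses[-1] + d)
    return poses

scans = [{"displacement": 1.0}, {"displacement": 0.5}]
print(back_end(front_end(scans)))    # [0.0, 1.0, 1.5]
```

Swapping the camera for a LiDAR only changes `front_end`; the back end never needs to know which sensor produced the constraints.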

For the front end, the type of sensor used heavily affects the algorithm approach. The main two versions of the SLAM algorithm are visual SLAM and lidar SLAM, with the former relying on visuals acquired from cameras and other image sensors and the latter relying on—you might have guessed it—lidar sensors.

Visual SLAM

Visual SLAM (vSLAM) is oftentimes more accessible for the casual robotics enthusiast, and can use equipment ranging from simple cameras (wide-angle, fish-eye, and spherical cameras) to more complex devices like compound-eye and RGB-D cameras. Generally speaking, vSLAM is more effective with multiple cameras: using a single camera, called monocular SLAM, makes it challenging to accurately gauge depth.

vSLAM algorithms also vary, and are broadly classified into sparse methods and dense methods. Sparse methods match feature points across images, using algorithms such as PTAM and ORB-SLAM, whereas dense methods use the overall brightness of images, with algorithms such as DTAM and LSD-SLAM. vSLAM can be further subdivided by sensor setup into monocular (one camera) and stereo (multiple cameras that can capture depth). Research here is ongoing, as major limitations arise when attempting to generate a 3D map from sensors that lack depth perception.
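As a flavor of what the sparse approach does, here is a toy matcher in Python (illustrative only; real systems like ORB-SLAM extract hundreds of 256-bit descriptors per frame): each binary descriptor from one frame is matched to its nearest neighbour in the next frame by Hamming distance, which is how ORB's binary descriptors are compared.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def match(desc_a, desc_b, max_dist=10):
    """Match each descriptor in frame A to its nearest neighbour in frame B."""
    matches = []
    for i, d in enumerate(desc_a):
        j, best = min(enumerate(desc_b), key=lambda p: hamming(d, p[1]))
        if hamming(d, best) <= max_dist:     # reject implausible matches
            matches.append((i, j))
    return matches

frame_a = [0b10110010, 0b00001111]
frame_b = [0b00001110, 0b10110011]   # same two features, slightly perturbed
print(match(frame_a, frame_b))       # [(0, 1), (1, 0)]
```

Tracking how matched features shift between frames is what lets a sparse method infer the camera's own motion.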


LiDAR SLAM

Now, onto LiDAR SLAM. Light detection and ranging (LiDAR) uses a laser sensor to achieve high-precision distance measurements. The output from these sensors is typically 2D (x, y) or 3D (x, y, z) point cloud data, and movement is calculated by sequentially matching the point clouds. Registration algorithms—algorithms that align two or more captures of the same scene into a single coordinate system—such as iterative closest point (ICP) and the normal distributions transform (NDT) are used to piece all of the point clouds together into an intermediate map.
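The core alignment step can be shown in miniature. The sketch below solves one 2D registration in the spirit of ICP, with point correspondences assumed known (a simplifying assumption — real ICP re-estimates correspondences via nearest neighbours on every iteration): it finds the rotation and translation that best map one scan onto the next.

```python
import math

def register(src, dst):
    """Find rotation theta and translation (tx, ty) aligning src onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    s_cross = s_dot = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy          # centered source point
        bx, by = dx - cdx, dy - cdy          # centered destination point
        s_cross += ax * by - ay * bx
        s_dot   += ax * bx + ay * by
    theta = math.atan2(s_cross, s_dot)       # least-squares rotation angle
    tx = cdx - (csx * math.cos(theta) - csy * math.sin(theta))
    ty = cdy - (csx * math.sin(theta) + csy * math.cos(theta))
    return theta, tx, ty

# The second scan is the first rotated 90 degrees and shifted by (1, 0):
a = [(0, 0), (1, 0), (0, 1)]
b = [(1, 0), (1, 1), (0, 0)]
theta, tx, ty = register(a, b)
print(round(math.degrees(theta)), round(tx, 3), round(ty, 3))   # 90 1.0 0.0
```

Chaining the recovered motions across scans is exactly how sequential point cloud matching turns into a trajectory and, ultimately, a map.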

LiDAR SLAM has its pros and cons. Laser sensors are significantly more precise than a typical camera, and as a result can be used in applications involving high-speed vehicles like self-driving cars and drones. But this precision comes at a cost: matching the point clouds from LiDAR requires high processing power and must be optimized for speed. And in environments with few obstacles, point clouds are sparser and more difficult to align, potentially resulting in inaccuracies in the map or location. In the end, the choice between LiDAR SLAM and visual SLAM generally comes down to the use case and the resources available.

The Backend

Now, onto the back end! The back end uses the processed intermediate data and solves the underlying state estimation or optimization problem. If the measurements taken by the front end were perfectly accurate and you always had an exact understanding of where you stand in the environment, it would be no problem to piece together a perfect map of the environment. But in reality, there is always at least some degree of uncertainty with either the measurements or with the current position. It’s the back end’s job to reconcile the differences and put together a single map.

There are two general categories of back ends: filtering and smoothing. Filtering, which includes the extended Kalman filter (EKF) and particle filters, models the problem as online state estimation, where the back end estimates the state "on the go" with the latest measurements. The Kalman filter leverages measurements observed over time to produce estimates of the unknown variables that tend to be more accurate than those based on any single measurement alone. Raúl Serrano wrote a great Medium article here that goes far more in depth into the extended Kalman filter approach. The particle filter instead represents the probability distribution with a set of particles, drawing random state particles from the posterior probability to express it. It sounds complicated—and it is—but Sharath Srinivasan does a good job of explaining it here.
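The filtering idea is easiest to see in one dimension. The sketch below is a minimal 1D Kalman filter (all values illustrative): each cycle predicts the new position from odometry, growing the uncertainty, then blends in a noisy range measurement, with the Kalman gain weighting each source by its variance.

```python
def kalman_step(x, p, motion, motion_var, z, meas_var):
    """One predict/update cycle of a 1D Kalman filter.
    x, p: current state estimate and its variance."""
    # Predict: apply the motion, and grow uncertainty by the motion noise.
    x, p = x + motion, p + motion_var
    # Update: blend prediction and measurement z via the Kalman gain.
    k = p / (p + meas_var)
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 1.0                          # initial estimate and variance
for motion, z in [(1.0, 1.2), (1.0, 1.9), (1.0, 3.1)]:
    x, p = kalman_step(x, p, motion, 0.1, z, 0.5)
print(round(x, 2), round(p, 3))          # 3.06 0.201
```

Note how the variance `p` shrinks after every update: each fused measurement leaves the filter more confident than either the odometry or the sensor alone.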

Today, the more popular approach is smoothing. Smoothing techniques estimate the robot's trajectory from the complete set of measurements rather than just the newest ones. Oftentimes, this is most intuitively done with a graph-based formulation, where a graph represents the variables and the relations between them to simulate the surroundings. In recent years, pose graph optimization has become the de facto standard for most SLAM algorithms. As intuition for this algorithm: poses represent the location and orientation of the robot, and serve as the nodes of the graph. The edges connecting the nodes represent constraints, and allow the algorithm to generate a map of the surroundings based on the sensor data from each pose.
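A toy 1D version makes the idea tangible (illustrative numbers; real systems run Gauss-Newton over 2D/3D poses rather than plain gradient descent). Nodes are positions along a line; edges are relative-motion constraints. Odometry claims each step was 1.0, but a direct measurement back to the start says the total is only 2.7, so the optimizer spreads the disagreement across the whole trajectory.

```python
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0),   # odometry edges
         (0, 3, 2.7)]                              # loop-closure-style edge

poses = [0.0, 1.0, 2.0, 3.0]                       # initial guess from odometry
for _ in range(2000):
    grad = [0.0] * len(poses)
    for i, j, meas in edges:
        r = (poses[j] - poses[i]) - meas           # constraint residual
        grad[j] += r                               # pull the endpoints toward
        grad[i] -= r                               # agreement with the edge
    for k in range(1, len(poses)):                 # pose 0 stays fixed (anchor)
        poses[k] -= 0.1 * grad[k]

print([round(p, 3) for p in poses])                # [0.0, 0.925, 1.85, 2.775]
```

No single constraint is satisfied exactly; instead the error is distributed so the trajectory fits all the measurements as well as possible, which is precisely what smoothing buys over filtering.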



In the end, what you should take away is that SLAM is more like a concept rather than a straightforward algorithm. Regardless of how it’s implemented, it consists of the same few steps: processing visual and sensor data to understand your current location and map your surroundings. Yet for how simple the concept might be, it is absolutely imperative for a robot to know where it is, in the context of what it’s seeing.

Ok, now you know what SLAM is and how it works. But why does it matter? As it turns out, SLAM has various mission-critical applications in rapidly developing fields, from robotics to autonomous vehicles to even augmented reality. You already understand the use case with the Roomba, but the same concept extends to more industrial applications of robotics. Robots that are aware of their position in a warehouse or manufacturing plant can contribute far more flexibly than today's stationary manufacturing machinery. With over 400,000 new industrial robots arriving on the market each year, the impact SLAM could have is immense.

SLAM also has far-reaching implications beyond robotics. It is key to autonomous navigation in self-driving vehicles and unmanned aerial vehicles (UAVs), and there are many creative applications in augmented reality, where you need an understanding of your location within an artificial environment. It can even be employed in high-tech visual surveillance systems, construction-site monitoring and maintenance, and medical procedures like minimally invasive surgery (MIS).


So why is it not used as widely in these industries today? Well, there are still many limitations to SLAM that developers, researchers, and roboticists around the world are trying to resolve. 

Firstly, no single sensor has fully solved the use case for SLAM. Simple cameras may not fully solve the depth problem. 3D LiDAR scanners struggle in environments without many distinguishable features, like a long, open hallway. Even the Global Positioning System (GPS), the most popular solution for outdoor SLAM applications today, struggles where satellite signal coverage is poor—as in many indoor environments—which keeps GPS out of the majority of industrial warehouse use cases.

There are software imperfections too. SLAM estimates sequential movement, and each estimate inevitably includes some margin of error. With most current SLAM solutions, accumulating so many individual measurements leads to a gradual build-up of noise and measurement irregularities. As a result, there can be substantial deviation from the true values, with the map data distorting or even breaking down entirely. If a robot drives around a loop and ends up where it started, the accumulated error means the robot's estimated start and end points might not match up; this is called the loop closure problem. If a robot drives down a long, straight corridor, the map might bend slightly due to what is known as "drift error." At the current level of technological advancement, these estimation errors are unavoidable; it's up to the engineer to detect them and determine how to account for and correct them. Accurate maps and localization become all the more important when robots must perform other tasks, such as getting from point A to point B (motion planning), grabbing an item with an obstacle in front of it (grasp planning), or identifying objects moving in 3D space (object recognition).
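Drift is easy to demonstrate numerically. In this sketch (the bias figures are invented for illustration), a robot drives a 10 m square loop, but its odometry over-reports each side by 2% and each 90-degree turn is off by half a degree; by the time it returns to its physical starting point, its estimated position has wandered noticeably.

```python
import math

pose = [0.0, 0.0, 0.0]                     # x, y, heading
for _ in range(4):                         # four sides of the square loop
    d = 10.0 * 1.02                        # 2% distance bias per side
    pose[0] += d * math.cos(pose[2])
    pose[1] += d * math.sin(pose[2])
    pose[2] += math.pi / 2 + math.radians(0.5)   # half-degree turn error

gap = math.hypot(pose[0], pose[1])         # start vs. estimated end point
print(round(gap, 2))                       # the loop-closure error, ~0.25 m
```

Small per-step errors thus compound into a quarter-metre gap after a single short loop; detecting that gap when the loop closes, and redistributing it along the trajectory, is exactly the correction job described above.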

In the end, though, arguably the biggest problem is how computationally intensive running the SLAM algorithm is on vehicle hardware. To achieve accurate localization and precise maps, you need to execute image processing and point cloud matching at remarkably high frequencies. These computations are usually performed on vehicle hardware, on embedded microprocessors that have limited processing power, and the challenge is how to execute such a computationally expensive algorithm on these microprocessors.

The Future of the Cloud

One current optimization method is parallel processing. Many steps in SLAM, like feature extraction, are well suited to parallelization, and hardware like multicore CPUs or embedded GPUs can further optimize for speed. However, offloading data processing to the cloud seems like the most promising solution. Running the most computationally intensive steps of SLAM, like the point cloud registration algorithms, on the resources of a distributed cloud can generate significant performance gains. With the potentially unbounded memory and CPU power of these robotic clouds, SLAM systems can scale and create maps of larger, more complex environments. The cloud becomes even more valuable when multiple robots all run SLAM and their maps are merged in the cloud into one detailed map—with major applications in search and rescue, mapping of new terrain, and robot-to-robot communication on the floor.

Of course, SLAM in the cloud also has its limitations. Remember that SLAM is used for both localization and mapping. If SLAM is feeding low-level control (robot movement within an area, collision detection, and more), it's imperative to have low-latency processing even while high-quality map data is offloaded to the cloud. Thus, the future will likely be some form of hybrid on-board/off-board SLAM, where safety-critical localization happens locally while a larger, better map is built and shared in the cloud, with information updated on a need-to-know basis.

So there you have it: SLAM 101. While SLAM processes continue to improve every day, there are many future applications to be excited about. Who knows—maybe one day, when you're racing an F1 car in augmented reality or flying an autonomous plane, you'll think back to the algorithm that made it all possible.

