The story behind the creation of Yandex’s delivery robot

Yandex Self-Driving Team
Yandex Self-Driving Group
23 min read · Dec 16, 2021

Hello everyone! My name is Alexey. I work at Yandex Self-Driving Group, where I have been responsible for developing hardware for three (and a half) generations of Yandex robots. In this article, I will not only give you a brief overview of the latest generation but will also share the story behind the creation of our delivery robots, from the very first plywood prototype to the current model. I’ve also thrown in a bunch of videos and photos from various stages of development. Let’s get going!

How the third generation differs from the first

We introduced our first robot to the world on November 7, 2019. Since then, we’ve delivered orders in the snow, in the rain, during storms, in sunny Dubai, along the broad sidewalks of Ohio and the narrow, sometimes under-repair streets of Moscow. So far, we’ve developed, manufactured and tested three and a half generations of robots: the R1, R1.5, R2 and R3. All the experience we gained by creating and operating the first generations went into the development of the third generation. I’ll now explain how the R3 robots differ from their predecessors.

This robot is bigger and more spacious. We can now transport a payload of up to 20 kg in a 60-liter compartment — for example, six pizzas with a diameter of 40 cm and three 2-liter bottles of Coca-Cola. Only five pizzas could fit inside the second-generation robot, while the first generation could carry pizzas no larger than 35 cm in diameter.

Starting with the second generation, the lid of the robot could be opened and closed automatically simply by pressing a button in the app. The third-generation robot has learned to detect when someone is trying to close the lid manually and to help them do so. The cargo compartment is locked and can be opened only through the app and only by the recipient of the order.

We designed the robot ourselves, and we entrusted the production of parts and assembly to contractors. And, of course, we purchased standard components like chips, connectors, hard drives, and processors. Radars and wheels are also off the shelf, as are lidars — though we will soon replace them with our own, produced in-house. For the first generation of robots, a higher percentage of the components were off the shelf in order to speed up prototyping. In the second generation, we developed and tested some of the modules; in the third, we developed a lot more of the electronics.

The first-generation robots could operate for 3.5 hours without recharging. Then, we upgraded the first generation by installing ARM processors (see below), increasing the operating time to 7 hours. Once we optimized the electronics, the R2 could operate for 10 hours. While the third generation can operate for 8 hours, the battery can be rapidly replaced in just a few seconds, meaning the robot can continue making further deliveries right away. For the purposes of hot swapping, we equipped the robot with a small, nonremovable backup battery. Both batteries are assembled by a contractor using off-the-shelf cells in line with our technical specs. The frame is our own construction.

Replacing the battery

The first generation of robots had a rigid suspension without shock absorbers and bogies on the two front axles. Now all the axles are on independent leaf springs, and the bogies have been moved to the back: experience and experimentation showed that the robot moved better this way. We tested a “soft” suspension on the R2 — and, with this production experience under our belts, we designed the R3 suspension. In the section on development, I’ll come back to how we tested the new suspension.

For the third-generation robots, we installed a lidar with 64 beams (the R1 had 16) and a large vertical angle of view. To make better use of the large field of view, we moved the lidar to the front of the robot.

In order to spot cars earlier when the robot is crossing at pedestrian crossings, we placed — as part of the R1 upgrade — two radars on the sides. The radars enable the robot to detect moving objects and quickly estimate the speed at which they are approaching.

We also installed additional cameras on every side (the first generation had only one camera), and we replaced the lenses on the R3 with fisheye lenses to increase the field of view, providing 360-degree coverage.

Starting with the second generation, we have been using our own proprietary ultrasonic sensors. The first generation used ordinary parking sensors. With our sensors, we can capture more data that is useful for driving in different conditions — we couldn’t do this with the off-the-shelf sensors. In the R3, we’ve increased the number of sensors and optimized their design.

Now, two radars, one lidar, five cameras, an accelerometer and a GNSS receiver help our robots navigate the world around them. Nine ultrasonic sensors ensure functional safety.

Our robot’s point cloud

The robot weighs 70 kg empty and can achieve a maximum speed of 8 km/h (5 mph). It has six motorized drive wheels. The front axle is on an independent leaf spring suspension, and the two rear axles are on leaf spring bogie suspensions. The minimum ground clearance with a full payload is 100 mm.

We didn’t come up with this design right away. The first two generations enabled us to collect information while operating under field conditions and to refine the requirements for the platform. The design of each subsequent generation took into account the experience gained during the operation of the previous ones. I’ll tell you more below about how this took place.

Three generations of Yandex autonomous delivery robots

R1: How it all began

We started developing our delivery robot in June 2019. With the help of the first generation, we wanted to test as quickly as possible how the software we had developed for self-driving cars could be used to control a delivery robot on city sidewalks. Our self-driving vehicles at that time already knew how to operate without a driver on public roads.

We tried to make the prototype using off-the-shelf components — if possible without developing anything of our own. After all, we didn’t know what a robot should look like at that time, so we decided it wasn’t a good idea to spend in-house resources developing something without a clear vision.

Engineering Center

Everything that I describe below would have been impossible without our Engineering Center. This is a magical place where ideas become a reality. The Center’s staff are passionate about what they do; they’re people with extensive experience in various industries who can completely disassemble and assemble any car, be it a self-driving vehicle or a race car. The staff have all the necessary equipment to quickly produce batches of experimental parts. This greatly reduces the development cycle when you need to carry out multiple iterations. All our robots (currently totaling two hundred) are assembled here.

Chassis

We formulated the initial requirements for our robot’s chassis. Whether these requirements corresponded with reality had to be checked under real-world conditions, so we tried to produce a working prototype as quickly as possible.

We then brainstormed and researched existing chassis designs. Our team included guys with experience designing and building robots and automobiles, so they came up with a huge number of options for consideration. Following much discussion and debate, we chose a working option: a six-wheel chassis on motorized wheels — all of which were drive wheels.

So, for the first prototype, we took 8-inch wheels and motor drivers from gyro scooters plus a Nucleo development board, and assembled a remote-controlled six-wheel chassis made out of plywood and aluminum. We modified the firmware for the drivers, and the control commands were sent through the Nucleo.

Connected to the dev board was a Wi-Fi adapter that received commands from a laptop. We ran a Python script on the laptop that converted the commands from a Bluetooth joystick. We still sometimes use this script in order to test some new low-level hardware features for the robot.
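To give a sense of how such a bridge script might look, here is a minimal sketch in Python. The packet format, network address and axis mapping are made up for illustration; they are not the actual protocol used on the robot.

```python
# Minimal sketch of a joystick-to-chassis bridge. The packet format,
# address and axis mapping are assumptions, not the robot's actual protocol.
import socket
import struct

import pygame  # the Bluetooth joystick shows up as an ordinary gamepad

ROBOT_ADDR = ("192.168.4.1", 9000)   # hypothetical Wi-Fi address of the dev board
MAX_SPEED = 1.0                      # normalized forward speed
MAX_YAW = 1.0                        # normalized turn rate

pygame.init()
pygame.joystick.init()
stick = pygame.joystick.Joystick(0)
stick.init()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    pygame.event.pump()                       # refresh joystick state
    forward = -stick.get_axis(1) * MAX_SPEED  # stick forward -> positive speed
    yaw = stick.get_axis(0) * MAX_YAW         # stick left/right -> turn rate
    # Pack the command as two floats; firmware on the chassis side would
    # convert this into per-wheel speed setpoints.
    sock.sendto(struct.pack("<ff", forward, yaw), ROBOT_ADDR)
    pygame.time.wait(20)                      # ~50 Hz command rate
```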

The platform was assembled and prepared for testing in about two days by one design engineer and one embedded engineer. And the manager (me) stood nearby holding a grinder.

A big red button to turn the system off was placed on the prototype itself, so I sat on top during the first tests, with my hand next to the button — just in case. And it came in handy during the very first test on the ground: the drivers from the gyro scooters turned out to be too smart and, when a certain combination of control commands was entered, the platform began rotating uncontrollably in place. Interestingly, we hadn’t noticed this behavior during the suspended bench tests we ran before moving to the ground.

One of the first rides on the platform

Overall, the platform proved to be suitable for use in the prototype: it was sufficiently fast and capable of carrying a load. But the tests showed that fixed wheels weren’t a good solution: on uneven ground, the chassis began to turn unpredictably because not all of the wheels were touching the ground. We added a bogie to the front two axles and springs to the middle axle so that the robot would turn mainly around the center of the middle axle — this was necessary for the self-driving software to control the robot reliably. We also tested active control using actuators. They didn’t work very well, so we abandoned them in favor of a simpler setup.

Chassis equipped with actuators. The actuators were then replaced with tension springs.

Following these tests, the design department began developing a robot that could be tested outdoors. We planned to produce as many as 10 of them. Plastic vacuum forming and sheet-metal cutting and shaping were selected as the production technologies. Some parts were 3D-printed, CNC-milled and fine-cut on a lathe. Our in-house team developed all of the structural elements. Most of the parts were manufactured by external contractors.

Sensors

The main objective of the MVP was to prove that it was possible to use our self-driving technology in our delivery robots. That’s why it was important to use ready-made components that we were familiar with in order to test hypotheses as quickly as possible. For this purpose, we chose the lidar that we put on the sides of our self-driving vehicles. We already had machine learning models that had learned to use it, and we hoped that they would work right away on the robot and that we wouldn’t need to spend a long time collecting new data sets. This lidar has a maximum range of 100 meters; it has 16 beams that rotate 360 degrees around the vertical axis to scan the space around the robot. Given how it would be used, it was best to place it at the back of the robot.

In addition to detecting objects, the lidar is used to determine the position of the robot in space — localization. The algorithm aligns the points obtained using the lidar with a three-dimensional map stored in its memory and searches for the best match. To do this, the lidar has to be able to see static objects — buildings, pillars, bus stops, trash cans — 360 degrees around the robot. Our lidar’s vertical angle of view was relatively small — 30 degrees; therefore, to ensure reliable localization, the lidar was mounted with its axis perfectly vertical.
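The matching step itself can be illustrated with an off-the-shelf ICP routine. The sketch below uses Open3D and only demonstrates the idea; it is not our production localization code, and the tuning values are assumptions.

```python
# Illustrative scan-matching step: align a live lidar scan against a
# prebuilt 3D map to refine the robot's pose. A sketch using Open3D's ICP,
# not the production localization algorithm.
import numpy as np
import open3d as o3d

def refine_pose(scan_points: np.ndarray,
                map_cloud: o3d.geometry.PointCloud,
                initial_pose: np.ndarray) -> np.ndarray:
    """scan_points: Nx3 lidar points in the robot frame.
    initial_pose: 4x4 guess (e.g. from odometry/GNSS). Returns a refined 4x4 pose."""
    scan = o3d.geometry.PointCloud()
    scan.points = o3d.utility.Vector3dVector(scan_points)
    result = o3d.pipelines.registration.registration_icp(
        scan, map_cloud,
        max_correspondence_distance=0.5,   # meters; tuning value is an assumption
        init=initial_pose,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```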

To improve close-range detection, we added two parking sensors to the front of the robot. At first we used off-the-shelf parking sensors. Like any standard ones, they beeped when they saw an obstacle, which could startle pedestrians, so we decided to remove the beeper from the control unit. In addition, we had to hack the communication protocol between the control and display units — otherwise, it would have been impossible to use the parking sensors: they weren’t designed to let the distance data be read out digitally.

At first, we just looked at the distance reported by the parking sensors and, at a certain threshold, slowed down and then stopped. Due to the specifics of the upper-level software — which was designed around a bicycle model — the robot was unable to turn around on the spot. There was no way to correct this without investing significant developer resources in a new motion model based on skid steering. Therefore, at the lower level, we made it possible for the robot to turn in place whenever its parking sensors encountered an obstacle. It would turn to the side until it could no longer see the obstacle; then the self-driving algorithms would come into play and establish a clear route. In the end, the robot could get around the obstacle with ease. The skid steer model was incorporated in the next versions of our robots, and support for turning in place, along with the response to the parking sensors, was moved to the upper level.
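Roughly, that low-level fallback can be pictured like this. All interfaces and thresholds here are hypothetical placeholders, simplified to one sensor and one wheel speed per side.

```python
# Sketch of the low-level "turn away from an obstacle" fallback described
# above. read_parking_sensors and set_wheel_speeds are hypothetical
# placeholders, and the thresholds are illustrative.
STOP_DISTANCE_M = 0.4      # start reacting when an obstacle is this close
TURN_SPEED = 0.3           # wheel speed used while rotating in place

def low_level_step(read_parking_sensors, set_wheel_speeds, planned_speeds):
    """One control tick. planned_speeds = (left, right) from the upper level."""
    left_dist, right_dist = read_parking_sensors()
    if min(left_dist, right_dist) > STOP_DISTANCE_M:
        # Path is clear: follow the upper-level (bicycle-model) plan.
        set_wheel_speeds(*planned_speeds)
        return
    # Obstacle ahead: rotate in place away from the closer side
    # (a skid-steer turn: the two sides spin in opposite directions).
    if left_dist < right_dist:
        set_wheel_speeds(TURN_SPEED, -TURN_SPEED)    # turn right
    else:
        set_wheel_speeds(-TURN_SPEED, TURN_SPEED)    # turn left
```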

In addition, a camera was installed in the first robot, but it was used only to record video during test drives.

The brains

Our self-driving cars carry a serious x86 server platform with two processors and three video cards. Of course, we couldn’t put this in our robots because of the weight, size and power consumption. We had to scale the platform to work with limited resources.

Since all the software was written for x86 at the time, we didn’t consider ARM-based embedded solutions at that stage. Moving to ARM would have cost us a lot in terms of time and resources; so — at the risk of getting ahead of myself — I will say that we transitioned when producing the next generation, when we proved that robot movement controlled by the software for our self-driving vehicles was, in principle, possible.

At first, we wanted to use a VR gaming laptop of the kind that players carry on their backs. But it turned out that they were no longer being produced at that time. So we decided to build our own platform. We took a mini-ITX motherboard, one video card and the most powerful desktop processor at the time and tried to get off the ground with all this. It worked.

In addition to the computer and the low-level control electronics, the robot also carried a MikroTik Ethernet router; a GeoHub module previously developed for our self-driving vehicles (an embedded Linux board responsible for receiving the GNSS signal and reading the accelerometer); two connectivity modules (LTE + Wi-Fi), also from the self-driving vehicle; the power supply unit; and the battery.

The R1 schematic diagram

Compare this diagram with the R3 diagram (at the end of the post). There is only one green component here — this is a module that we developed earlier for a self-driving car. We tried to reuse ready-made components as much as possible to speed up hypothesis testing.

The first two R1 robots

Scaling and adaptation of software

The main objective was to turn off everything that we didn’t need and to not turn off anything that we did need. To figure out what we needed and what we didn’t, I went around to all the development teams and asked the people who were working on our self-driving cars a lot of questions; I experimented a lot. For our experiments, we assembled a prototype without the external casing and attached a monitor with a touchscreen on it for convenience.

Our first time delivering water to a speaker at one of our in-house events

Results

Over four months, we managed to design and assemble a prototype from scratch that we launched in the city, as well as scale the software to operate with very limited resources and on other platforms. We hardly touched the code; we mainly changed the configurations. In this configuration, we assembled several more robots that were sent into the city — near our office in Moscow and in Skolkovo — and made our first commercial deliveries.

The R1 in Skolkovo

The robot inherited its soft ride and route planning from our self-driving cars. Of course, after the first successful trips, a dedicated software team was put in place, which began to optimize the algorithms and code specifically for the robot, and they did a great job over these two years. But that’s a different story.

R1.5: moving to ARM

The first x86 robots survived on battery power for about 3 hours. Even during testing, we had to constantly think about the remaining charge and plan everything to make sure we had enough power. To work in production, a robot had to last at least 8 hours (a full shift). Energy consumption measurements showed that the computer consumed most of the charge, even when the robot was just standing still. The move to ARM held out the promise of significant power savings, but we knew it would be a challenge.

Software

An impressive codebase, libraries, development tools, infrastructure — everything was based on x86. Therefore, we knew that moving to ARM would be a complicated and resource-intensive undertaking. We had to optimize the software for the new architecture while maintaining compatibility with our large self-driving vehicles — after all, the cars and the robots shared the same codebase. Even once the code was ready to operate the robot on ARM, it still lived on a separate branch; merging it into the main development branch took about another month.

The infrastructure was also not initially designed for the new platform. With x86, the code was built directly on the robot. With ARM, we could no longer do this, so we had to learn how to build the code in the cloud and then transfer the binaries to the robot.

Hardware

To speed up the production of a robot with a lengthy operating time and test the new computing platform, we decided to separate the development of the new chassis (the R2, which is described below) and the move to ARM. We used the R1 project as a base to develop our ARM-based R1.5 robots. To maximize efficiency, we also upgraded our original R1 robots to R1.5 by using specialized upgrade kits developed in-house.

In the prototypes, the components were connected with wires routed by hand. In the R1.5, we made the first pass at improving the robot’s wiring. Among other things, we developed a special expansion board for the Nucleo to which peripheral devices can be connected with proper connectors, and we also put the accelerometer module on it so we could close the accelerometer feedback loop at the lower level and get rid of the GeoHub, which was too cumbersome for the robot.

We also developed a power management unit for this generation. This enabled us to monitor the currents and voltages on each power branch and to control the power supply for each of them in software. It also let us reboot misbehaving peripheral devices remotely.

Power Management Unit 3D model
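To make the idea more concrete, here is a rough sketch of the kind of software interface such a unit exposes: per-branch current and voltage telemetry plus software-controlled switching. The API, branch names and the transport behind it are assumptions for illustration, not our actual firmware interface.

```python
# Rough sketch of a power-management-unit interface: per-branch telemetry
# plus software-controlled switching. Names, limits and the transport are
# assumptions, not the real firmware.
from dataclasses import dataclass
import time

@dataclass
class BranchStatus:
    name: str
    voltage_v: float
    current_a: float
    enabled: bool

class PowerManagementUnit:
    def __init__(self, transport):
        self.transport = transport          # e.g. a CAN or serial link (assumed)

    def read_branch(self, name: str) -> BranchStatus:
        raw = self.transport.query(name)    # hypothetical request/response call
        return BranchStatus(name, raw["v"], raw["i"], raw["on"])

    def set_branch(self, name: str, enabled: bool) -> None:
        self.transport.command(name, enabled)

    def reboot_branch(self, name: str, off_time_s: float = 1.0) -> None:
        # Power-cycle a misbehaving peripheral remotely.
        self.set_branch(name, False)
        time.sleep(off_time_s)
        self.set_branch(name, True)
```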

We replaced the off-the-shelf wheel controllers from the gyro scooters with our own proprietary ones, which gave us better wheel performance. We also swapped the UART interface for the more reliable CAN interface, which we are familiar with, and laid a good foundation for future developments by adding support for wheel encoders and motor temperature monitoring. Later, we were able to reuse this motor controller for other tasks.

Two minor revisions of the MotorControl
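As an illustration of what talking to such a controller over CAN can look like, here is a sketch using the python-can library. The arbitration IDs, payload layout and scaling are placeholders, not the real protocol.

```python
# Sketch of commanding a wheel over CAN with python-can. IDs, payload layout
# and scaling are placeholders for illustration.
import struct
import can

SPEED_CMD_ID = 0x100       # hypothetical base ID for speed setpoints
STATUS_ID = 0x180          # hypothetical base ID for encoder/temperature feedback

bus = can.Bus(channel="can0", interface="socketcan")

def send_wheel_speed(wheel_index: int, speed_rpm: float) -> None:
    payload = struct.pack("<f", speed_rpm)
    msg = can.Message(arbitration_id=SPEED_CMD_ID + wheel_index,
                      data=payload, is_extended_id=False)
    bus.send(msg)

def read_wheel_status(timeout_s: float = 0.1):
    msg = bus.recv(timeout=timeout_s)
    if msg is None or not (STATUS_ID <= msg.arbitration_id < STATUS_ID + 6):
        return None
    # Assumed feedback frame: encoder speed (float32) + temperature (int16, 0.1 °C).
    speed_rpm, temp_raw = struct.unpack("<fh", msg.data[:6])
    return msg.arbitration_id - STATUS_ID, speed_rpm, temp_raw / 10.0
```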

In the first generations of robots, we used e-bike batteries. To optimize the robot’s structural configuration and get feedback (charge, health, load) from the battery, we ordered a battery built to our own technical specs from a manufacturer. Our batteries had a higher capacity and could report their status through the CAN interface.

Sensors

In the first generation, we had one IP camera. In the R1.5, we added three more and changed the interface to GMSL (like in our self-driving cars), placing a camera on every side of the robot. Now we could see in every direction. We also added radars to detect oncoming vehicles from a distance while crossing at pedestrian crossings.

Results

The new platform’s operating time from a single charge more than doubled. We converted the R1 robots and assembled a couple dozen more R1.5 robots, which were the main workhorses in Moscow and Innopolis until mid-2021, after which they were gradually replaced by our next-generation robots.

R2: visitors from another planet

The R2 was conceived as a robot — we expected to make up to 100 of them for commercial purposes — with increased body strength, a larger cargo compartment and an automated lid.

Construction

We paid a lot of attention to the design. We didn’t want the body to have any joints, so that’s why it had a monocoque design and was made of fiberglass. When we put these robots into production, the contractors cursed us: someone had to crawl inside the mold and install the fiberglass from the inside.

3D printed prototype for storage capacity tests

The base of the robot was a welded aluminum frame. The parts of the suspension, the battery, the aluminum-sheet housing, the electronics and the monocoque were attached to the frame. The sensors were placed on the monocoque. It was important to place the sensors on a rigid structure that wouldn’t be disturbed during maintenance, because they were calibrated to one another: if they were moved even a tiny bit, the entire system would have to be recalibrated. The entire cargo hold could be removed from the robot: this allowed us to see all of the electronic components — which was convenient for repairs and maintenance.

The R2 aluminium frame
MotorControl and PMU under the cargo hold

Wheels

For the first generations of robots, we used motorized wheels from gyro scooters. At one point, unfortunately for us, they became unavailable: production had stopped, and we bought up everything that was left in stores. We tried going directly to the factory that had produced them, but received a batch of wheels with completely different characteristics, even though we had been assured they were identical when we bought them. We ended up with a table listing 10 kinds of wheels, with descriptions of how to identify them and how good they were. An additional problem was that we couldn’t put different types of wheels on the right and left, nor could we put lower-quality wheels on the middle axle. As a result, changing the wheels turned into a game of patience.

In addition, the wheels, which were designed for gyro scooters, weren’t easy to attach. To change a wheel on the first model, we had to disassemble part of the suspension. And in wet weather, the wheels started malfunctioning due to insufficient weatherproofing.

For these reasons, we decided not to use motorized wheels in the R2 but to put the motors inside the robot, transmitting torque through a system of pulleys and belts. We purchased about a dozen different types of motors and designed and manufactured several prototypes with this sort of drive mechanism. Tests showed that, although some motors were sufficient for traveling on an even surface, the robot could no longer climb obstacles or turn in place on surfaces with a high friction coefficient, and the motors, now located inside the body, would overheat. As a result, this setup had to be abandoned: motors with more torque were heavy, oversized and expensive, and the options with a gearbox were less reliable, more expensive and noisier.

Bogie with transmission belt

At the same time, we were looking for good motorized wheels, and in the end we found a supplier of wheels that were high-quality, stable and hermetically sealed — and easy to attach. We tested the new wheels and decided to use them. And then we quickly remade the robot for use with the motorized wheels (we had kept in mind that this scenario was possible, and we took it into account in our designs). We’ve been using these wheels ever since.

Electronics

We developed our own motherboard for the computer in the R2. It contains an Ethernet router, Wi-Fi and LTE modems, video stream input cards and a GNSS module. So, we got rid of the bulky router, GeoHub and communication modules, reduced the number of interconnections and reduced power consumption, gaining another 3 hours of battery life.

Compute unit motherboard

Sensors

We switched to our own cameras that we had developed for our self-driving cars. They have all the necessary parameters: they’re compact, they can work in difficult weather conditions, and they offer high image quality. In addition, the sensor in our cameras makes it possible to suppress the flicker of LED light sources in the video, which is important for detecting traffic signals correctly when crossing a road. And yet they cost us less than similar cameras on the market.

Yandex SDG proprietary cameras

Instead of off-the-shelf parking sensors, we developed our own proprietary ultrasonic sensors. The off-the-shelf sensors would break down periodically, and since they were a black box to us, we weren’t able to diagnose the problem at the system level. Our sensors provide not only the distance to the nearest object (a single floating-point number) but an entire ultrasonogram. Now we can look at the data and adjust the trigger thresholds for different weather conditions and road surfaces.

We added one more parking sensor to the front of the robot to create a vertical stereopair, which gives us more information about obstacles, and two parking sensors at the rear to prevent collisions when reversing.

Yandex SDG proprietary parking sensors
Parking sensor ultrasonogram
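For a rough idea of what having the full ultrasonogram makes possible, here is a sketch of extracting a distance from the echo envelope with a trigger threshold that depends on the conditions. The sample rate, threshold values and data format are assumptions for illustration.

```python
# Sketch of turning a raw ultrasonogram (echo envelope) into a distance,
# with a trigger threshold that can be tuned per weather/surface profile.
# Sample rate, thresholds and data format are assumptions.
import numpy as np

SPEED_OF_SOUND_M_S = 343.0
SAMPLE_RATE_HZ = 100_000          # assumed ADC sample rate of the sensor
BLANKING_SAMPLES = 200            # ignore the transmit-pulse ringing at the start

THRESHOLDS = {                    # illustrative per-condition trigger levels
    "dry_asphalt": 0.20,
    "rain": 0.35,                 # raise the threshold to reject rain clutter
    "snow": 0.30,
}

def first_echo_distance(envelope: np.ndarray, profile: str = "dry_asphalt"):
    """envelope: normalized echo amplitude per sample. Returns meters or None."""
    threshold = THRESHOLDS[profile]
    above = np.flatnonzero(envelope[BLANKING_SAMPLES:] > threshold)
    if above.size == 0:
        return None                               # nothing detected
    t_flight = (above[0] + BLANKING_SAMPLES) / SAMPLE_RATE_HZ
    return t_flight * SPEED_OF_SOUND_M_S / 2.0    # sound travels out and back
```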

Results

We assembled about 100 R2-model robots. They’re now operating in all locations, including the United States.

The R2 delivering an order in Ann Arbor, MI

R3: the robot that sees everything

The main thing we wanted from this generation was for the robots to see small objects in front of them better. In addition, we wanted them to learn how to drive over high curbs and to drive better off-road, in snow, through puddles and in different weather conditions. We also designed a battery that can be replaced while the robot is running, which reduces the idle time the robot spends charging.

Under the hood

The R3 schematic diagram. Components developed by Yandex SDG are colored green.

The electronics under the hood of the robot include a carrier board, platform control, body control and motor control.

The carrier board is the “brains” of the robot. Using the algorithms that are running on it, the robot can recognize people, cars and obstacles; plan routes; and determine where it is located. The carrier board contains a router that connects all the components to a single onboard network. The video streams from the cameras also go directly to the computer.

The platform control is responsible for powering the platform: it manages current limits on each power branch and switches to the backup battery when the main battery is removed. It also generates steering signals for the wheels and collects data from the ultrasonic sensors. The motor control receives the speed reference for each wheel from the platform control and regulates the currents in the windings to maintain the desired speed under different driving conditions. The body control is responsible for the lid motor, the lock and the LED lights.
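The speed-to-current step performed by the motor control can be pictured as a simple PI loop. The sketch below is a generic illustration; the gains and limits are not the values used in the actual firmware.

```python
# Generic sketch of the speed-to-current step: a PI loop turns the wheel
# speed reference from the platform control into a winding current command.
# Gains and limits are illustrative, not the real firmware values.
class SpeedLoop:
    def __init__(self, kp=0.05, ki=0.5, i_max_a=15.0):
        self.kp, self.ki = kp, ki
        self.i_max_a = i_max_a          # current limit protects motor and battery
        self.integral = 0.0

    def step(self, speed_ref_rpm, speed_meas_rpm, dt_s):
        error = speed_ref_rpm - speed_meas_rpm
        self.integral += error * dt_s
        i_cmd = self.kp * error + self.ki * self.integral
        # Clamp to the allowed winding current (with simple anti-windup).
        if abs(i_cmd) > self.i_max_a:
            i_cmd = max(-self.i_max_a, min(self.i_max_a, i_cmd))
            self.integral -= error * dt_s
        return i_cmd                    # amps, handed to the current controller
```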

All of the electronic components are located inside hermetically sealed enclosures.

Hermetically sealed MotorControl 3D model

Most of the robot’s body is made of fiberglass parts. The robot’s load-bearing foundation consists of a pan and a basket. Everything else is placed on top: the suspension, the brackets for the sensors and electronic modules as well as the external panels. The electronics can be accessed after removing the corresponding panel. At the same time, the sensors remain on the cargo basket and don’t require recalibration after maintenance.

Because we moved the lidar to the front, we were unable to proceed iteratively and make a new robot on the same chassis as the previous generation. You can’t just turn the body of the robot around; this entailed a complete reconfiguration and redesign of the structure. During the process, however, we got rid of the monocoque body and the aluminum-frame chassis. This made it easier to manufacture parts and simplified maintenance.

The R3 assembly in our engineering center

Sensors

We changed the lidar model. Since the creation of the previous generation, the localization team has learned to make better use of landmarks, and we were able to switch to a lidar that doesn’t see as far but has a wider angle of view and 64 beams instead of 16. That’s why we were able to place it at the front of the robot and tilt it forward slightly. Thus, we greatly increased the level of detail of surrounding objects in the point cloud.

Balancing storage size and the LiDAR viewing angle

We changed the camera lenses: they’re now fisheye lenses with an angle of view greater than 180 degrees. When the camera is mounted on the robot, it can see a small part of the robot itself. To get a good view of traffic lights on the opposite side of wide streets, we added one front camera with a telephoto lens.

The R3 front view. Cameras are mounted on the robot’s frame and don’t need to be recalibrated when panels are removed for maintenance.

Suspension

In the R3, we increased the ground clearance and developed our own winter tires with a more aggressive tread and greater surface contact. While narrowing down ideas, we discussed more radical solutions to the problem of winter maneuverability, but it was important to strike a balance: after all, the robots spend most of their time driving on cleared sidewalks. The platform was ready for testing in the hottest summer months, so we used a track with artificial snow (sodium polyacrylate) for the tests. The tests showed that the R3 handled slush better than its predecessor. We’re expecting a snowy winter again this year, which means that we’ll have an excellent opportunity to run tests in real-world conditions.

The R2 doesn’t make it through the test track filled with “snow”
The R3 drives through the test track filled with “snow”
The R3 drives on snow in Innopolis

The leaf spring suspension on each axle increased maneuverability. Previously, the suspension was rigid, resulting in a lot of noise when riding over cracks in asphalt, tiles, cobblestone and especially when coming down off curbs. In order to run tests on the new suspension, we designed a prototype for the R2 generation and equipped several robots with the prototype. The robots were tested on a shaker that imitated a bumpy road, at a test track and on the bumpiest of our production routes.

Thus, we made sure the carbon leaf springs could handle the load and found a few defects that we managed to fix before putting this suspension on the R3.

Robot surmounting a test obstacle

Lid

In this version of the robot, we redesigned the mechanism for closing the lid. We made it more reliable by integrating the hinge directly into the lid (previously it was attached to the body with brackets), and we also changed the type of motor. This allows the robot to sense a user’s attempt to slam the lid shut and respond by closing the lid as designed. The lid can also sense when a foreign object is blocking its path and responds by automatically opening again, like an elevator door. It can be closed with the press of a button, manually or through the app.
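The logic behind this behavior can be sketched as a small state machine driven by the motor current: a current spike while the lid sits open means someone is pushing it closed, while a spike during closing means something is in the way. The thresholds and sensor functions below are assumptions for illustration.

```python
# Sketch of the lid logic described above, driven by motor current sensing.
# Thresholds and sensor functions are assumptions for illustration.
ASSIST_CURRENT_A = 0.8    # current spike while open -> a user is pushing the lid shut
PINCH_CURRENT_A = 1.5     # current spike while closing -> an obstacle is in the way

def lid_fully_closed() -> bool:
    return False          # placeholder for a hypothetical end-stop sensor

def lid_fully_open() -> bool:
    return False          # placeholder for a hypothetical end-stop sensor

def lid_step(state: str, motor_current_a: float, close_requested: bool) -> str:
    """One tick of the lid state machine; returns the next state."""
    if state == "open":
        if close_requested or motor_current_a > ASSIST_CURRENT_A:
            return "closing"          # app/button command, or help the user close it
    elif state == "closing":
        if motor_current_a > PINCH_CURRENT_A:
            return "opening"          # something is blocking the lid: back off
        if lid_fully_closed():
            return "locked"
    elif state == "opening":
        if lid_fully_open():
            return "open"             # reopened like an elevator door
    return state
```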

Results

In total, three generations of robots have already delivered over 80,000 orders. The assembly of the third generation is in high gear, with new R3s coming off the line every day. For example, robot №126 is battling the snowbanks of Innopolis.

While №127 is now admiring the cacti and palms of Arizona.

Will there be new versions of our robots? It’s entirely possible. We are constantly analyzing the convenience of our delivery service for users and looking at what can be improved in terms of hardware. The software is constantly being improved, and some new features may require hardware support. Our work isn’t coming to an end with the release of the third generation — it’s just beginning.
