Autonomous robots out in the wild - a software engineering challenge

Pekka Kosonen
Dec 3, 2020 · 10 min read

Starship is bringing autonomous delivery to the world. We are here to solve the last mile delivery problem with fleets of sidewalk delivery robots. Starship was the first company to begin operating autonomous delivery robots in public spaces already in 2017 without safety drivers. Today Starship is the leader in the autonomous delivery space and has completed more than 500,000 deliveries to customers. We’re operating in 5 countries( including the US and the UK) with hundreds of robots delivering 7 days a week, 365 days a year. More information about the company background is available here.

People may think Starship is all about Robotics. In fact, only a handful of engineers in Starship work directly on that. We are building a rich set of products to automate delivery, and Starship provides multiple challenges to solve, both from a hardware and software point of view. In Starship, we design and build our robots — both the hardware and embedded software. We’re also building the backend infrastructure and services to communicate with the robots. On top of that, we have the marketplace connecting the consumers of the service with the merchants. Robots are autonomous, but in some situations human help is needed. Remote operations enables us to deal with the trickier situations where automation is too risky or technically expensive. We also have people in the service areas where we operate, ready to help them out and charge them overnight. Since the robots need a cm-level of accuracy, we have built our own GIS-based 3D mapping solution (you can read more about it from the Starships Engineering blog).

What makes Starship a great place for engineers is the broad range of engineering disciplines we’re covering, e.g. our autonomous driving department is building machine learning based solutions that gives the intelligence for the robots to drive autonomously. We have the infrastructure to communicate with the robots in a time critical manner including getting access to all the sensor data, like the video streams, from the robot.

Our fleet management solutions are matching the right robot with the right tasks to optimise delivery times and fulfil the ETA promises we’ve made to our consumers. We’re also building our own marketplace which is a full blown e-commerce challenge. The marketplace handles the order flow end to end, starting from when the consumers make the order through our consumer application all the way to it being fulfilled by the kitchens or grocery stores. To summarise, Starship is full of fascinating engineering challenges, and many of them haven’t yet been solved by any other company in the world:

Marketplace team

Our marketplace includes order state management and a range of payment integrations (varying from credit card payments to US university dining dollars) . The merchants are using the fulfilment solutions provided by us such as the kitchen and runner applications to accept the order and manage it’s state as well as physically interact with the robots by scanning them to identify the right robot, unlocking and eventually loading the robots and sending them on their journey towards the customers. In addition, the business logic, inventory and stock management, product enrichment and product pricing of our marketplace offering is handled by the team. Backend is mainly built with Node.js and GoLang using SQS and Kafka for messaging and GraphQL for the endpoints. The mobile applications are built with ReactNative.

Fleet Orchestration team

Core Backend team

Another aspect for the team is systems reliability - when robots lose connectivity, they are eventually unable to drive - so it better not happen.

Team uses mainly Golang and Node.JS to implement services, also some Elixir and Python is mixed into the bundle.

SRE team

SRE at Starship is responsible for providing the cloud infrastructure and platform services for running these backend services. We’ve standardised on Kubernetes for our microservices and are running it on top of AWS. MongoDb is the primary database for most backend services, but we also like PostgreSQL, especially where strong typing and transactional guarantees are required. For async messaging Kafka is the messaging platform of choice and we’re using it for pretty much everything aside from shipping video streams from robots. For observability we rely on Prometheus and Grafana, Loki, Linkerd and Jaeger. CICD is handled by Jenkins.

A good portion of SRE time is spent maintaining and improving the Kubernetes infrastructure. Another big piece of infrastructure that SRE is responsible for is data and databases where we mainly rely on MongoDB and PostgreSQL.

Finally, one of the most important goals of Site Reliability Engineering is to minimise the downtime for Starship’s production environment. While SRE is occasionally called out to deal with infrastructure outages, the more impactful work is done on preventing the outages and ensuring that we can quickly recover. This can be a very broad topic, ranging from having rock solid K8s infrastructure all the way to engineering practices and business processes. There are great opportunities to make an impact. Read more about SRE team.

Operational Services team

Field operators work is guided through the field assistant application that helps them solve the daily duties like preparing the fleet to be rolled out in the mornings, charging them overnight and occasionally changing a wheel here and there. The app is built with ReactNative and uses GraphQL exposed endpoints.

The team also creates developer tools to simulate past and future robot events and debug any problematic scenarios we have encountered in real life enabling other engineering teams to be more creative and productive.

Autonomous Driving team

Other challenges the team solves include things like determining robot orientation and location in space, driving gracefully in the vicinity of pedestrians, safety analytics, signal processing for radar and ultrasonic signals and hardware fault detection.

The utmost priority is safety of our robots and people. As we can’t test all real-life scenarios on the field, we rely on extensive simulation and in silico testing to make sure the software we deploy actually works before releasing it to our test ground (on a nightly basis).

Our autonomous driving software is primarily written in C++ for both the CPU and GPU, the remainder written in Rust and Python. Python is also of course used in our neural network training across multiple frameworks.

Data team

The business problems we are tackling come from a very wide range of topics. What kinds of customers use us more frequently? How many remote operators should we schedule for tomorrow? What is the optimal robot fleet allocation between sites? Does bad terrain make robot wheels break more often? Which are the new cities and sites we should launch our service in?

Hardware team and the robots

Our hardware team is responsible for the electronics and mechanics of the robot but also embedded software, operating system and communication layer of the robot and the infrastructure around the robot.

Challenges the electronics team is facing are twofold — how to get the best possible signal from the real world, which is rather messy, noisy and unpredictable and how to design things reliable enough so they work in the rather harsh conditions (water, vibration, heat, snow) our robots are facing while wandering around in neighbourhoods or university campuses.

In order to solve those challenges, a very good understanding of electronics, signal processing and physical process control is needed. As with any complicated system, troubleshooting and debugging is fun on it’s own.

We design the majority of the hardware components (both electrical and mechanical) in-house.

Inside the robot we have a main compute unit (Tegra TK1 in older robots and AMD Ryzen based system in newers) — this gives us enough processing power to perform computations needed by autonomous drive. Many of those computations are dealing with signal and image processing so they are very GPU heavy. Because of the network latency not all computations can be offloaded to servers in the cloud.

Besides the main computing unit we have a number of different sensors: cameras, radars, ultrasound, gyros and many actuators: motors, bogies and locks, controllers and processors. Some of the signal processing is so time-sensitive that it requires an FPGA to perform that work. The controller software is also written in-house by embedded software developers.

To summarise things, Starship is building an end to end autonomous delivery platform all the way from designing and building the robots to building the consumer and merchant facing applications.

If bringing autonomous deliveries to the world is a mission you’d be keen on helping us with, do have a look at open positions in https://www.starship.xyz/careers/ or feel free to get in touch with me directly.

You can also check out our Starship Engineering Youtube channel here — https://www.youtube.com/channel/UC6Vee4Zqd6oJayLO5uIcb9Q

— — — — — —

PS. When our Co-Founder Ahti Heinla started the company, I’m sure he wasn’t thinking of sheep detection algorithms ;)

Starship Technologies

Starship Technologies Blog — Follow our delivery robots on their adventures around the world.