Understanding the problem we’re solving

Vlad Dascalu
Published in Bolt Labs · 13 min read · May 27, 2022

Hey! My name is Vlad Dascalu, and I joined Bolt in 2019. Currently, I’m an Engineering Manager in the Delivery Courier group. We’re responsible for the engineering efforts that ensure our delivery network can deliver Bolt products (such as food, groceries, and business orders) to our customers within the promised time and with high courier efficiency.

In this article, I want to share a specific set of principles that you can use to multiply your impact in a software-driven environment by confirming that you’re solving the right problem.

Each principle is then backed up by an example situation that we had to tackle in real life, with additional details explaining how the principle helped us and the final solution we implemented for that particular situation.

Start from the customer

The customer is the entity that pays for the service (and derives the main value from the product that’s being built). Usually, there are other parties involved — stakeholders, partners, and intermediaries — as part of the ecosystem. Still, if a customer isn’t happy and walks away, all the other parties cannot keep the ecosystem up and running by themselves.

A typical example is big contemporary search engines, where the parties paying for the infrastructure are advertisers; hence search results tend to be optimised, at least partially, with them in mind (for ad-revenue generation purposes).

Bolt Food is a three-actor marketplace where we orchestrate restaurants and couriers to deliver orders to the end-user (the eater). Even though initially we might be tempted to consider couriers the customers of our courier teams, in the end, a lot of the value that we bring to the table is dedicated to eaters (by orchestrating couriers to ensure they’re not late with their orders).

When orchestrating the order, we want the restaurant and the courier to finish their actions simultaneously. The order preparation should ideally finish when the courier arrives at the restaurant to pick up the order.

This way, courier waiting times at the restaurant are minimised, and the food doesn’t get cold waiting for the courier to arrive. However, we realised that restaurant order-ready estimations are inaccurate and could benefit from corrections made automatically based on their past prediction accuracy data.
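
To make this concrete, here is a minimal sketch of one way such a correction could be computed, assuming we keep a rolling record of each restaurant’s past estimation errors. The function and field names are illustrative, not our production code:

```python
from statistics import mean

def corrected_ready_time(estimated_minutes: float, past_errors: list[float]) -> float:
    """Adjust a restaurant's order-ready estimate by its historical bias.

    past_errors holds (actual - estimated) preparation times, in minutes,
    for the restaurant's recent orders (a hypothetical data source).
    """
    if not past_errors:
        return estimated_minutes  # no history yet: trust the raw estimate
    bias = mean(past_errors)      # positive bias => restaurant tends to underestimate
    return max(0.0, estimated_minutes + bias)

# Example: a restaurant that is usually ~3 minutes late against its own estimates.
print(corrected_ready_time(10, [2.5, 3.0, 3.5]))  # 13.0
```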

Hence, we showed couriers corrected arrival times while restaurants still saw their original estimations. Thus, a conflict arose between some couriers and restaurants concerning the time each of them saw.

We navigated this ambiguity by working backwards from the customer.

Is it more important for us to align the restaurant and courier time, or do we want to orchestrate their actions for an eater to have their food arrive as early as possible?

We decided to focus on our customers and realised that “time” has different meanings for each actor:

  • For eaters, we show the most accurate time, based on all the delay information we have up to the current moment;
  • For couriers, we show a “contractual” promissory time that always decreases one minute per minute, highlighting a contractual arrival-time commitment;
  • For restaurants, we show the time remaining against their own estimation, to encourage them to self-calibrate and make better estimation decisions next time.
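
As a rough illustration of the three views, here is a simplified sketch; the data model, field names, and time units are assumptions made for the example, not our actual implementation:

```python
from dataclasses import dataclass

@dataclass
class OrderState:
    # All times are minutes relative to "now" (a hypothetical simplified model).
    restaurant_estimate: float       # restaurant's own order-ready estimate
    predicted_extra_delay: float     # delay we currently predict on top of the plan
    promised_courier_arrival: float  # arrival time promised to the courier at assignment

def eater_view(order: OrderState) -> float:
    # Most accurate time: the plan plus every delay signal we have right now.
    return order.restaurant_estimate + order.predicted_extra_delay

def courier_view(order: OrderState, minutes_since_assignment: float) -> float:
    # "Contractual" time: counts down one minute per minute, regardless of delays.
    return order.promised_courier_arrival - minutes_since_assignment

def restaurant_view(order: OrderState) -> float:
    # Time remaining against the restaurant's own estimate, to help it self-calibrate.
    return order.restaurant_estimate
```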

Although this behaviour has the customer’s interests at heart, it creates some dissonance between the times seen by each partner. Thus, we had to address some of the issues raised by explaining those differences and the reasoning behind them.

Still, I’m pleased that we decided to optimise for the customer. That was particularly tricky in this case: it meant not jumping to “fix” the reported time differences at the first complaint, before a proper analysis.

Understand discrete vs continuous values

Discrete values are like integers: there’s usually a gap between consecutive values (e.g., a binary value is either 0 or 1, with nothing in between). Continuous values operate along a spectrum or axis: given two values, you can always take the midpoint between them, which is well-defined and should have a corresponding behaviour.

To exemplify this principle, I want to share the problem of occasional spikes in demand, when our courier network temporarily can no longer handle the volume of incoming orders at peak hours of the day.

When we scoped the original solution, our initial thought was that we would monitor the number of ongoing orders and the number of couriers online. We’d then temporarily close the city when that balance gets overwhelming to give time to our courier partners to handle the overload.

However, when thinking more in-depth about the problem, we realised that such behaviour might be problematic for customers, as it would create intervals as short as a minute or less when the city is “closed” for ordering.

Then, one minute later, the balance between demand and supply could be restored, and the city would be “opened” again. From that perspective, we would be introducing random behaviour: “unlucky” eaters who opened the app in the wrong minute would be blocked, while “lucky” eaters would be able to place their order thanks to sheer timing.

This consideration led us to refine our approach. Nowadays, cities are no longer simply closed or open, black or white; instead, we have a more gradual process.

A city can be partially opened or partially closed to a specific set of orders — based on granular considerations expressed by floating-point metrics (max. order distance that we can deliver based on the courier network load, busyness of the network in specific city neighbourhoods, and the profitability of specific orders for Bolt).

This enables us to have a consistent approach in our service availability during peak loads and ensures we give eaters the chance to order according to a consistent and fair set of criteria.
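
Below is a minimal sketch of what such a gradual gate could look like. The thresholds, the linear shape, and the single “network load” metric are illustrative simplifications of the granular metrics described above, not our production logic:

```python
def max_deliverable_distance_km(network_load: float) -> float:
    """Map courier network load (0.0 = idle, 1.0 = saturated) to the maximum
    order distance we accept, instead of a binary open/closed switch.
    The 8 km ceiling, 1 km floor, and linear shape are illustrative assumptions."""
    load = min(max(network_load, 0.0), 1.0)
    return 8.0 - load * 7.0  # 8 km when idle, shrinking towards 1 km when saturated

def order_accepted(order_distance_km: float, network_load: float) -> bool:
    # Short orders stay available even under heavy load; long ones are trimmed first.
    return order_distance_km <= max_deliverable_distance_km(network_load)

print(order_accepted(3.0, 0.4))  # True: plenty of capacity for a 3 km order
print(order_accepted(6.0, 0.9))  # False: under heavy load we only accept short orders
```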

Question implicit assumptions baked into the problem statement

Sometimes, we phrase specific problems with certain biases in mind, which later turn out to impose particular constraints on the space of solutions we’re exploring for the given problem.

It’s best practice to phrase the problem as closely as possible to what we actually want to solve, without biases or preconceptions. A popular concept encouraging this thinking style is reasoning from ‘first principles’, which means asserting only basic facts as truths.

It’s okay to combine first principles into higher-order statements. Still, when a higher-order statement cannot be decomposed into first-principles reasoning, it’s better to question it and figure out whether it needs to be there in the first place.

To demonstrate, when we were initially designing Bolt Food, we phrased the need to pick out an idle courier and assign them to an incoming order to ensure that we could dispatch them towards the restaurant to pick up the order and deliver it to the customer’s destination address.

Do you see any biases or assumptions in that phrasing?

Later on, we started to think more generically in terms of ‘orchestrating’ couriers for them to deliver orders with maximum efficiency, throughput, and minimum delay.

With that more generic phrasing in mind, we realised that it doesn’t necessarily have to be an idle courier: we can also assign a busy one if they’re likely to free up within the next minute or so. And it doesn’t have to be a single incoming order: we can assign multiple orders to the same courier, especially if they’re picked up from the same restaurant and delivered to eaters close to each other or in the same destination neighbourhood.

Rephrasing the original problem allowed us to think from first principles and tackle the problem more generically.

Nowadays, we have a probabilistic model built into our courier orchestration network, where we infer the probability of a courier freeing up in the next X minutes based on detailed maps of current and upcoming orders and the remaining route still to be travelled as part of those assignments.
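
The sketch below illustrates the general idea under a simplified normal-error assumption; the parameters, threshold, and courier data structure are hypothetical, not our production model:

```python
import math

def prob_free_within(remaining_route_km: float, avg_speed_kmh: float,
                     minutes: float, sigma_minutes: float = 2.0) -> float:
    """Rough probability that a busy courier finishes their remaining route
    within `minutes`, assuming a normally distributed travel-time error
    (sigma_minutes and the whole model are illustrative assumptions)."""
    expected_minutes = remaining_route_km / avg_speed_kmh * 60.0
    z = (minutes - expected_minutes) / sigma_minutes
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # standard normal CDF

def eligible_couriers(couriers: list[dict], minutes: float = 1.0,
                      threshold: float = 0.5) -> list[dict]:
    """Consider idle couriers plus busy ones likely to free up within `minutes`."""
    return [c for c in couriers
            if c["idle"]
            or prob_free_within(c["remaining_km"], c["speed_kmh"], minutes) >= threshold]

couriers = [
    {"id": 1, "idle": True,  "remaining_km": 0.0, "speed_kmh": 20},
    {"id": 2, "idle": False, "remaining_km": 0.2, "speed_kmh": 20},  # ~0.6 min of route left
    {"id": 3, "idle": False, "remaining_km": 3.0, "speed_kmh": 20},  # ~9 min of route left
]
print([c["id"] for c in eligible_couriers(couriers)])  # [1, 2]
```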

Although the complexity of the code increased, these insights allow us to make couriers more productive, with higher throughput on our delivery platform, which translates into better efficiency and higher earnings on their side compared to alternative delivery networks.

Translate binary problems into a linear model

Most of the time, people will come to your team with a problem that appears binary.

Don’t hesitate to challenge them and drill down into what they truly want to solve. You’ll often find a trade-off to be made between different parties or concepts, which gives you a way to roll out the solution incrementally, test it at different trade-off equilibrium levels, and discover the best compromise moving forward.

For example, I wanted to share a situation where operations teams from various locations complained that bike couriers were seldom assigned to orders, at least not to the same degree as car couriers.

We investigated the issue in collaboration with them. Most of the time, a car was winning the ‘assignment’ race, as cars were modelled with higher speeds, at least over long distances where main city roads could be used for longer segments of the route, which gave bikers higher estimated customer arrival times.

Our initial thoughts for solving this problem were to ensure a system where bikes always get dispatched (over cars) if the total order distance is lower than 1–2 kilometres. Nevertheless, upon deeper reflection, we realised the flaws of such a binary system. There would be neighbourhoods (particularly those farther away from the city centre) where bike selection would be limited.

Such a system would cause bikers in those areas to travel significantly until arriving at the restaurant location, to their detriment. We quickly realised that we needed a better solution.

We added a bonus points system to our assignment algorithm, where bike couriers could have some seconds shaved off their estimated eater arrival time if the order was short in distance. The shorter the route, the more seconds we shaved off.

In that way, the system had an underlying linear modelling of the bonus points granted, which allowed us, upon rollout, to experiment with different coefficients and different degrees of boosting. We could then figure out what works best to meet the trade-off between bikers and cars at the optimal equilibrium point.
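
Here is a simplified sketch of such a linear bonus; the coefficients (maximum bonus and distance cutoff) are illustrative placeholders for the values we actually tuned during rollout:

```python
def bike_bonus_seconds(order_distance_km: float,
                       max_bonus_s: float = 120.0,
                       cutoff_km: float = 2.0) -> float:
    """Linear bonus: the shorter the order, the more seconds we shave off a
    bike courier's estimated arrival time. max_bonus_s and cutoff_km are
    illustrative tuning coefficients, not production values."""
    if order_distance_km >= cutoff_km:
        return 0.0
    return max_bonus_s * (1.0 - order_distance_km / cutoff_km)

def adjusted_arrival_s(raw_arrival_s: float, vehicle: str, order_distance_km: float) -> float:
    """Score used in the assignment race: lower is better."""
    bonus = bike_bonus_seconds(order_distance_km) if vehicle == "bike" else 0.0
    return raw_arrival_s - bonus

# A 0.5 km order: the bike gets 90 s shaved off, enough to beat a slightly faster car.
print(adjusted_arrival_s(480, "bike", 0.5))  # 390.0
print(adjusted_arrival_s(420, "car", 0.5))   # 420.0
```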

Binary approaches can create even bigger problems than that. For example, several algorithms in courier networks split cities into polygons, drawing artificial borders that determine whether a courier is inside or outside a specific polygon.

Just imagine a situation where a courier moves one metre and, all of a sudden, inherits the properties and parameters of the neighbouring polygon, causing unexplained variation through the sudden alteration of their properties and, ultimately, their behaviour (the set of orders they end up being proposed, and so on).

Even problems that appear irreducibly binary can often be transformed upon more in-depth analysis. In these cases, you introduce a gradual, incremental decay and remove any hard borders that might cause a sudden jump or step function in the evolution of an entity’s properties.
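
As an illustration, a per-polygon parameter can be blended near the border instead of switching abruptly. The ramp width and sign convention below are assumptions made for the sake of the example:

```python
def blended_parameter(value_inside: float, value_outside: float,
                      distance_to_border_m: float, ramp_m: float = 300.0) -> float:
    """Blend a per-polygon parameter near its border instead of switching abruptly.

    distance_to_border_m is positive inside the polygon and negative outside
    (a hypothetical convention); ramp_m controls how wide the transition zone is.
    """
    # Map distance to a weight in [0, 1]: 1 deep inside, 0 far outside.
    w = min(max((distance_to_border_m + ramp_m) / (2 * ramp_m), 0.0), 1.0)
    return w * value_inside + (1.0 - w) * value_outside

# Moving one metre near the border barely changes the effective parameter.
print(blended_parameter(10.0, 20.0, distance_to_border_m=5))  # ~14.92
print(blended_parameter(10.0, 20.0, distance_to_border_m=4))  # ~14.93
```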

Consider probabilistic modelling

What we often take for granted as precise beyond any reasonable doubt is usually highly variable human behaviour that, in practice, happens with some degree of variance. As an engineer, recognising that variability, and even more, modelling it with proper mathematical instruments, can be the difference between a successful and a failed project outcome.

For example, remember that earlier I said we want the courier and the restaurant to coordinate their actions so that they finish at about the same time.

To take a specific example, if the restaurant prepares the food in 10 minutes and the courier would arrive at the restaurant (from their current position) within 5 minutes, then ideally we should give the restaurant a 5-minute head start and dispatch the courier towards the restaurant 10 − 5 = 5 minutes later.

Upon an in-depth look, we realised that the above plan assumes a lot of idealised human behaviour. There’s no exact punctuality that people can achieve consistently on a repeated basis. When a courier is estimated to arrive in 5 minutes, sometimes they will arrive in 6 minutes, sometimes in 4 minutes, and sometimes they will be 9 minutes late.

A mathematical concept that we found insightful for modelling that uncertainty is the Gaussian bell curve, which not only offers a graphical representation of the uncertainty but also introduces a formal measurement of the variation (under the modelling assumption of a normal distribution) via a parameter called the standard deviation.

When you couple this principle with the first one (start from the customer), you can end up with powerful insights, even an epiphany, about the solution you want to pursue.

In this case, we realised that a 5-minute leg is travelled in practice with enough uncertainty that couriers arrive within the interval of [3, 7] minutes in only 80% of cases, hence causing roughly ±2 minutes of variation for the eater at the end of the day. These values grow proportionally with longer trips (±4 minutes for a 10-minute travel leg, and so forth).
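
As a back-of-the-envelope check of those numbers under the normal-distribution assumption (the exact figures are illustrative):

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# If 80% of arrivals fall within [3, 7] minutes around a 5-minute estimate,
# the implied standard deviation is 2 / 1.28 ≈ 1.56 minutes (z for the 90th percentile ≈ 1.28).
sigma = 2.0 / 1.2816
p_within = normal_cdf(7, 5, sigma) - normal_cdf(3, 5, sigma)
print(round(sigma, 2), round(p_within, 2))  # 1.56 0.8

# For a 10-minute leg, proportional uncertainty roughly doubles the spread.
sigma_10 = 2 * sigma
print(round(normal_cdf(14, 10, sigma_10) - normal_cdf(6, 10, sigma_10), 2))  # 0.8
```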

Since our goal (the one we worked backwards from) was to prevent the eater from experiencing delays and to get their food delivered as fast as possible, we had the insight that we could let the operations team, based on observed historical data for a specific city, configure couriers to arrive X minutes earlier than the estimated food preparation time.

So, for example, if they set this offset so that couriers aim to arrive 2 minutes early, couriers would wait, on average, 2 minutes for the order to be ready, but the share of cases where the food has to wait for the courier to arrive would drop from 50% down to 25%, and similarly, the number of instances where the eater’s delivery would need to be pushed further out would drop in the same direction.
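
Here is a quick sketch of why an early-arrival offset moves that percentage; the standard deviation used is an illustrative assumption chosen so the numbers line up with the example above:

```python
import math

def p_food_waits(offset_min: float, sigma_min: float = 3.0) -> float:
    """Probability that the courier shows up after the food is ready, assuming a
    zero-mean normal error on the courier's arrival time. sigma_min is an
    illustrative assumption; offset_min is how many minutes early we aim to arrive."""
    z = offset_min / sigma_min
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

print(round(p_food_waits(0.0), 2))  # 0.5  - no offset: the food waits half the time
print(round(p_food_waits(2.0), 2))  # 0.25 - a 2-minute early-arrival offset cuts it roughly in half
```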

This approach aligns with our mentality of optimising our solutions for the end-user, as they’re the ones supporting our market ecosystem.

Making this offset a tunable parameter also enabled us to experiment with different thresholds and achieve the best results when selecting an optimal behaviour for our partners and our customers.

Understand and optimise for the long-term

Often, there are moments in our careers when we have to do something quickly to mitigate an emergency or an ongoing situation.

Such times are understandable, and they will come up in any business, sooner or later. However, doing them on an ongoing basis is not sustainable. At Bolt, we always want to understand the long-term implications of our efforts and optimise for the long term — we don’t want to let short-term profits stand in the way of long-term success.

One example is our request to build a per-city configuration system that our operations team could use to configure various values at the city level (including values such as courier-arrive-early-seconds that I mentioned before).

A pivotal realisation was to map out the complete user journey for such fields: who would set them, based on which considerations and criteria, and so on.

One insight was that the operations team had little visibility or guidance in setting appropriate values for some of these fields. We realised that even in the short term, editing them appropriately would be a challenge; hence we agreed that they would need to be set by an automatic process, using extrapolation or inference over historical data, and that humans would, at most, be involved occasionally to review these edits and ensure they make sense end-to-end.
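
To illustrate the direction we took, here is a hypothetical sketch of a per-city configuration entry that distinguishes manually set values from automatically inferred ones; the field names and structure are assumptions, not our actual schema:

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class CityConfigValue:
    """A per-city tunable (field names and structure are hypothetical)."""
    city_id: int
    key: str                               # e.g. "courier_arrive_early_seconds"
    value: float
    source: Literal["manual", "inferred"]  # who set it: a human or an automatic process
    reviewed_by: str | None = None         # optional, occasional human review

def needs_review(cfg: CityConfigValue) -> bool:
    # Inferred values can ship without blocking on a human, but get flagged for review.
    return cfg.source == "inferred" and cfg.reviewed_by is None

cfg = CityConfigValue(city_id=42, key="courier_arrive_early_seconds",
                      value=120.0, source="inferred")
print(needs_review(cfg))  # True
```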

Thinking about the whole journey of the experience provided us with valuable insights about our plans (which we then scheduled accordingly over the next quarters) and significantly influenced our requirements and system design approach.

From the ground up, we built in the expectation of interoperability with an automatic editing system for a subset of those city fields.

Consider whether machine learning could help

Some problems are trickier. A consultation with a data science engineer gives you the chance to get their perspective on what you’re struggling with and to clarify whether a data-science-based approach could be applicable in that case.

To give a specific example, we were trying to minimise the estimation error between the arrival time initially communicated to the eater and the actual arrival time that we observed in reality.

We figured out that the time a courier takes to park their vehicle varies depending on the destination neighbourhood. Still, we were falling short of formalising this error and finding a consistent methodology to infer it more accurately from the neighbourhood id.

After a consultation with a data science engineer, we found a consistent modelling approach. Every neighbourhood gets an id, for which we feed in training data based on historically observed order behaviour. A machine learning model was the missing piece that enables us to infer, in real time for future (or ongoing) orders, the most likely parking delay for that eater destination.
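
To give a flavour of the idea, here is a deliberately simple baseline that learns an average parking delay per neighbourhood id from historical observations. Our actual model is more sophisticated; the names, smoothing, and numbers here are purely illustrative:

```python
from collections import defaultdict
from statistics import mean

class ParkingDelayModel:
    """A simple baseline (not our production model): learn the average observed
    parking delay per neighbourhood id, smoothed towards the global mean so that
    sparsely observed neighbourhoods don't get noisy estimates."""

    def __init__(self, smoothing: float = 2.0):
        self.smoothing = smoothing
        self.global_mean = 0.0
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def fit(self, samples: list[tuple[int, float]]) -> None:
        # samples: (neighbourhood_id, observed_parking_delay_seconds)
        self.global_mean = mean(delay for _, delay in samples)
        for nid, delay in samples:
            self.sums[nid] += delay
            self.counts[nid] += 1

    def predict(self, neighbourhood_id: int) -> float:
        n = self.counts[neighbourhood_id]
        s = self.sums[neighbourhood_id]
        # Weighted blend of the neighbourhood's own mean and the global mean.
        return (s + self.smoothing * self.global_mean) / (n + self.smoothing)

model = ParkingDelayModel()
model.fit([(1, 60), (1, 90), (2, 300), (2, 360), (2, 330)])
print(round(model.predict(2)))   # 289: the busy neighbourhood, pulled slightly towards the global mean
print(round(model.predict(99)))  # 228: an unseen neighbourhood falls back to the global mean
```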

I was pleased that we managed to put this idea into practice. Using this approach, we reduced the error in estimating the order arrival time by high single-digit percentages.

---

Thanks for making it to the end. It was my pleasure to walk you through these principles and to share a few applied insights from the Delivery Courier group, with concrete examples of how we put them to good use in our area of activity.

By reusing these principles in your day-to-day work, you’ll ensure that you truly understand what problem needs to be solved before starting work on it, and what the best approach is to reach the desired customer value.

This helps avoid unneeded work in a potentially wrong direction. I can only hope that the stories shared proved inspiring and valuable for your use cases, and that you get to apply at least a couple of these principles in your next projects.

Until next time,

— Vlad
