How we calculated CO2 emissions from Bolt rides to balance them out

Mikhail Iljin
Bolt Labs
6 min read · Dec 20, 2019

Bolt is using technology to improve urban mobility. We are working towards a future with less traffic, fewer parking lots and less pollution, where private cars are replaced by an efficient network of shared cars, scooters, bikes and public transport. But this transition takes time, so to make even more impact now, we have launched our Green Plan and committed to offsetting CO2 emissions of all Bolt rides in Europe.

What did we need to know to begin carbon offsetting? Our own emissions, of course. Do we have all the data available? How do we calculate it? How can we be sure we’re correct? These are the questions that puzzle you when, as a mature company, you need to work with data you haven’t thought about before.

On the other hand, you don’t need to know your footprint exactly to the very last gram. We were happy to offset the higher range of our estimates — and there are specialized companies who have the expertise to assess whether our calculations are reasonable. Our partner Verifavia conducted such an evaluation and granted us a certificate of approval.

But, first things first.

Getting the data

With 1 million drivers, you can expect the vehicle fleet to be very diverse. Where do you get the most comprehensive dataset for vehicle emissions?

We worked with two data sources:

CO2 emissions are measured in grams per kilometre. There are two standards for measuring them: WLTP, the newer one, and NEDC, the older one. Whenever possible, we use WLTP data. If it’s not available, we fall back to NEDC. If the standard is not known, we use whatever value is available.
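The preference order between standards can be sketched as a simple fallback. This is only an illustration: the function name `pick_emission` and its three arguments are hypothetical and not taken from our actual pipeline.

```python
from typing import Optional


def pick_emission(wltp: Optional[float],
                  nedc: Optional[float],
                  unknown: Optional[float]) -> Optional[float]:
    """Return a g/km value, preferring WLTP, then NEDC, then a value
    whose measurement standard is not known."""
    for value in (wltp, nedc, unknown):
        if value is not None:
            return value
    return None  # nothing available for this vehicle


# WLTP is missing here, so the NEDC figure is used.
print(pick_emission(None, 120.0, 115.0))  # 120.0
```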

The first step of the real work is to join the car manufacturer, car model and year of manufacture from our internal database with the CO2 emissions database. The first problem that appears is that vehicles with different engine types and volumes have different emissions. What if the engine data for some models is not available?

In this case, we can assume that an average ride-hailing driver would use a car with an average engine. Neither 3.0L gas guzzlers nor 1.0L subcompacts are very practical if you need to drive for hours every day, arrive on time, frequently go outside the city and be efficient all the while. So, for such vehicles with unknown engines we would take the average over all engines for that model and manufacturer.
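As a rough sketch of that averaging step, assuming the emissions database has one row per engine variant: the rows below are made-up illustration values, not real measurements, and grouping by year as well as by manufacturer and model is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (manufacturer, model, year, engine, g_per_km).
# The numbers are invented for illustration only.
rows = [
    ("Toyota", "Corolla", 2015, "1.3", 125.0),
    ("Toyota", "Corolla", 2015, "1.6", 139.0),
    ("Toyota", "Corolla", 2015, "1.8 hybrid", 84.0),
]


def average_over_engines(rows):
    """Average g/km over all engine variants of the same model and year."""
    grouped = defaultdict(list)
    for make, model, year, _engine, g_per_km in rows:
        grouped[(make, model, year)].append(g_per_km)
    return {key: mean(values) for key, values in grouped.items()}


avg = average_over_engines(rows)
print(avg[("Toyota", "Corolla", 2015)])  # 116.0
```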

Cars produce emissions not only when driving the client to the destination, but also when driving to the client — it’s essential to include this part as well. So, overall, we need this information:

  • Car manufacturer
  • Car model
  • Year of manufacture
  • Emissions in g/km
  • Distance driven to client in km
  • Distance driven with client in km
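With these fields, the emissions of a single ride follow from multiplying the per-kilometre figure by the total distance, including the pickup leg. A minimal sketch, with a hypothetical function name and example numbers:

```python
def ride_emissions_g(g_per_km: float,
                     km_to_client: float,
                     km_with_client: float) -> float:
    """CO2 in grams for one ride, counting both the drive to the client
    and the drive with the client."""
    return g_per_km * (km_to_client + km_with_client)


# Example: a 120 g/km car drives 2 km to the pickup and 8 km with the client.
print(ride_emissions_g(120.0, 2.0, 8.0))  # 1200.0
```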

Fixing missing data

What if we have a car that was manufactured during a year that is not in the emissions database? Where do we get the data for that?

It turns out that over time, cars are getting less and less polluting — the trend is clear and persistent.

So if we have enough data for some years, we can use linear regression to estimate the missing years — they would fit nicely on the slope created from the known years.

There is a caveat, though: you cannot look too far into the past or future with such estimates. If you need emissions for 2013 and the last year you have is 2007, your guess will likely be wrong, because the further away in time the prediction is, the less accurate it becomes. The same goes for other statistical anomalies, such as having too few examples for a certain model, so that the slope goes so steeply upward or downward that you get 1,000 or near-zero g/km of emissions for a quite ordinary gasoline car.
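The regression and its guard rails can be sketched as follows. This is an illustration under stated assumptions, not our production code: the thresholds `max_gap`, `min_points` and the `plausible` range are hypothetical values chosen for the example, and the known emissions are invented.

```python
def fit_line(years, values):
    """Ordinary least squares fit of values against years.
    Returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_emission(known, target_year, max_gap=3, min_points=3,
                      plausible=(50.0, 400.0)):
    """Estimate g/km for target_year from known {year: g_per_km}.
    Returns None when the estimate would be unreliable."""
    if len(known) < min_points:
        return None  # too few examples to trust the slope
    if min(abs(target_year - year) for year in known) > max_gap:
        return None  # too far from any known year
    slope, intercept = fit_line(list(known), list(known.values()))
    estimate = slope * target_year + intercept
    low, high = plausible
    # Reject absurd results such as 1,000 or near-zero g/km.
    return estimate if low <= estimate <= high else None


known = {2010: 150.0, 2011: 146.0, 2012: 142.0, 2014: 134.0}
print(estimate_emission(known, 2013))
print(estimate_emission({2005: 160.0, 2006: 158.0, 2007: 155.0}, 2013))  # None
```

Returning `None` instead of a doubtful number keeps such vehicles visible, so they can be handled by a later fallback rather than silently skewing the totals.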

The good thing is that these approximations can be validated. You can test the method on vehicles using existing information — imagining that you lack data for a specific period, estimating it and comparing it to reality.
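One way to run such a check is a leave-one-out test: hide each known year in turn, predict it from the remaining years and measure the error. The sketch below assumes a straight-line fit per model; the function names and data are hypothetical.

```python
def fit_line(years, values):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_errors(known):
    """For each year in known {year: g_per_km}, pretend it is missing,
    estimate it from the other years and return the absolute error in g/km.
    Needs at least three known years so that two remain for the fit."""
    errors = {}
    for held_out, actual in known.items():
        rest = {y: v for y, v in known.items() if y != held_out}
        slope, intercept = fit_line(list(rest), list(rest.values()))
        errors[held_out] = abs(slope * held_out + intercept - actual)
    return errors


# Invented values for one model; 2013 deviates from the trend on purpose.
known = {2010: 150.0, 2011: 146.0, 2012: 142.0, 2013: 141.0, 2014: 134.0}
for year, error in sorted(leave_one_out_errors(known).items()):
    print(year, round(error, 1))
```

Large errors for a model suggest that its trend is not stable enough to extrapolate from.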

What about outliers?

Up to now we have managed to fill most of our dataset with emission values. It’s been going too smoothly, so it’s time for some quirks.

Cars get registered in each country separately by a person who trains the driver and signs them up to Bolt’s platform. With manual input, occasional mistakes are inevitable, so we go through the anomalies, such as very rare models, ordinary models with typos in names, or missing years of manufacture.
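Typos in model names can be surfaced for review with fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library. The reference list and the 0.85 cutoff are assumptions for illustration, and a suggestion like this would still need a human to confirm it.

```python
from difflib import get_close_matches

# Hypothetical reference list of names present in the emissions database.
KNOWN_MODELS = ["Toyota Corolla", "Toyota Camry", "Skoda Octavia"]


def suggest_model(raw_name, known=KNOWN_MODELS, cutoff=0.85):
    """Return the closest known model name for a manually typed one,
    or None when nothing is similar enough."""
    by_lower = {name.lower(): name for name in known}
    matches = get_close_matches(raw_name.strip().lower(), list(by_lower),
                                n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


print(suggest_model("Toyota Corola"))  # Toyota Corolla
print(suggest_model("Lada Niva"))      # None
```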

Some of those cars are perfectly valid; they are just rare in the ride-hailing world and on the streets. Here are the biggest emitters in our database:

Just look at this beauty, which proudly holds 1st place:

There are more unusual cars to deal with. Electric cars have no direct CO2 emissions, so we exclude them from the dataset. Compact hybrids with very low emissions need to be reviewed manually to make sure that the low values are genuine and not errors introduced by averaging and extrapolating. Fortunately, there are not too many of these models.

Then we see some models that are fairly popular in some countries, but when we refer to our emissions dataset, they are nowhere to be found. What’s going on with Honda Fit, Toyota Vitz or Toyota Etios?

It turns out that these are models manufactured only for specific markets and not sold elsewhere. Or they may be sold elsewhere under different names. Honda Fit, for example, is actually Honda Jazz in Western Europe. So, we take the most popular ones that have the biggest CO2 impact and go over them one by one, matching them with models that we have data on.
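The result of that manual matching can be kept as a small alias table applied before the join with the emissions data. A minimal sketch: only the Honda Fit entry comes from the text above, and the table and function names are hypothetical.

```python
# Maps a market-specific (manufacturer, model) to the name used in the
# emissions database. Further entries would be added after manual review.
MODEL_ALIASES = {
    ("Honda", "Fit"): ("Honda", "Jazz"),
}


def canonical_model(make, model):
    """Return the (manufacturer, model) pair to look up in the emissions data."""
    return MODEL_ALIASES.get((make, model), (make, model))


print(canonical_model("Honda", "Fit"))      # ('Honda', 'Jazz')
print(canonical_model("Skoda", "Octavia"))  # ('Skoda', 'Octavia')
```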

Result

By now, almost all emissions are known. Only 1–2% are still missing. We fill these first with the average for the manufacturer and year (like Škoda / 2013) and the remaining ones with the average for the year.
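This two-level fallback can be sketched as below. The record layout (`make`, `year`, `g_per_km`) and all numbers are assumptions made for the example.

```python
from statistics import mean


def fill_missing(cars):
    """cars: list of dicts with 'make', 'year' and 'g_per_km' (None if unknown).
    Fills unknown values with the manufacturer/year average, then with the
    year average. Returns a new list and leaves the input untouched."""
    by_make_year, by_year = {}, {}
    for car in cars:
        if car["g_per_km"] is not None:
            by_make_year.setdefault((car["make"], car["year"]), []).append(car["g_per_km"])
            by_year.setdefault(car["year"], []).append(car["g_per_km"])
    filled = []
    for car in cars:
        value = car["g_per_km"]
        if value is None:
            key = (car["make"], car["year"])
            if key in by_make_year:
                value = mean(by_make_year[key])      # e.g. Škoda / 2013
            elif car["year"] in by_year:
                value = mean(by_year[car["year"]])   # all makes in that year
        filled.append({**car, "g_per_km": value})
    return filled


# Invented example values.
cars = [
    {"make": "Škoda", "year": 2013, "g_per_km": 120.0},
    {"make": "Škoda", "year": 2013, "g_per_km": 130.0},
    {"make": "Škoda", "year": 2013, "g_per_km": None},
    {"make": "Toyota", "year": 2013, "g_per_km": 110.0},
    {"make": "Kia", "year": 2013, "g_per_km": None},
]
result = fill_missing(cars)
print(result[2]["g_per_km"], result[4]["g_per_km"])  # 125.0 120.0
```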

It’s worth noting that, as Bolt is a company that is growing very quickly, the snapshot of emissions that we took for Q2 2019 will soon be outdated. So we need to re-evaluate our footprint regularly and extrapolate it accurately into the near future.

Verification

Bolt is the largest ride-hailing platform in Europe to commit to making its journeys carbon-neutral. As carbon offsetting for companies like us is completely voluntary, there are no dedicated environmental agencies that cater to the ride-hailing industry. So we partnered with Verifavia, an established agency for aviation CO2 footprint management. They critically assessed the data we have and the assumptions and approximations we’ve made, and looked carefully for material inconsistencies in our results. After we reviewed our calculations based on their feedback, they granted us approval and certification.

Thus we can now be sure that our understanding of our own carbon footprint is solid — and that, from now on, we can take measures to reduce it. This includes our current carbon-offsetting initiative as well as greater future support for electric vehicles.

If you’d like to learn more about the challenges that our data science team is working on, check out this blog post about understanding customer lifetime value with data science and this article about simulating cities.

If you’d like to join us in building the future of urban transport, visit our careers page.

Mikhail Iljin
Bolt Labs

I love data science & ML. Been in the world of data since 2014 - as a data scientist, data engineer, team lead. Currently working on AI for detecting cancer.