Simulating a car-sharing operation.

I’m often asked how useful programming can be for business problems beyond the more common machine learning classification and regression tasks. That is why I decided to write this post, which is based on a case from one of the classes I took at the IE School of Human Sciences and Technology.
This post focuses on the business reasoning behind the simulation, but if you are interested in the code, it will be shared soon.
I will model two different scenarios:
- In the first one, the company will not intervene in the relocation of vehicles across the city. The floating fleet of vehicles will move freely according to the users’ needs. We will see how, in this scenario, some city areas (or nodes) soon begin to accumulate cars while other areas with relevant demand start running short of supply, affecting both the company’s cash flow and the users’ satisfaction.
- In the second scenario, the company will test a strategy to relocate cars to important areas, balancing the supply of vehicles and minimizing the losses due to unfulfilled demand seen in the first scenario. This should be reflected in a more stable cash flow and a better user experience.
Also, the case is completely hypothetical and, at this moment, I have no relationship with the company mentioned in this example.
Introduction:
Car sharing is expanding around the world as one of the main solutions for smart mobility. Companies like Car2Go, with large fleets of cars spread across cities, are a solution for many users who have opted for the car “as-a-service” model rather than acquiring the asset themselves. It’s a very convenient deal for users, as it fulfills their transportation needs without tying up capital in a car or dealing with the hassles and extra expenses that come with it, such as insurance, maintenance, and taxes, to name a few.
For the company, the operation is not as easy and straightforward as the average user would imagine. Besides keeping the cars charged, clean, and generally in good condition, the company must make sure the cars are available to every user at the time and place the service is demanded. That last part is trickier than it sounds.
“Why would this be a problem if people, and cars, are moving constantly around the city?” some might ask. The thing is that, even though people do move all over the city, some areas draw more cars than others. If no extra demand for cars is generated at these locations, cars begin to accumulate there, leaving other neighborhoods unattended, which leads to unhappy users complaining on Twitter and a loss of profitability for the company.
Therefore, car-sharing companies must understand in advance how their users behave and learn how many cars will be needed, when, and where, to make sure the service is fulfilled. To do this, the company can use data to build a probabilistic model, simulate the entire operation, and even estimate how certain changes in the service would impact users and KPIs before deploying live experiments that would affect real customers.
Business Problem:
- A car-sharing company was granted a permit to run a pilot test in the city of Chicago starting in the second half of 2018.
- 500 electric cars would be used for a period of 12 months.
- The operation will be restricted to certain areas; that is, no trip can end outside the designated areas.
- As the company has never had any operation in the city, there is no real car-sharing data that could be used to model their operation.
- The data science (or operations research) team has been tasked with building a model of the operation to identify in advance where car accumulation can be expected.
- The model inputs will be:
  - Number of days to run the simulation
  - Number of cars
  - Expected daily demand (weekdays and weekends)
I will be using the following KPIs and business metrics:
- Mean Utilization Rate: total_daily_use / total_capacity, where total_capacity = fleet_size * 24 hours per day
- Mean Profit per Car: total_profit / fleet_size
- Mean Profit per Mile: total_profit / total_miles
- Mean Profit per Trip: total_profit / total_trips
- Mean Served Demand: trips_made / (trips_made + trips_lost)
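The KPI definitions above translate directly into code. The sketch below is a minimal illustration, not the author's notebook: the `DailyTotals` container and its field names (for example `total_use_hours` and `total_miles`) are hypothetical, and utilization is measured in car-hours so that it matches the 24-hour capacity definition.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24


@dataclass
class DailyTotals:
    """Aggregated daily figures from one simulation run (hypothetical field names)."""
    fleet_size: int
    total_use_hours: float  # car-hours spent in use during the day
    total_profit: float
    total_miles: float
    trips_made: float
    trips_lost: float


def compute_kpis(d: DailyTotals) -> dict:
    """Apply the KPI formulas listed above to one day's totals."""
    total_capacity = d.fleet_size * HOURS_PER_DAY  # car-hours available per day
    return {
        "utilization_rate": d.total_use_hours / total_capacity,
        "profit_per_car": d.total_profit / d.fleet_size,
        "profit_per_mile": d.total_profit / d.total_miles,
        "profit_per_trip": d.total_profit / d.trips_made,
        "served_demand": d.trips_made / (d.trips_made + d.trips_lost),
    }
```

Feeding it the Scenario 1 daily averages reported later in the post (500 cars, 4,113.20 in trip profit, 939.79 trips made, 947.92 trips lost) gives about 8.23 profit per car, 4.38 per trip and 49.78% served demand, in line with the reported figures. The post does not report car-hours in use, so the utilization input would have to come from the simulation itself.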
The case is inspired by Car2Go launching operations in Chicago. However, both the case and its proposed solution are completely hypothetical.

The Data
As mentioned before, the company has no real data on how the user flow would behave in Chicago. However, the City of Chicago has some very interesting datasets on its Data Portal, one of them being the taxi trips reported to the city. To date, the dataset holds over 100 million trips all around the city.
As a starting point, it is reasonable to hypothesize that the company’s future users will move in a way that resembles how taxi users do, at least regarding routes, hours, and areas. Once the live operation is in place, the company will be able to gather real data, tell how accurate our predictions are, and see where they need adjusting.
In this post we will not get into the data cleaning details, feature engineering, and assumptions, but if you are interested, drop me a note and I’ll let you know once the code is uploaded.
Below, you can find a sample of the original dataset.

Daily Demand Distribution
The distribution of demand during the day shows when peak hours can be expected. Demand for trips is highest in the evening, around 07:00 PM, when it can be expected to be very difficult to find available vehicles. Peak hours shift during weekends, when they can be expected between 12:00 AM and 03:00 AM.


Car demand and availability per Area and Time of day
Demand for trips also varies not only with the hour but also with the city area we focus on. In the plots below, we can see which areas are more active at different times and which destinations are more common. For instance, people leaving from Loop are more likely to go to the Near North Side; however, between 08:00 AM and 09:00 AM, trips leaving from Loop are very likely to end in Loop as well.

Markov Process
This behavior seems consistent with what statistics calls a Markov chain: the probability of a car’s destination can be modeled based only on where the trip begins and the time of day, regardless of the car’s previous trips.
Using this information, it is possible to build a model that simulates the daily operation and shows which areas can present problems, such as demand larger than the number of cars available, and at what time this would happen. Moreover, we could also model hypothetical experiments to estimate new scenarios and their impact on the service before rolling out live tests that might affect users.
Below, we can see how, during most of the day, more people leave the Near North Side than arrive there, while in the Loop area we would expect more cars to be dropped off than picked up during the morning, and the opposite in the afternoon, when there would normally be a lack of available cars.
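To make the Markov idea concrete, here is a minimal sketch, assuming the taxi trips have already been cleaned into (origin, destination, hour, day type) tuples: it counts trips per origin and time slot, normalizes the counts into destination probabilities, and samples a destination from them. The function names are illustrative rather than taken from the author's notebook, and keying by hour instead of by 15-minute step is a simplification.

```python
import random
from collections import Counter, defaultdict


def estimate_transitions(trips):
    """Estimate P(destination | origin, hour, day_type) from observed trips.

    `trips` is an iterable of (origin, destination, hour, day_type) tuples,
    e.g. ("Loop", "Near North Side", 8, "weekday"). This layout is an assumption.
    """
    counts = defaultdict(Counter)
    for origin, dest, hour, day_type in trips:
        counts[(origin, hour, day_type)][dest] += 1

    transitions = {}
    for key, dest_counts in counts.items():
        total = sum(dest_counts.values())
        transitions[key] = {dest: n / total for dest, n in dest_counts.items()}
    return transitions


def sample_destination(transitions, origin, hour, day_type, rng):
    """Draw a destination using only the current origin and time (Markov property)."""
    dist = transitions[(origin, hour, day_type)]
    dests = list(dist)
    return rng.choices(dests, weights=[dist[d] for d in dests], k=1)[0]
```

For example, three Loop to Near North Side trips and one Loop to Loop trip at 8 AM on a weekday yield probabilities of 0.75 and 0.25 for that origin and slot.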
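The arrivals-versus-departures view behind this comparison can be computed with a simple tally. The sketch below is a hypothetical helper, assuming trips are available as (origin, destination, hour) tuples; intra-area trips, such as Loop to Loop, cancel out.

```python
from collections import defaultdict


def net_flow(trips):
    """Net car flow per (area, hour): arrivals minus departures.

    `trips` holds (origin, destination, hour) tuples. For simplicity, each trip
    is counted in the hour it starts, even if it ends in the next one.
    A positive value means cars tend to pile up; a negative one means the
    area runs down its supply.
    """
    net = defaultdict(int)
    for origin, dest, hour in trips:
        net[(origin, hour)] -= 1  # a car leaves the origin
        net[(dest, hour)] += 1    # and is dropped off at the destination
    return dict(net)
```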

Although this view gives us a good idea of what to expect, it is static, and it doesn’t tell us much about the business and the KPIs that matter. For that, we need to run a simulation.
Simulating the Car-Sharing Operation
The idea of this post is not to get technical about the stochastic model or the transition matrices that were built to enable the simulation. However, drop me a message if you want me to let you know when the notebook with the code is uploaded.
This model tries to capture the population’s transportation patterns, which, as mentioned before, differ depending on factors such as the time of day, whether it’s a weekday or a weekend, and where the trip begins. As our data is rounded to 15-minute intervals, we use this value as the time window for the simulation; that is, the simulation moves in steps of 15 minutes. Also, to keep things simple, some rules and assumptions have been made. Accuracy could be improved on many levels; however, increasing the complexity of the model would also increase the computation time of each simulation.
Finally, keep in mind that the purpose of this model is not to predict future demand but rather to simulate how a given demand would behave and its impact on the daily operation of the company. Predicting demand would call for other techniques, such as time series forecasting, but that is a different problem.
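With 15-minute steps, a simulated day has 96 steps, and each timestamp maps to a step index plus a weekday/weekend label. A minimal sketch of that bookkeeping follows; the helper names are hypothetical.

```python
from datetime import datetime

STEP_MINUTES = 15
STEPS_PER_DAY = 24 * 60 // STEP_MINUTES  # 96 steps of 15 minutes


def to_step(ts: datetime) -> int:
    """Index (0..95) of the 15-minute step within the day that contains `ts`."""
    return (ts.hour * 60 + ts.minute) // STEP_MINUTES


def day_type(ts: datetime) -> str:
    """Weekday/weekend label used to pick the right transition probabilities."""
    return "weekend" if ts.weekday() >= 5 else "weekday"  # Saturday=5, Sunday=6
```

For instance, a pickup at 07:10 PM falls in step 76 (the 07:00 PM to 07:15 PM window).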
Business Rules and Assumptions
Demand definition: Every time a user checks the app with the intention to book a car.
Daily demand:
- Weekdays: 1,970 trips
- Weekends: 1,660 trips
- Why these values? As explained before, the purpose is to model the operation, not to predict demand, so the model can take any forecasted input. I chose these values because they represent nearly 5% of the daily taxi trips between the areas participating in the pilot.
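One way to turn a daily demand input such as 1,970 weekday trips into booking attempts per 15-minute step is to draw each attempt's time slot from a time-of-day profile, which amounts to a multinomial split. The post does not say how its model distributes demand within the day, so this is a hedged sketch: the `step_profile` weights are assumed to come from the taxi data's time-of-day distribution, and the function name is hypothetical.

```python
import random
from collections import Counter


def allocate_daily_demand(daily_demand, step_profile, rng):
    """Spread a daily demand total over the day's time steps.

    step_profile: relative weight of each step (e.g. 96 values derived from the
    taxi data's time-of-day distribution); it does not need to sum to 1.
    Returns a list with the number of booking attempts in each step.
    """
    steps = range(len(step_profile))
    # Each booking attempt independently picks a step in proportion to its weight.
    draws = rng.choices(steps, weights=step_profile, k=daily_demand)
    counts = Counter(draws)
    return [counts.get(s, 0) for s in steps]
```

Called with 1,970 and a 96-value weekday profile, it returns 96 counts that always add up to 1,970, with more attempts landing in the heavier-weighted peak steps.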
Price/min: $0.41 (Fee charged to the user)
Cost/min: $0.20 (Variable operating cost)
Car autonomy: 300 km (Distance the car can travel on one full charge)
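The per-minute price and cost give a margin of $0.21 per billed minute. As a minimal sketch, assuming the user is billed for riding and parking time but not for the dead time spent walking to the car, and that the variable cost accrues over the same billed minutes, trip profit could be computed like this; the function name is hypothetical.

```python
PRICE_PER_MIN = 0.41  # fee charged to the user
COST_PER_MIN = 0.20   # variable operating cost


def trip_profit(ride_minutes: float, parking_minutes: float) -> float:
    """Profit of one trip under the billing assumptions stated above."""
    billed_minutes = ride_minutes + parking_minutes  # dead time is not billed
    return billed_minutes * (PRICE_PER_MIN - COST_PER_MIN)
```

For example, a 20-minute ride followed by 10 minutes of parking yields 30 × 0.21 = $6.30.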
Minimum battery level: 10% (Battery level at which the car is locked until charged again)
Time to full charge: 5 hours
- At this point, the model does not account for transportation to a charging station.
- It is assumed that the car remains locked at the location where the last trip ended until full charge.
- Future versions could include charging stations, their locations, and their impact on profitability.
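The battery rules can be sketched as a small state update after each trip. This is a minimal illustration, assuming consumption is proportional to distance (a trip of d km uses d/300 of a full charge) and that a car at or below 10% is locked where its trip ended for 5 hours, i.e. 20 steps of 15 minutes; the `Car` class and the function names are hypothetical.

```python
from dataclasses import dataclass

AUTONOMY_KM = 300
MIN_BATTERY = 0.10
STEP_MINUTES = 15
CHARGE_STEPS = 5 * 60 // STEP_MINUTES  # 5 hours = 20 steps


@dataclass
class Car:
    area: str
    battery: float = 1.0   # fraction of a full charge
    locked_until: int = 0  # first step at which the car can be booked again


def finish_trip(car: Car, destination: str, km: float, now_step: int) -> None:
    """Move the car and drain its battery; lock it in place if it runs low."""
    car.area = destination
    car.battery = max(car.battery - km / AUTONOMY_KM, 0.0)
    if car.battery <= MIN_BATTERY:
        # Charged where the last trip ended (no trip to a charging station).
        car.locked_until = now_step + CHARGE_STEPS
        car.battery = 1.0  # it will be full once the lock expires


def is_available(car: Car, now_step: int) -> bool:
    return now_step >= car.locked_until
```

A car at 20% that drives 45 km drops to 5%, so a trip ending at step 10 keeps it locked until step 30.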
Car Location:
- Before the first day of operation begins, the cars will be placed in a way that optimizes the expected revenue based on where demand is expected.
- Two scenarios will be tested, one where no cars will be relocated by the company during the day, and one where the cars will be relocated under certain conditions.
Relocation time: 35 minutes (Time it takes to relocate a car to a new area)
Parking time: Normally distributed with a mean of 10 minutes and a standard deviation of 3.
- Unlike with taxi trips, after arriving at the destination, the user has to find a place to park the car.
Dead time: Normally distributed with a mean of 14 minutes and a standard deviation of 5.
- Unlike with taxi trips, when a user books a car, she has to walk all the way to where the car is parked.
- During this “dead time”, the user is not being charged.
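Both durations can be drawn with Python's `random.gauss`. Since a normal distribution can occasionally produce negative values, the sketch clips them at zero; that clipping, the rounding up to whole 15-minute steps, and the idea that the car is also unavailable during the dead time (because it is already booked) are assumptions of this illustration, and the function names are hypothetical.

```python
import math
import random

STEP_MINUTES = 15


def sample_parking_minutes(rng: random.Random) -> float:
    return max(0.0, rng.gauss(10, 3))  # mean 10 min, sd 3 min


def sample_dead_minutes(rng: random.Random) -> float:
    return max(0.0, rng.gauss(14, 5))  # mean 14 min, sd 5 min


def occupied_steps(ride_minutes: float, rng: random.Random) -> int:
    """Whole 15-minute steps a booked car is unavailable for:
    walking to the car (unbilled), riding, and parking."""
    total = sample_dead_minutes(rng) + ride_minutes + sample_parking_minutes(rng)
    return math.ceil(total / STEP_MINUTES)
```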
Scenario 1. No Car Relocation

Running 10 simulations for 30 days yielded the following results:
Car Availability:
When no action is taken regarding relocation, car availability is affected from the very first day in areas with high demand, such as Near South Side, Irving Park, and Near North Side. These areas, which had enough cars at the beginning, quickly run out of them, while cars begin to accumulate in areas with less demand, such as Loop and West Town.

Because users cannot find cars, the company loses trips: by the end of the month, over 1,000 trips are lost every day.
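The mechanism behind these lost trips fits in one simulation step: in each area, booking attempts are served only while idle cars remain, the rest are counted as lost, and each served car is sent to a destination sampled from that area's transition probabilities. This is a deliberately simplified sketch, not the author's model; the data layout and names are assumptions, and trip durations (which would delay when cars become available at their destinations) are left to the caller.

```python
import random
from collections import Counter


def simulate_step(available, requests, transitions, rng):
    """One simplified 15-minute step of the operation.

    available:   area -> idle cars at the start of the step (updated in place)
    requests:    area -> booking attempts during the step
    transitions: area -> {destination: probability} for this time slot
    Returns Counters of trips served and lost per origin area, and of cars
    heading to each destination.
    """
    served, lost, arrivals = Counter(), Counter(), Counter()
    for area, n_requests in requests.items():
        n_cars = available.get(area, 0)
        served[area] = min(n_cars, n_requests)  # only idle cars can be booked
        lost[area] = n_requests - served[area]  # the rest is unfulfilled demand
        available[area] = n_cars - served[area]
        if served[area] > 0:
            dist = transitions[area]
            dests = list(dist)
            weights = [dist[d] for d in dests]
            for dest in rng.choices(dests, weights=weights, k=served[area]):
                arrivals[dest] += 1
    return served, lost, arrivals
```

If Irving Park has one idle car and four booking attempts, one trip is served and three are lost, which is exactly the pattern that builds up when cars drain away from high-demand areas.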

When looking at the number of trips made and lost per area, we see that areas such as Irving Park are losing more trips than they are making, while areas that concentrate large numbers of cars, such as Loop, barely start any trips.

Business Metrics
As expected, the business metrics don’t look impressive at all. I don’t know much about the industry, but I can certainly say that when users find a car only 50% of the times they need one, they are not going to stick around for much longer. The other metrics arguably look bad as well; however, without knowing the industry standards, that is only a hypothesis for now. Nevertheless, running a simulation that includes car relocation gives us something to compare against.
Average daily traveled distance: 1610.35
Average daily trip profit: 4113.20
Average daily trips: 939.79
Average daily trips per car: 1.88
Average daily profit per car: 8.23
Average profit per trip: 4.38
Average trips lost per day: 947.92
Average potential revenue lost per day: 4119.18
Average demand served: 49.78%

Scenario 2. With Car Relocation
With some tweaks to the model, different scenarios can be tested. For instance, let’s imagine (for the sake of simplicity) that the marketing department has an idea to solve the relocation problem. They now need the operations department to run the model, simulating an implementation of this idea under their assumptions.
New conditions:
- A promotion is sent through a push notification twice a day, at 07:00 AM and 04:00 PM, a couple of hours before the morning and afternoon peak times.
- The promotion is only sent to areas where an excess of cars can be found at that time (more cars than the projected demand during the following 4 hours).
- Under this promotion, users picking up cars in these areas can ride for free, as long as the trip ends in pre-defined areas: specifically, areas where demand is projected to be greater than the available cars within the following 4 hours. This way, the company would be outsourcing the relocation to its users.
- It is assumed that the free ride would be a strong enough incentive that many users who would normally take another means of transportation at these hours would use the car-sharing option instead. Furthermore, new users who wouldn’t normally use the service might join just to travel for free during these hours.
- It is assumed that between 200 and 300 cars can be relocated daily this way.
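The targeting rule of the promotion can be expressed as a comparison between idle cars and projected demand over the next 4 hours in each area. The sketch below is a hypothetical illustration: `projected_demand` would come from the demand profile, and the `max_moves` cap stands in for the assumption that 200 to 300 cars can be relocated per day across the two notifications.

```python
def plan_promotion(idle_cars: dict, projected_demand: dict, max_moves: int):
    """Split areas into promotion origins (surplus) and eligible destinations (deficit).

    idle_cars:        area -> cars currently available
    projected_demand: area -> expected booking attempts in the next 4 hours
    Returns (surplus, deficit, movable), where `movable` caps the relocations
    by the total surplus, the total deficit, and `max_moves`.
    """
    areas = set(idle_cars) | set(projected_demand)
    gap = {a: idle_cars.get(a, 0) - projected_demand.get(a, 0) for a in areas}
    surplus = {a: g for a, g in gap.items() if g > 0}   # send the push here
    deficit = {a: -g for a, g in gap.items() if g < 0}  # free rides must end here
    movable = min(sum(surplus.values()), sum(deficit.values()), max_moves)
    return surplus, deficit, movable
```

With 60 idle cars in Loop against a projected demand of 20, and 5 cars in Irving Park against 40, Loop would receive the notification and free rides would have to end in Irving Park.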

When a relocation action is taken, cars are continuously redistributed from the areas where trips are most likely to end to the areas where demand is expected. This way, areas such as Loop and West Town no longer accumulate cars, while the Near South Side, Irving Park, and Near North Side now have enough cars to fulfill the demand during peak hours.

The distribution of mean daily trips is also smoother, as users would now be more likely to find available cars during peak hours.

In this new scenario, only the areas with the largest demand lose trips, and only a very small proportion of them. If the assumptions hold, the relocation problem would be fixed and users would be able to use the service almost every time they need a car.

Business Metrics
In the new scenario, almost 100% of the demand would be covered and daily profits would increase by 70%, from around 4,000 to 7,000, after accounting for the cost of relocation. The relocation strategy seems to make sense.
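The 70% figure can be checked from the daily averages: subtract the relocation cost from the Scenario 2 trip profit and compare the result with the Scenario 1 daily profit.

```python
before = 4113.20           # Scenario 1: average daily trip profit
gross_after = 8189.35      # Scenario 2: average daily trip profit
relocation_cost = 1180.15  # Scenario 2: average daily relocation cost

net_after = gross_after - relocation_cost
increase = net_after / before - 1
print(f"net daily profit after relocation: {net_after:.2f}")
print(f"relative increase: {increase:.1%}")
```

This gives 7,009.20, matching the reported 7,009.19 up to rounding of the underlying averages, and an increase of about 70.4%, consistent with the 70% quoted.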
Average daily traveled distance: 3102.02
Average daily trip profit: 8189.35
Average daily trips: 1877.75
Average daily trips per car: 3.76
Average daily profit per car: 16.38
Average profit per trip: 4.36
Average trips lost per day: 5.48
Average potential revenue lost per day: 25.48
Average demand served: 99.71%
Average daily relocation cost: 1180.15
Average daily trip profit after relocation: 7009.19
Average profit per car after relocation cost: 14.02

Conclusions
Finally, we can even visualize where our cars would be over the following days in both scenarios. In the first scenario, cars clearly tend to accumulate in some “absorbing” areas, while in the relocation scenario, important areas are never left unattended and some areas stop accumulating vehicles.
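The accumulation effect follows from how a Markov chain spreads a fleet over time: multiplying the expected number of cars per area by the transition matrix, step after step, converges toward the chain's long-run distribution, and areas that receive more cars than they send out end up holding most of the fleet. The two-area matrix below is made up purely for illustration and is not estimated from the Chicago data.

```python
def propagate(fleet: dict, transitions: dict, steps: int) -> dict:
    """Expected cars per area after `steps` transitions.

    Applies fleet_next[j] = sum_i fleet[i] * P[i][j] repeatedly, where each
    row P[i] (including the probability of staying in i) sums to 1.
    """
    for _ in range(steps):
        nxt = {}
        for origin, cars in fleet.items():
            for dest, p in transitions[origin].items():
                nxt[dest] = nxt.get(dest, 0.0) + cars * p
        fleet = nxt
    return fleet


# Hypothetical matrix: cars that reach Loop tend to stay there.
P = {
    "Loop": {"Loop": 0.9, "Irving Park": 0.1},
    "Irving Park": {"Loop": 0.5, "Irving Park": 0.5},
}
print(propagate({"Loop": 250, "Irving Park": 250}, P, steps=50))
```

Starting from an even split of 500 cars, the expected fleet converges to about 416.7 cars in Loop and 83.3 in Irving Park (the stationary split of 5/6 and 1/6), regardless of where the cars start.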

We have seen how, by mining third-party data sources, we were able to build a stochastic model that simulates the operation of a car-sharing business in a city where the company has not operated so far. Of course, some assumptions have been made to simplify this example, but I’m sure you get the idea.
The next step would be to gather more data to test our assumptions (which might turn out to be too optimistic at this point) and, perhaps, begin some experiments and pilot tests with real users.
Originally published at https://www.javiergalvis.com/blog/simulating-a-car-sharing-operation.
@javiergalvis
