Leveraging Dwell Time Models to Create Dynamic Vehicle Services (Peer-to-Peer Car Lending Service)

Jocelyn Wang
Published in 99P Labs
May 17, 2022 · 16 min read

Written by the MSBA Capstone Team, in partnership with 99P Labs: Tee Lakkhananukun, Prak Pola, Rebecca Stevens, and Jocelyn Wang

The MSBA Capstone Team is part of the Master of Science in Business Analytics program at Carnegie Mellon University.

Project Overview

Our team had the opportunity to partner with 99P Labs, a research group focused on coming up with innovative features, concepts, services and designs to remain on the cutting edge of the mobility industry and potentially transform the landscape of transportation. In a recent related project, a team of Berkeley students built a model to predict the dwell time and location of vehicles based on car sensor and infrastructure data from a leading automobile manufacturer. In the problem we’ve been given, the main question we want to address is: If we are able to predict the dwell time and location of our vehicles, what business opportunities can we leverage with this information?

Introduction

This blog post is a continuation of our first blog post, which details our early data explorations and vehicle maintenance business use case. In this post, we start by addressing changes made to the dwell time model in order to improve its accuracy and effectiveness when applied to the business use cases. We then introduce the second business use case, along with research and survey analysis to support its implementation. Lastly, we perform a cost-benefit analysis to assess the value that our business model would ultimately bring to 99P Labs.

Model Improvements

Addressing the Drawbacks

We decided to select new dwell time buckets that would be more appropriate for our two business cases: 0–2 hours, 2–8 hours, and 8–24 hours. Instead of having the top-level bucket be unbounded (previously 6+ hours), we limited the dwell times to 24 hours because our P2P business case focuses on shorter-term rentals.

While the new dwell time buckets were more intuitive for our use cases, the data continued to suffer from severe class imbalance. We quickly discovered that accuracy was not a reliable metric, since a model can score well simply by favoring the majority bucket. Instead, we focused on recall, which measures, for each bucket, the percentage of actual cases that the model identifies correctly. This matters because we want the model to predict the correct dwell time bucket for each trip, including the less common longer ones.
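To illustrate why accuracy can mislead on imbalanced buckets, here is a minimal sketch in plain Python. The labels are hypothetical and the "model" is a deliberately degenerate one that always predicts the majority bucket; none of these numbers come from the project's data.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of all trips whose bucket was predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall_per_class(y_true, y_pred):
    """For each actual bucket, the share of its trips labelled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical, heavily imbalanced labels: 90 short, 8 medium, 2 long trips.
y_true = ["0-2"] * 90 + ["2-8"] * 8 + ["8-24"] * 2
# A degenerate model that always predicts the majority bucket.
y_pred = ["0-2"] * 100

print(accuracy(y_true, y_pred))          # 0.9
print(recall_per_class(y_true, y_pred))  # {'0-2': 1.0, '2-8': 0.0, '8-24': 0.0}
```

The 90% accuracy hides the fact that no medium or long dwell time is ever identified, which is exactly what per-bucket recall exposes.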

SMOTE (Synthetic Minority Oversampling Technique)

We employed an oversampling technique called SMOTE to address the class imbalance. This technique synthesizes new observations for the minority classes by interpolating between existing minority observations and their k-nearest neighbors. By doing this, we were able to increase the number of observations in the 2–8 and 8–24 hour buckets and even out the dataset. Training the XGBoost model on the rebalanced data returned much improved and better-balanced recall scores.
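The core idea behind SMOTE can be sketched in a few lines of plain Python. This is an illustrative simplification, not the implementation used in the project: the function name `smote_sketch`, the feature vectors, and the parameter values are all hypothetical.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D feature vectors for an under-represented dwell time bucket.
minority_bucket = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority_bucket, n_new=6)
print(len(minority_bucket) + len(new_points))  # 10
```

Because every synthetic point lies on a line segment between two real minority observations, the new data stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation still reflects the real class distribution.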

Location Prediction

Previously, we explored location clustering as a potential way to bolster our business use cases. The reported location clusters, however, would not have been optimal if we wanted pure location information, because the clustering model also draws on other features that affect the cluster output. Hence, using a similar approach, we attempted to build a dedicated location prediction model.

This model involves a series of clustering and regression steps, but uses only location features, namely longitude and latitude. First, the last longitude and latitude of each sequence (trip) were extracted from the dataset. As in the first step of forming clusters for dwell time prediction, we applied the DBSCAN clustering algorithm to the longitude and latitude to form a set of clusters. The distance metric was the Haversine distance, which measures the great-circle distance between two points on the surface of a sphere and is therefore suitable for longitude and latitude. Using the centroid of each cluster, we fitted an XGBoost regression model to predict the cluster. Because location information leaks into the dwell time model, we did not use the location prediction directly in our business model.
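For readers unfamiliar with the Haversine distance, the following is a minimal standalone sketch of the formula. The function name, the Earth radius constant, and the example coordinates are illustrative assumptions and are not taken from the project code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

# Hypothetical trip end points, roughly Pittsburgh and Columbus.
print(round(haversine_km(40.44, -79.99, 39.96, -83.00)))
```

If a library implementation is used instead, it is worth checking its units: scikit-learn's haversine metric, for instance, expects latitude and longitude in radians and returns a distance on the unit sphere.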

However, because both of our business models require location information to be successful, we propose an approach similar to the one described here, extracting key information such as zip code and city from the predicted longitude and latitude. A simple starting point would be to take the centroid of each cluster and use a location API to extract this information.

One important thing to note is that, for both the dwell time and location predictions to be useful in our business models, information leakage must be addressed. Since the dwell time prediction already uses the location clusters, additional location prediction models would be rendered useless. Given the limited time, we did not fully explore the possibility of removing the location features from the dwell time model; we recommend that 99P Labs continue looking into this.

Peer-to-Peer Vehicle Lending Service

Background and Research

For our second business plan, we developed a peer-to-peer vehicle lending service bolstered by our predictive models for customers’ dwell times and locations. This service is similar to existing apps, such as one popular decentralized car rental platform. Unlike traditional car rental services, this platform allows renters to book vehicles directly from private car owners. Some benefits of this model are:

  • Very low overhead relative to traditional car rental services.
  • Able to scale their percentage taken from each transaction based on the insurance options their customers choose (minimum, basic, or premium coverage).
  • Customers are not limited in the make/model they choose: they are able to rent any car available on the app or website.
  • Cars offered are generally in better condition and better-maintained than traditional fleet-based car rental agencies.

Users can rent vehicles for travel, as well as to test drive cars they are interested in purchasing. The car owners earn income from each of these transactions. The platform’s main source of revenue, however, is a percentage of the insurance that renters pay, anywhere from 10–40% of the cost of coverage per transaction. Insurance is required for everyone on the platform, including hosts; in the US, the platform offers hosts up to $750,000 in third-party liability insurance through Liberty Mutual, with five different plans to choose from.

Peer-to-peer vehicle lending platforms are disrupting the traditional fleet-based car rental industry. The most recent example is Car2Go, which pulled out of several markets in 2020 after overinvesting in its fleet size and underestimating the resources required for day-to-day operations. Enterprise Car Share, ReachNow, and LimePod have all shut down operations in most of their cities because of heavy competition and the financial burden of rising insurance costs, parking premiums, maintenance costs, and telematics devices. Peer-to-peer car sharing platforms avoid these operational costs by placing the onus on private car owners rather than managing a large fleet of their own. Private car owners already pay for their parking and vehicle maintenance, and the platform described above even allows hosts to opt out of other insurance programs in favor of their own; this means the income is essentially all net gain. In addition, such a platform avoids the problem of distributing a fleet across cities, since peer-to-peer lending platforms are naturally more diverse and available in more locations. The graph below shows just how large a disruption peer-to-peer lending services are causing in the car rental industry:

Application of Peer-to-Peer Lending Models for 99P Labs

We believe 99P Labs can take advantage of this growing trend by introducing its own peer-to-peer lending service, targeted at current 99P Labs customers. According to recent Accenture research, the number of peer-to-peer car sharing vehicles globally is expected to reach approximately 990,000 by 2025, and the peer-to-peer market is expected to grow to $21 billion by 2030 in China, the US, and Germany alone. Instead of taking a percentage of the insurance fees from each transaction, we can take a rental commission on every transaction or introduce mileage-based fees, alleviating some of the insurance restrictions placed by other companies. A resilient business model, combined with diverse market regions and a data-driven marketing strategy, has great potential for success.

This service should be especially popular in crowded cities, where many people currently prefer public transportation over the hassle of owning a car and paying premium parking prices; daily commuters in particular will be attracted to the flexibility of having a car readily available instead of waiting for trains or buses. Finding hosts should be straightforward given the benefits they stand to gain: the revenue from offering their car for rent can help pay off loans, insurance, and maintenance costs. Studies show that most private vehicles sit idle more than 90% of the time, so there is very little opportunity cost in lending a car while a host is at work or working from home.

Because the peer-to-peer car rental space is not yet as widespread as the house sharing industry, this is the perfect time for 99P Labs to invest in such a service. The model could also be extended to bike sharing, albeit a more niche market. The main challenges we anticipate as we move to implement this model are ensuring there is a large enough market for this type of service and integrating our machine learning models into our business plan. This will be the main focus of our second survey.

Failure Modes and Effects Analysis (FMEA)

As part of our business plan, we explored some of the risks associated with the scheduling and execution aspects of our proposed service. Below is a brief tabulated summary of these risks and our plans to mitigate them. Severity and likelihood of occurrence were each ranked on a qualitative scale of 1–10, with 1 being the least severe (or least likely) and 10 being the most severe (or most likely). These qualitative scores are based on our background research and the analysis of our survey results.

FMEA Analysis for P2P Lending Business Case

Survey Validation

Survey Design

In this section, we present the survey data analysis and how it fits into our analysis of the peer-to-peer vehicle lending business case. We used Dscout to send this second survey to 200 people. In addition to demographic questions similar to those in the first survey, the focus this time was to gain insight into whether people would consider taking part in the peer-to-peer vehicle lending service. We also incorporated conjoint-style questions in order to limit bias and to allow for regression analysis of the results.

Survey Results

Ultimately, 72% of respondents said they would consider lending their personal vehicle and 91% said they would consider renting someone else’s personal vehicle. Those willing to rent out their vehicle said they would most likely do so when they have an extra vehicle sitting in their garage, when they are away on a trip, or when they are home for the weekend. Those interested in renting someone else’s vehicle would most likely do so when taking a local day trip, or when moving or otherwise needing a specialized vehicle for transporting items. Both renters and lenders agreed that the preferred location for pick-up and drop-off would be a public location designated by the platform. In addition, we learned that, should a renter return the vehicle late, lenders would prefer to take an extra fee from the renter as compensation: 90% answered in this manner and said they would not delete their account should this happen.

The last section of the survey focused on conjoint-style questions. We incorporated this type of questioning to limit bias and to help us determine consumers’ sensitivity to different features. In these questions, we presented both renters and lenders with scenarios that varied attributes such as the vehicle, rental duration, time of the week, and price, and asked them to choose the combination they preferred. This helped us understand what was most important to consumers and, ultimately, how price sensitive they are. To quantify this, we ran a logistic regression on the results, which is discussed in more detail in the next section. You can find a detailed breakdown and visualization of our results here.

Logistic Regression for Conjoint Analysis

The only car type available in the dataset was the Acura RDX, which is an SUV. Therefore, we excluded vehicle type from the lender-side survey. From the conjoint analysis, we calculated a utility score for each attribute using a logistic regression model. A utility score measures how much each attribute influences the customer’s decision to select an alternative (“How to Interpret Partworth Utilities”). Although the survey data shows that 72% of people are willing to lend their cars through the platform, this figure is not granular enough to tell us which attributes matter in making a lending decision. Instead, we can calculate the probability of participating in the platform from the total utility score.

The figure below shows the coefficients derived from the logistic regression model on the lender-side dataset. We deliberately set the intercept to zero because, when all attributes are zero, the utility score should also be zero. These log-odds coefficients can be thought of as the utility score for each attribute. Notice that the ‘features’ column includes neither city location nor weekday time. This is because, when creating dummy variables for categorical variables, one category must be dropped to prevent multicollinearity; each dropped category has a coefficient of 0 (Kuila, n.d.). The numbers are relative to each other and give insight into the ranking of feature importance. For example, a duration of 8–24 hours increases the utility score by 2 on average, meaning that customers value longer-duration lending more than shorter-duration lending (2–8 hours). Similarly, if the lending location is in a suburb rather than a city, the utility score decreases by 1.1 on average. A weekend lending time increases the utility score by 1.06. Another interesting insight is that lenders do not place much emphasis on potential earnings. Another way to interpret these results is to take the exponential of the coefficients and read each one as an increase or decrease in the odds of being selected, as shown in the figure below.
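The dummy coding described above can be sketched as follows. The level names are illustrative rather than the survey's exact wording, the choice of 2–8 hours as the reference duration is an assumption, and the earnings attribute is omitted for brevity.

```python
# Attribute levels; the first level in each list is the dropped reference level.
# Level names are illustrative, and "2-8h" as reference duration is an assumption.
LEVELS = {
    "location": ["city", "suburb"],
    "time": ["weekday", "weekend"],
    "duration": ["2-8h", "8-24h"],
}

def dummy_encode(profile):
    """Return one 0/1 indicator per non-reference level of each attribute."""
    row = {}
    for attribute, levels in LEVELS.items():
        for level in levels[1:]:  # skip the reference level
            row[f"{attribute}={level}"] = int(profile[attribute] == level)
    return row

print(dummy_encode({"location": "city", "time": "weekend", "duration": "8-24h"}))
# {'location=suburb': 0, 'time=weekend': 1, 'duration=8-24h': 1}
```

A profile made up entirely of reference levels encodes to all zeros, which is why its utility is zero and why every other coefficient reads as a difference from that baseline.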

Similarly, the utility scores and the increase or decrease in odds percentage for the renters survey can be shown using the coefficients derived from the logistic regression model.

Exponential of coefficients gives the increase or decrease in odds
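To make the odds interpretation concrete, the sketch below exponentiates the three lender-side coefficients quoted in the text. Because these are the rounded values from the prose, the results may differ slightly from the figures in the original table.

```python
import math

# Lender-side log-odds coefficients as quoted (rounded) in the text.
coefficients = {
    "duration 8-24 hours": 2.0,
    "suburb location": -1.1,
    "weekend": 1.06,
}

def odds_change_pct(coef):
    """Percentage change in the odds of selection implied by a log-odds coefficient."""
    return (math.exp(coef) - 1.0) * 100.0

for feature, coef in coefficients.items():
    print(f"{feature}: odds x{math.exp(coef):.2f} ({odds_change_pct(coef):+.0f}%)")
```

A coefficient of 2, for example, multiplies the odds of a concept being selected by roughly 7.4, while a coefficient of -1.1 cuts the odds to about a third of the reference level's.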

As mentioned previously, the attributes measured for renters differed slightly from those for lenders: instead of location (suburb or city), we measured the type of vehicle to gain insight into usage type. In the table above (right), SUV is the most favorable type of car, with a utility score of 0.37. This suggests that the RDX model in the Telematics dataset can be a good starting point for launching a peer-to-peer lending service. Like the lending side, the renting side also put emphasis on the duration and the time of the transaction.

Probability of Lending Calculation

We can use the following equation to calculate the probability of lending:

P = exp(Total Utility) / (exp(Total Utility) + 1)

As an example, we will select two random concepts to calculate the probabilities shown in the table below.

Utility score for two random scenarios

Changing the level of any attribute changes the total utility score and thus increases or decreases the probability of lending. Using this method, we can find the maximum probability of lending by choosing, for each attribute, the level that maximizes the total utility score.

Maximum utility score scenario
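The probability calculation and the search for the best scenario can be sketched as follows. The utilities are the rounded lender-side values quoted earlier in the post, with reference levels set to 0; treating 2–8 hours as the reference duration is an assumption, and the earnings attribute is left out because its coefficient is not quoted in the text.

```python
import math

# Rounded lender-side utilities from the text; reference levels are 0.
# The earnings attribute is omitted because its coefficient is not quoted.
UTILITY = {
    ("location", "city"): 0.0,
    ("location", "suburb"): -1.1,
    ("time", "weekday"): 0.0,
    ("time", "weekend"): 1.06,
    ("duration", "2-8h"): 0.0,   # assumed reference level
    ("duration", "8-24h"): 2.0,
}

def lending_probability(profile):
    """P = exp(total utility) / (exp(total utility) + 1)."""
    total = sum(UTILITY[(attr, level)] for attr, level in profile.items())
    return math.exp(total) / (math.exp(total) + 1.0)

def best_profile():
    """Pick, for each attribute, the level with the highest utility."""
    best = {}
    for (attr, level), value in UTILITY.items():
        if attr not in best or value > UTILITY[(attr, best[attr])]:
            best[attr] = level
    return best

top = best_profile()
print(top, round(lending_probability(top), 3))
```

Because the total utility is a plain sum, the maximum-probability scenario can be found attribute by attribute without enumerating every combination.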

Cost-Benefit Analysis

We can now conduct a cost-benefit analysis of our machine learning model for the peer-to-peer lending case. To make this calculation, we made the following assumptions:

  1. We will assume a 40% takeaway fee for 99P Labs, meaning the customer will earn 60% from using the platform.
  2. The cars for lending and renting are the RDX model.
  3. Only the base price is taken into consideration. In actual business settings, there will be additional fees incurred from using the platform.
  4. For 2–8 hours, we will assume a base price of $20.
  5. For 8–24 hours, we will assume a base price of $50.
  6. At the start, we will only offer a service in the City area.
  7. Assume that profit is realized when lenders decide to put their cars on the platform (i.e., there will always be renters who will use the service.)
  8. Customers will not accept an offer if their actual dwell time is within 0–2 hours.
  9. The probabilities derived from the logistic regression on both lending and renting side are representative of the probability of someone lending their cars as well as the probability of someone renting the available cars on the peer-to-peer car platform.

The table below shows two product concepts that will be used to illustrate the costs and benefits of the machine learning model. In the survey results, City Location and Weekend were the most preferred choices on both the lending and renting sides, so both concepts use these levels. One concept has a short-term (2–8 hrs) lending/renting duration and the other has a long-term duration (8–24 hrs). With these attributes selected, the total utility scores were calculated using the coefficients from the logistic regression. We then converted the scores into the probability of lending and renting.

Probability of lending calculation for two different scenarios (used for illustrating the cost-benefit)

With the probability of lending, P(lending), calculated, we constructed the cost-benefit table below, comprising the potential actions in each scenario. We only consider the weekend scenario because of the customer preferences revealed by the utility scores. When the predicted dwell time is 0–2 hours, 99P Labs will not notify the customer to place their car on the platform. If the prediction is correct, the expected earning is $0. If it is incorrect, however, 99P Labs incurs an opportunity cost, because the customer would have been inclined to place their car on the car sharing platform. When the predicted dwell time falls in the 2–8 or 8–24 hour bucket, 99P Labs will make an offer to the customer. Customers are only likely to place their car on the platform if the offer fits within their actual dwell time. Therefore, the potential earnings for a correct prediction are the lending price multiplied by the probability of lending, P(lending | hr = 2–8) or P(lending | hr = 8–24), minus the cost of notifying the customer. For incorrect predictions, we assumed that the only cost incurred by 99P Labs is the marketing cost, which includes the app maintenance fee shared across all users. We assumed a theoretical marketing cost of $0.50 per push notification.

Cost-benefit analysis for the weekend model
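The logic behind each cell of such a cost-benefit table can be sketched as below. The base prices and notification cost come from the text; the lending probabilities are hypothetical placeholders, and the size of the opportunity cost (the forgone price times the probability of lending) is an assumption, since the post does not state how it was computed.

```python
NOTIFICATION_COST = 0.50                   # per push notification (from the text)
BASE_PRICE = {"2-8": 20.0, "8-24": 50.0}   # base prices from the assumptions list
P_LENDING = {"2-8": 0.7, "8-24": 0.9}      # hypothetical probabilities, illustration only
BUCKETS = ["0-2", "2-8", "8-24"]

def cell_value(actual, predicted):
    """Expected value of one trip for a given actual/predicted bucket pair."""
    if predicted == "0-2":
        # No offer is sent, so no notification cost is incurred.
        if actual == "0-2":
            return 0.0
        # Assumed form of the opportunity cost: the forgone expected earning.
        return -BASE_PRICE[actual] * P_LENDING[actual]
    if predicted == actual:
        # Correct offer: expected earning minus the notification cost.
        return BASE_PRICE[predicted] * P_LENDING[predicted] - NOTIFICATION_COST
    # Offer sent for the wrong bucket: only the notification cost is lost.
    return -NOTIFICATION_COST

table = {a: {p: cell_value(a, p) for p in BUCKETS} for a in BUCKETS}
for actual in BUCKETS:
    print(actual, [round(table[actual][p], 2) for p in BUCKETS])
```

Rows are actual buckets and columns are predicted buckets, matching the orientation of the confusion matrix it is later multiplied with.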

Confusion matrix (only weekend and capped at 24 hours)

Confusion matrix for weekend model (actual time buckets in rows)

Expected profit for the Weekend Model

Using the cost-benefit table and the confusion matrix, we perform element-wise matrix multiplication to get the total expected profit.
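As a minimal sketch of this step, the snippet below multiplies a payoff matrix by a confusion matrix cell by cell and sums the result. Every number here is a hypothetical placeholder rather than a project result, and the confusion matrix is assumed to hold trip counts.

```python
# Rows are actual buckets, columns are predicted buckets (0-2, 2-8, 8-24).
# All numbers are hypothetical placeholders, not the project's results.
payoff = [
    [0.0, -0.5, -0.5],
    [-14.0, 13.5, -0.5],
    [-45.0, -0.5, 44.5],
]
confusion = [  # assumed to be trip counts
    [50, 5, 1],
    [4, 30, 3],
    [2, 6, 20],
]

def expected_profit(payoff, confusion):
    """Sum of the element-wise product of the two matrices."""
    return sum(
        p * c
        for p_row, c_row in zip(payoff, confusion)
        for p, c in zip(p_row, c_row)
    )

total = expected_profit(payoff, confusion)
platform_share = 0.40 * total   # 40% takeaway for the platform
customer_share = 0.60 * total   # remaining 60% goes to customers
print(total, round(platform_share, 2), round(customer_share, 2))
```

Off-diagonal cells with large counts and negative payoffs show directly which kinds of misprediction cost the most.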

The total potential expected profit created by the weekend model is about $22,854 (the sum of the two shares that follow). With a 40% takeaway, 99P Labs can earn up to $9,141.75, and the value created for customers is $13,712.62.

Autonomous Vehicles (AV) as Future Direction

In looking ahead at the constantly developing landscape of mobility and transportation, we see abundant opportunities to integrate autonomous vehicles into these proposed services and feel that this can add value in a variety of ways. Having a fleet of autonomous vehicles to serve as a base supply for the P2P lending service, owned by the car manufacturer, can increase flexibility, allow for broader reach, reduce costs, and mitigate risk and potential failure.

Increase Flexibility

With AVs, we are no longer constrained by individual car owners and have greater flexibility in terms of location, timing and availability of trips. We also have control over the fleet size from the time we launch the service and can match demand as it grows.

Broaden Reach

This would allow us to extend our service to those who live in more remote areas and to those who are not able to drive. Another way to increase demand is to leverage semi-autonomous vehicles. This way, the car can be self-driven to the customer, at which point the customer can take control of the car and drive to their destination as needed.

Reduce Costs

If the car manufacturing company can provide their own fleet of AVs, they would be able to eliminate labor costs completely and receive the full profit from the transaction. We estimated that the expected total profit for the car manufacturer would rise by 375%.

Mitigate Risk and Failure

As noted above, some of the potential risks of the service include passenger/lender unavailability and cancellations. With autonomous vehicles, the severity and likelihood of these failure modes would drastically decrease.

Overall, integrating autonomous vehicles into our P2P vehicle lending case can allow for better planning of supply and demand. This would let us serve more trips, which would in turn give the company more data on renter behavior and ultimately improve model performance. This virtuous cycle would feed itself, bringing greater insights and allowing for constant improvement.

Conclusion

The two proposed business models (vehicle maintenance and P2P vehicle lending), integrated with the predictive models, reveal great potential for 99P Labs. Based on the model and survey results and the research conducted in this project, both business models present promising opportunities. The modeling and survey validated the market for 99P Labs. A risk framework consisting of FMEA tables and cost-benefit analysis was developed to quantify the model risks and identify possible methods to mitigate them.

There is, however, significantly more work to be done on both business cases. The dataset, which covers a short, 2-week time frame, was collected in 2018, and the distribution of the data may have changed since then. Another challenge is that, with data constantly arriving from sensors, a strong data engineering infrastructure that can handle streaming data needs to be in place before 99P Labs can consider deploying the model into production.

Acknowledgements

We want to thank our advisor, Neda Mirzaeian, for facilitating a smooth and enjoyable project experience, as well as our project sponsors, Rajeev Chhajer and Tony Fontana of 99P Labs, for providing support and direction along the way.
