Data Analysis and Visualizations of Chicago Divvy Bikes Sharing

Introduction

Divvy is a bike share system across Chicago and Evanston that features more than 5,800 bicycles and 600 docking stations. Users of Divvy operation can be broken down into two main groups: those users who purchase single-ride or full-day passes are referred to as casual riders and users who purchase annual memberships are considered member riders. Although the pricing flexibility helps Divvy attract more customers, the director of marketing believes that annual members are much more profitable than casual riders, and maximizing the number of annual members will be key to future growth.

To do that, the marketing analyst team needs to have a better understanding of members’ and casual riders’ usage behaviors. In this case study, I will try to analyze various trends based on user types and bike types to find similarities and differences between the riders. In the end, I will provide recommendations based on insights that can help the marketing team to create a campaign to boost casual users’ conversion into member users.

In this case study, I used the one-year period of Divvy trip data (from January to December 2021) provided by Motivate International Inc, under this license. The dataset can be downloaded from here in .csv format and each .csv file contained data for a whole month. Riders’ personally identifiable information has been removed from the data due to data privacy.

Step1- Understanding business tasks and making questions

Business goal:

To maximize the number of annual memberships by converting casual riders to annual members.

Business task:

Analyze how annual members and casual riders use Cyclistic bikes differently in order to determine the best marketing strategies that would convert casual riders to annual members.

Stakeholders:

Divvy executive team, Director of marketing

Questions:

1. How do annual members and casual riders use Divvy bikes differently?

2. What can pursue casual riders to convert to annual subscribers?

3. What kind of campaign do you recommend encouraging casual riders to buy annual membership?

4. How can Divvy use digital media to influence casual riders to become members?

The tools that I used in the analysis are MySQL, Microsoft Excel, Tableau Public, Power BI, and Google Sheet,

Data limitations:

The data set had limitations for having a comprehensive knowledge of riders. in fact, we did not have access to any kind of demographic variables such as gender, age, ethnicity, income, etc.

Step 2- Data preparation

Data appending of 12 CSVs files into one database

My first challenge was appending the 12 CVSs files in the best method. I tried different methods, but in the end, I found combination data through the “Windows Command prompt” the fastest and easiest way to do so.

First, I created a folder to house all CSVs files and I browse the folder. Then I Copy the folder as the path. I open the Windows Command prompt and I typed “cd and press Space and paste of folder path”. In the end, I typed “copy *.csv combined-CSV-files.csv” and Enter.

Create database and table for Importing CSV into MySQL

I created a database called divvy_tripdata and a table named divvy_2021 in MySQL Workbench. The columns in the MySQL table need to match the data from the CSV file.

I imported the data from the CSV file into the MySQL database, using the LOAD DATA INFILE method. The LOAD DATA INFILE statement allows us to read data from a text file and import the file’s data into a database table very fast.

The combined table contains 5,595,063 rows and has 13 columns/fields. However, MYSQL recognized wrongly the data type of columns “started_at” and “ended_at” as a text.

I changed and modified the data type of both columns “started_at” and “ended_at” from “TEXT” to DATETIME”

Step3- Data Explorations and Cleaning (Process)

Each trip observation (row) is unique with ride id. It provides information about the time, station, location (longitude and latitude), bike type, and user type of each trip.

Ride_id: is a primary key field in the table which uniquely identifies each row. Each ride_id is a string type with 16 characters. There is no duplication and each ride_id has exactly 16 characters. Data cleaning is not necessary for this column.

rideable_type: the data contains 3 types of bikes: classic, docked, and electric bikes. However, after some research, I found out that ‘docked bike’ is the old name for ‘classic bike’. Thus, I changed all ‘docked bike’ to ‘classic bike’ which affected 302063 rows. I also checked if the column has any row without value or null.

started_at / ended_at: these columns show the date and time that the bike trips started and ended. Based on these two columns I created 3 other columns, ride_length, day_of_week, and date which later on, they going to help me in trending analysis and data visualization.

After creating the ride_length column, I found out that in some of the rows ended_at was less than started_at (ride_length <0). Therefore, I deleted these rows. Moreover, I removed all outliers. In my analysis, I’ve considered the trips with ride_length less than 1 minute or greater than 24 hours as outliers. There were 101,787 rows with outlier values that were removed during the cleaning process.

start_station_name/end_station_name: There are many trips with null values in either the starting station name column or the ending station name. I removed 5,288 rows with classic bike type that had null values in the ending station name, as classic bike trips must start or end at a docking station. However, I did not remove the remaining rows with electric bike trips and null values. Because electric bikes do not have to start or end at a docking station, as they have the bike lock option.

member_casual: this column indicates if the customer type. I double-checked that ‘casual’ and ‘member’ are the only allowable strings in this column.

Step 4- Analyze and Visualization

First, I tried to find out data distribution among members and casual riders. Although across all rides taken, the average ride length for casual riders (25.2 minutes long) was almost twice longer than member riders (13.1 minutes long), member riders took 10% bigger number of rides compared to casual users.

Number of rides under time free cost limitation (45 minutes)

Based on Divvy’s pricing plan, members pay $99 a year and get unlimited rides under 45 minutes and small charges (15 cents per minute) are incurred when single rides exceed over 45 minutes. However, casual riders pay as they go, they can purchase a daily pass for $15 and get unlimited trips on that day or for $3, one trips up to 30 minutes.

The great majority (98.2%) of trips by member users took less than the 45-min limit after which there would be an additional cost. This rate is 87% for casual users.

Usage behavior of member and casual users during the week

The average ride length of casual riders is significantly longer than member riders and has a peak on weekends. But Annual members are more consistent in their bike usage and ride duration throughout the week.

During the week, member users have a trend with small changes in the number of rides and on weekends, the number of rides is slightly dropped. In addition, the average ride time has a similar behavior during weekdays, but on weekends average ride times slightly increased.

However, during the week Casual users have a trend with small changes in the number of rides, but on the weekend, the number of rides increased drastically. On Saturday, the number of rides is 54% more than the average number of rides during the weeks. Moreover, ride time has also relatively increased during weekends.

Weekends & weekdays binary analysis

After analyzing the user’s behavior over the weeks, I decided to do a weekend- weekdays binary analysis because riders used bikes significantly differently over the weekends compared to weekdays.

Casual riders spend 46% of their trips during weekends. But member riders use the bikes slightly more but very similar to other days of the week. This can be attributed to the fact that casual riders use the bikes for leisure, while members mostly use the service to commute to and from work or school.

Usage behavior of member and casual users on the monthly based?

There is a correlation between the seasons of the year and riders’ behavior. The average number of rides rose exponentially during the summer months and fell drastically during wintertime. Bike usage rises during the Summer peaking in July and August for both types of users. But, casual riders have a slightly (1–3%) greater number of rides. However, during January and February about 80% and in November and December over 70% of the rides were taken by member users. However, during January and February about 80% and in November and December over 70% of the rides were taken by member users. In fact, during warmer months (May to September) members and casual users, have a similar number of rides.

July as the most popular month for the casual user has over 40 times greater number of rides compared to February as the least popular month. In fact, during the winter months, the number of rides by casual is very low and negligible. But annual members still use the service at a reasonable rate in those months.

Usage behavior of member and casual users in the season based?

In the 3rd quarter of the year, casual riders took rides 10 times more compared to the 1st quarter of the year. 42.4% of total rides were taken in the 3rd quarter of the year.

42.4% of total rides were taken in the 3rd quarter of the year.

Usage behavior of member and casual users on a daily based?

In a 24-hour span, the number of rides for both, members and casual users significantly spike during afternoon rush hours (16, 17, and 18 pm) but for members, users are more intense.

Member users use bikes to commute to work or school each day. However, it seems that mostly they use bikes to come back from work as the ride peak between 16 to 19 pm is much bigger compared to morning between 7 to 9 am.

Usage behavior of member and casual users in rush hours

After analyzing the user’s behavior on a daily base, I decided to do a rush-hour& non-rush hours binary analysis to have a better picture of their usage patterns. In fact, around 27% of the rides took in rush hours.

Utility rate and bike type preferences among annual members and casual riders.

Data shows that classic bikes are more preferred by both types of users. However, proportionally casual users are more inclined to use electric bikes. About 37.5% of total rides by casual riders were via electric bikes, compared to member riders that only about 30% of their total rides were by electric bikes.

During weekdays, casual riders use about 41.5% of their total rides by electric bikes. This rate during weekends is just 31.5%. For member riders, this rate during weekdays (36%) and weekends (32%) is very similar.

The pattern of electric-classic bike usage during winter and summer is completely different among riders. When the weather gets colder, using electric bikes increases as well. Even, in December and November, the number of rides by electric bikes is getting greater than the number of rides by classic bikes.

Key Findings

In this analysis, we found the patterns which indicate that:

  • Casual riders are more leisure-oriented usage the bikes although member riders use the bikes for routine activities like commuting to work or school.
  • The average ride length for casual riders was almost twice longer than member riders, but, member riders took the bikes 10% more by the number of rides compared to casual users.
  • During the week, member users have a trend with small changes in the number of rides and on weekends, the number of rides is slightly dropped. However, Casual users have a trend with small changes in the number of rides, but on the weekend, the number of riders increased drastically.
  • There is a correlation between the seasons of the year and riders’ behavior. The average number of rides rose exponentially during the summer months and fell drastically during wintertime. Bike usage rises during the Summer peaking in July and August for both types of users. But, casual riders have a slightly (1–3%) greater number of rides. However, during January and February about 80% and in November and December over 70% of the rides were taken by member users.
  • In a 24-hour span, the number of rides for both, members and casual users significantly spike during afternoon rush hours (16, 17, and 18 pm) but for members, users are more intense. Around 27% of the rides took in rush hours.
  • Data shows that classic bikes are more preferred by both types of users. However, proportionally casual users are more inclined to use electric bikes.
  • The pattern of electric-classic bike usage during winter and summer is completely different among riders. When the weather gets colder, using electric bikes increases as well. Even, in December and November, the number of rides by electric bikes is getting greater than the number of rides by classic bikes.

Recommendations

Certainly, casual users are a reasonable and easier target to be converted to member users as they are already familiar with the Divvy program and have experience with Divvy’s product. Therefore, important to convince them by buying an annual subscription how can save money and also getting better services. Therefore, Divvy must make a business strategy that led to making meaningful benefits to encourage casual users to make an annual subscription.

  • Rising up prices for casual riders during rush hours will be discouraging for them and will be an advantage for member users given to increasing their availability of the free bikes during the peak traffic commutes.
  • Given that during weekends casual riders use bikes disproportionally more than on weekdays, increasing “free cost limitation time” to a longer time or offering bonuses for longer rides during weekends can boost the conversion rate from casual to member users.
  • Initiating a promotional pricing campaign in summer and spring offering discounts for annual subscriptions.
  • Adding more electric bikes to inventory given casual riders proportionally use them more than member users
  • Offering a one-month free trial subscription or free -first-month campaign

Additional data for further analysis:

  • Including data that specifies whether a casual rider uses a single-ride pass or full-day pass, to analyze and provide a pricing plan to convince them they can save money by buying an annual subscription·
  • Including data on gender, age, and marital status can give a holistic insight into the customer demographic which could help to find and target customers who have a high potential to buy an annual membership.
  • Providing pricing data on single-ride, full-day, and annual membership pass in order to make a cost-benefit analysis which can help to have a reasonable pricing recommendation to encourage casual users to buy an annual subscription.

Thank you very much for your time, I hope you enjoyed the reading
I would be very happy to read your comments and feedback

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store