Cyclistic Bike Shares Case Study(Google Data Analytics Capstone Project)

John Ikpeme
5 min readMay 13, 2022

--

By: John Ikpeme

This case study is for Cyclistic, a bike share company in Chicago. Working as a junior data analyst with the rest of the team, it’s my responsibility to use insights taken from my analysis to communicate the steps that must be put in place to acquire more yearly subscribers to the company.

1. ASK

The team’s focus is to find out how annual members and casual riders use cyclistic bikes differently. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno, director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members.

In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics.

Focus: What strategies would make casual riders switch to an annual subscription?

2. PREPARE

Data used for this case study is obtained from Index of bucket “divvy-tripdata” and is a public dataset organized containing Cyclistic data per month, organized in rows & columns, in csv format. This data is licensed and available by Motivate International Inc under this license, Data License Agreement | Divvy Bikes. This dataset will be necessary for the accomplishment of this business task as it shows Cyclistic data for casual riders and annual members, with information on ride id, start and end time for rides & start and end stations for each ride taken.

Data in consideration is for the last 12 months (May 2021 — April 2022). This data is downloaded, named using the right naming nomenclature and stored in a local drive. On close inspection, some data is missing across tables. This will be cleaned in the next phase.

3. PROCESS

In this phase, we will need to carry out some calculations on the data sets individually in Excel, before importing to a SQL database manager to carry out further analysis.

To further understand our users, we will need to find out how long each ride takes, and the day the ride starts. Starting with dataset for May 2021, and opening on an excel sheet, we create two additional columns. First, for ‘ride_length’, which is simply subtracting column ‘started_at’ from ‘ended_at’. To obtain appropriate cell formatting, cells are formatted as time (37:30:55). Secondly, for ‘day_of_week’, which is obtained with the following formula, ‘Weekday[started_at,1]’,with 1-Sun & 7-Sat. Format this cell as “General’. Repeat this process for all twelve(12) datasets.

Our entire data is too large to work with on an excel sheet, for this we need a Query software. For this study, we’ll be using Microsoft SQL. Create a dataset called ‘Cyclistic Data’ and import all datasets, and store with the right naming nomenclature (e.g. ‘dbo.May2021’). We need to run a query to merge all datasets into one, while focusing on necessary columns and sorting by the date each started.

3.1: Merging datasets in SQL

Next, we save our output as a separate csv file, and import it into sequel. Now, we clean our data for errors. There are some errors in our ride length, as some rides are over 24 hours long, represented as ‘NULL’ in our query, and others are non-existent(00:00:00). These need to removed from our dataset to increase the accuracy of our data story. Using the ‘REMOVE’ function, we eliminate NULL values and non-existent for ride-length from our dataset. Now, we have clean data to manipulate.

3.2: Data Manipulation in SQL

4. ANALYSE/SHARE

Tools Used: Excel & Tableu

I. Total Rides Taken

  • Within the last twelve(12) Months from
    May 2021 — April 2022, approximately 5,753,132 rides were taken.
  • Cyclistic members embark on more rides than casual members
4.1: Total Number of rides per user

II. Ride Type Used

  • Cyclistic Members are more inclined to use the classic bike over the electric bike
  • Cyclistic members do not use the docked bike
4.2: Ride type used by Members
  • A majority of casual riders, just like cyclistic members also use the classic bike more often than others.
4.3: Ride type used by Casuals

III. Rides taken by Weekday

  • Casual riders take most of their rides on Saturdays.
  • Cyclistic members carry out a majority of rides on Wednesdays
4.4: Rides taken per user by weekday

IV. Rides per Month

  • Number of rides by casual riders has reduced drastically by about 50.8% over the past year
  • Number of rides by cyclistic members has dropped by 10.8% over the past year
  • July-August are best performing periods for both casual rides & annual members
4.5: Number of rides per month by user

V. Average time per ride

  • Despite cyclistic members taking more rides, casual riders, on average, spend more time on a ride.
4.6: Average ride time per user

5. ACT

  • Marketing Campaigns should be focused on the casual riders, with increased focus on Saturdays. To increase revenue influx, marketing for July — August must be prioritized.
  • Since casual riders take longer rides, thereby spending more per ride in the process, campaign should show financial advantages of being a paid subscriber.
  • Budget for docked bike development should be reduced

For link to presentation, click here

--

--