Introduction

I recently completed my Google Data Analytics Professional Certificate through Coursera, and I am ready to put everything I learned over the past few months to use. I learned about the six stages of the data analysis process (ask, prepare, process, analyze, share, and act), and specifically how to ask the right questions, deliver valuable insight, and present my findings effectively to my target audience and the project's stakeholders. Using this process, I will provide a clear business task, a description of all data sources used, documentation of any cleaning or manipulation of the data, a summary of my analysis, and lastly, visualizations and key findings. I will be working with the company Cyclistic, looking at trends among its riders and at stakeholder expectations to produce the best possible outcomes for the company. Let's get started!

Scenario

I am a Junior Data Analyst on the Marketing Analyst team at Cyclistic, a bike-share company in the City of Chicago that offers over 6,000 bikes and 700 docking stations to patrons around the city. Per the request of the Director of Marketing, I will be looking to maximize the number of annual memberships. To do so, our team will analyze how the tendencies and preferences of casual riders differ from those of annual members. From these insights and the visualizations I put forth, our team will develop a plan to convert casual riders into annual members.

Ask

In this stage, I have been assigned a question posed by the Marketing team at Cyclistic: “How do annual members and casual riders use Cyclistic bikes differently?” I will deliver a clear statement of the business task, with business insights into the problem I am trying to solve. I will investigate the habits of casual riders vs. annual members, and then recommend how to move forward with an effective marketing campaign to turn casual riders into annual members.

Prepare

In this stage, I will formally dive into the data to answer key questions about where it is located and how it is organized. I will also explore the data’s credibility, licensing, security, and integrity. The data location can be viewed here. I will be analyzing the Cyclistic bike trip data for Q1 of 2023, from January 2023 to March 2023. The data is located on the Divvy trip data website, where I am able to download each dataset by month, quarter, or year and open it in Microsoft Excel. To assess the data’s credibility, I will use the ROCCC criteria: Reliable, Original, Comprehensive, Current, and Cited. The data is Reliable because it is public, Original because it was posted by the bike company, Comprehensive because it provides many datasets from different time periods, Current because it was collected within the past three years, and Cited because it is on the company’s website. For licensing, Lyft Bikes and Scooters, LLC runs the Divvy bike service in the City of Chicago, and the data is permitted to be open to the public. Lastly, when it comes to the integrity of the data, there are nulls in the dataset, which will have to be cleaned in the coming stages.

Process

In this stage, I will download the data for Quarter 1 of 2023, which covers January, February, and March. I will import the three monthly datasets, combine them, and then clean and manipulate the combined data in RStudio so that it is ready for the Analyze stage.

Data Combination

I imported the three tables from January, February, and March 2023 into RStudio, in a subfolder called “Q1 2023 bike set.”

Then, I combined the three tables into one using R’s rbind() function, which stacks tables with matching columns row by row.

Next, I cleaned the data by installing cleaning packages and removing blank data from each of the datasets. Specifically, I removed the rows with null values from each file.
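The combine-and-clean steps above can be sketched in a few lines. My actual work was done in R with rbind; the version below is only a rough Python stand-in that uses the standard library. The three small tables are made-up samples rather than real trip data, and the column names (ride_id, rideable_type, started_at, member_casual) and helper functions are assumptions for illustration.

```python
import csv
import io

# Tiny made-up samples standing in for the three monthly files (not real trip data).
# The column names are assumed for illustration.
JAN = """ride_id,rideable_type,started_at,member_casual
A1,electric_bike,2023-01-03 08:15:00,member
A2,classic_bike,2023-01-05 17:02:00,
"""
FEB = """ride_id,rideable_type,started_at,member_casual
B1,classic_bike,2023-02-07 09:40:00,casual
B2,electric_bike,2023-02-11 17:10:00,member
"""
MAR = """ride_id,rideable_type,started_at,member_casual
C1,docked_bike,2023-03-04 14:25:00,casual
C2,,2023-03-20 12:00:00,casual
"""


def read_rows(text):
    """Parse one CSV table into a list of dicts, one per ride."""
    return list(csv.DictReader(io.StringIO(text)))


def combine(*tables):
    """Stack tables row-wise (similar in spirit to rbind); columns must match."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


def drop_incomplete(rows):
    """Keep only the rows in which every field is non-empty."""
    return [
        row for row in rows
        if all((value or "").strip() for value in row.values())
    ]


all_rides = combine(read_rows(JAN), read_rows(FEB), read_rows(MAR))
clean_rides = drop_incomplete(all_rides)
print(len(all_rides), "rows combined;", len(clean_rides), "rows kept after cleaning")
```

Combining first and cleaning second means the same rule is applied to every month at once, which makes it harder to miss a file by accident.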

Share/visualizations

In this stage, I will be exporting three CSV files from Excel to Tableau. First, I will look at how preferences differ between casual riders and annual members.

Total amount of casual riders vs. annual members per month

In the chart above, we can see that annual members outnumber casual riders by a significant margin. In January 2023, 136,912 riders were annual members, versus 44,894 casual riders. In February 2023, there were 150,293 annual members, versus 40,008 casual riders. In March 2023, there were 147,429 annual members, versus 43,016 casual riders. Lastly, the monthly averages over the three-month span were 144,878 annual members versus 42,639 casual riders.

This data tells us that from January 2023 to March 2023, 77 percent of riders who chose Cyclistic bikes were annual members, and only 23 percent were casual riders.
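The averages and percentages can be double-checked with a short calculation. The sketch below is in Python rather than the tools I used for the project, and it takes the monthly counts exactly as reported in this section; the variable names are my own.

```python
# Monthly counts exactly as reported in this section.
members = {"January": 136_912, "February": 150_293, "March": 147_429}
casuals = {"January": 44_894, "February": 40_008, "March": 43_016}

member_total = sum(members.values())
casual_total = sum(casuals.values())

# Average number of riders per month for each group.
member_avg = member_total / len(members)
casual_avg = casual_total / len(casuals)

# Each group's share of all riders over the quarter, in percent.
member_share = 100 * member_total / (member_total + casual_total)
casual_share = 100 - member_share

print(f"Average per month: {member_avg:,.0f} members, {casual_avg:,.0f} casual")
print(f"Share of total: {member_share:.0f}% members, {casual_share:.0f}% casual")
# Prints:
# Average per month: 144,878 members, 42,639 casual
# Share of total: 77% members, 23% casual
```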

Preferred bike type between Casual riders vs. Annual members per month

In the chart above, I used the three files from January 2023 to March 2023 to compare how annual members and casual riders chose among classic bikes, electric bikes, and docked bikes. The main takeaway is that in January 2023, members slightly preferred electric bikes over classic bikes, by a margin of 15,516 riders. Another key takeaway is that casual riders preferred electric bikes over classic bikes by a small margin in each month from January through March. Docked bikes appear to be used mostly by casual riders, though only a minuscule number of riders choose them. Cyclistic may not have great demand for docked bikes.

Casual riders vs. annual members by week (January 2023)
Casual riders vs. annual members by week (February 2023)
Casual riders vs. annual members by week (March 2023)

The three charts show the number of casual riders vs. annual members per week. Across all three charts, the first two weeks of each month seem to be when most annual members ride. The same is true for casual riders. The end of the month is, on average, a slow time for biking.
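For readers curious how weekly counts like these could be produced from trip timestamps, here is a minimal Python sketch. It is not the method behind the Tableau charts. The rides are made-up examples, and defining week 1 as days 1 through 7, week 2 as days 8 through 14, and so on is an assumption of mine.

```python
from collections import Counter
from datetime import datetime

# Made-up (started_at, rider type) pairs for illustration only, not real trip data.
rides = [
    ("2023-01-02 08:10:00", "member"),
    ("2023-01-06 17:05:00", "member"),
    ("2023-01-09 12:30:00", "casual"),
    ("2023-01-13 17:45:00", "member"),
    ("2023-01-24 09:00:00", "casual"),
    ("2023-01-30 18:20:00", "member"),
]


def week_of_month(timestamp):
    """Return 1 for days 1-7, 2 for days 8-14, and so on (an assumed definition)."""
    day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").day
    return (day - 1) // 7 + 1


# Count rides for each (week, rider type) pair.
counts = Counter((week_of_month(started_at), rider) for started_at, rider in rides)

for (week, rider), n in sorted(counts.items()):
    print(f"Week {week}: {n} {rider} ride(s)")
```

With this definition, a month longer than 28 days has a short fifth week, so the last bucket should be read with care when comparing it to the first four.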

Act

In this stage, I will form conclusions based on the data from Quarter 1 of 2023 and offer recommendations based on my findings.

  • In Quarter 1 of 2023, annual members and casual riders rode more during the first two weeks of the month than during the last two weeks.
  • On weekdays, annual members preferred to ride at 5 PM, which suggests that they are biking home from work.

Recommendations

After further review of the data, the Data Analytics team would like to propose the following action steps for Cyclistic to increase the conversion rate of casual riders to annual members:

  1. Provide free guest passes to casual riders and offer discounts for signing up as an annual member in the last two weeks of the month, when there are fewer riders. Quarter 1 is also the coldest time of the year and has the fewest riders, so offering special promotions and discounts for signing up could incentivize casual riders.
  2. Promote the annual membership as an eco-friendly form of transportation to and from work during the week.
  3. Offer a greater discount for choosing to ride docked bikes, so Cyclistic can increase the number of docked bikes used.
  4. Survey people who live in the City of Chicago on their preferred bike type for riding to and from work, with a separate survey on their preferred bike type for the weekend. This can assist Cyclistic’s Marketing team in finding a potential alternative to a docked bike.

Thank you for reading my Capstone Project!

Johnnie Omalley
Google Data Analytics Capstone Project: Cyclistic Case Study

Hi everyone - My name is John O'Malley and I am looking to start a career in the field of Data Analytics, hopefully in the Nonprofit Sector.