Google Data Analytics Capstone: Case Study 1 (Using RStudio & Tableau)

Cheryel Huang
7 min readSep 5, 2023

--

This article depicts the logical frames I proposed for this case study, the data analysis tool I used, the analysing train of thoughts I employed to solve the case, the case insight, and a list of my suggestions for this case.

Content: 
- Case info
- Logical Framework
- Data Preparation
- Data Cleaning
- Data Analysis & Data Visualisation
- Conclusion
- Suggestions

Case info

Background

I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. According to the conclusion made by Cyclistic’s finance analysts, annual members are much more profitable than casual riders, and the pricing flexibility helps Cyclistic attract more customers.

Therefore, Moreno (Marketing director) believes that maximising the number of annual members will be key to future growth, and there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic programme and have chosen Cyclistic for their mobility needs.

Goal

Design marketing strategies aimed at converting casual riders into annual members

About the company

In 2016, Cyclistic launched a successful bike-share offering in Chicago. Since then, the programme has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Current marketing strategy

Relied on building general awareness and appealing to broad consumer segments. It helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships.

Terms

  • Casual riders: Customers who purchase single-ride or full-day passes.
  • Annual members: Customers who purchase annual memberships are Cyclistic members.

Data Analysis

Logical Framework

Based on the goal of this case and insights from Moreno(Marketing Director), below are the logical framework to focus on during analysis.

  1. Clycling frequency for member & casual users

(# of user & # of ride)

[Logic] What: Focus on the trend of casual users’ cycling behaviour

2. Cycling time for member & casual users

(the time user cycle, the day of week user cycle & trip duration)

[Logic] When: Focus on casual users’ cycling habit to release incentive programs

3. Cycling preference for member & casual users

(rideable type & cycling area)

[Logic] What: Focus on communicating with casual users with their preference through digital media to persuade them to become annual members

Data preparation

To understand users’ behaviour within the recent period, and to do some user behaviour comparison according to the time later in the analysis process, we will extract Cyclistic’s historic user data from 2020–2023/06 (here). All of these files will be saved in the folder“Cyclistic_tripdata” to organise all files. While since the file names are already clear enough to distinguish the data created month, we will keep file names as they are for better recognise extracted files from the ones in the database by internal teams.

Data cleaning

Total records of 2020–2023/06 historical user files are 17,194,922 which are too large to be processed by spreadsheet, therefore RStudio would be the analysis tool applied in this case study.

To start the data cleaning within Rstudio, below packages should be installed and loaded first: tidyverse, lubdridate, dplyr & janitor.

Notes: wording after # is the explanation for the steps

#Installing the packages
install.packages('tidyverse')
install.packages('lubridate')
install.packages('dplyr')
install.packages('janitor')

#Loading the packages
library(tidyverse)
library(lubridate)
library(dplyr)
library(janitor)

After installing the R packages, we would begin importing the csvs into Rstudio through read_csv. Meanwhile, I would name the imported csvs to improve readability.

Once all data frames are imported, we should check the structures of them to identify the data type and information included in the files by applying str().

We would get the information as follows.

In this step, all data frames should be with identical column names & formatting types, and confirm that if there are any wrongly formatted data types.

Regarding data type of started_at & ended_at, since data is already presented in %Y-%m-%d %H:%M:%S format, though the data is not in time type (e.g., POSIXct and POSIXlt classes), it could still be recognised as timestamps in Tableau, therefore we would leave the data type as it is.

After reviewing the data structure, we’re ready to merge all datasets into one table by using bind_rows , and I will name it as tripdata_yyyy.

Now let’s take a quick glance at the data structure of our combined data frames. To better deep dive users’ using time later, we could use existing raw data to extract the day of the week, month of started and ended time, start hour, end hour.

  • wday() yields the week of the day
  • format(as.POSIXct) yields the started_hour & ended_hour
  • format(as.Date) yields the month
  • difftime() yields the Trip Duration

Now data cleaning is almost completed. It’s time to export cleaned files.

Data analysis & Data visualisation

Once completed data cleaning and prepared required data for data analysis, we back to observe and categorise data for its content into groups for better analysis.

  • ride_id, rideable_type & member_casual (Character): user number, user ride preference & user identity related
  • started_at, started_hour, started_month, ended_at, ended_hour, (Timestamp): user cycle period related
  • weekday (Ordered factor): user cycle frequency related
  • difftime(Numbers): user cycle period relater
  • start_station_name, start_station_id, end_station_name, & end_station_id(Character): user cycle distance related
  • started_lat, started_lng, ended_lat & ended_lng (Numbers): user cycle geography related

Circle back to logical framework set up earlier and start deep dive in each part:

  1. Clycling frequency for member & casual users

(user# & ride#)

[Logic] What: Focus on the trend of casual users’ cycling behaviour.

User number

Current marketing plan could not encourage customers purchase either membership or casual user package, require incentive program to stimulate user#

Ride#

Members ride more frequently than casual members, whereas casual users are made up of residents and visitors who ride primarily on weekends but also on weekdays.

2. Cycling time for member & casual users

(the time user cycle, the day of week user cycle & trip duration)

[Logic] When: Focus on casual users’ cycling habit to release incentive programs

AVG trip duration

The average travel duration of casual users is 1.5 times that of members’.

Trip in day of the week

Casual users prefer to ride bikes on weekends, while members tend to cycle on weekdays.

Trip in hour of the day

Casual users ride bikes between the afternoon and dinner while members generally cycle at commute time .

Trip in the month

When weather is in extreme situation (HOT-sun burn & COLD-road with snow), users don’t ride bikes (casual+member).

3. Cycling preference for member & casual users

(cycling area & rideable type)

[Logic] What: Focus on communicating with casual users with their preference through digital media to persuade them to become annual members.

Trip area

Casual users ride bike for sight seeing , member use bike for commute.

Rideable type

Casual users ride electric bike to relax on their way, member ride classic bike to prevent risk happen (no electricity & no system malfunction concerns).

Conclusion

Casual users are most likely a mix of locals and tourists who ride their bikes from afternoon to dinner and on weekends. They prefer to ride electronic bikes to their destination and enjoy spending a long time riding bikes.

Members are most likely local employees who ride classic bikes between work and home in the morning and after work to avoid the hazards that electric bikes and docked bikes pose in making them late for work. As a result, they have a short travel time yet a high frequency.

Though the user behaviour of casual users and members differ significantly, they both do not ride in severe weather. For example, hot waves and frosty days. Furthermore, the user growth rate has steadily declined since 2021, indicating that the marketing strategies implemented between 2021 and 2023 were ineffective.

Suggestions

  1. Release a reward program of registering an account and purchasing membership at both the website & app to stimulate more users to use the cycling service.
  2. Feed promo info on both social media and the app at commute time on weekdays to encourage/remind casual users to ride the bike for their health and high mobility instead of taking other transportation.
  3. Introduce another annual membership level between existing membership and a one-day pass to motivate casual members to become members. (The quality of the package content should be lower than current memberships, but much higher than day passes)
  4. Launch seasonal and holiday cycling campaigns to incentive casual users purchase the membership for the bargain price & premier package.
  5. Collaborate with food and beverage stands and set them up near the most popular kiosks to attract casual riders and members to ride bikes.

--

--

Cheryel Huang
0 Followers

Marketing background Data Analyst with project management experience. Linkedin: https://www.linkedin.com/in/hsiao-chieh-huang/