Ford GoBike System Data project

sammy ilesanmi
Aug 20, 2022


This project is my final ALX-T Udacity project, and in this post I share my exploratory and explanatory analysis of a bike-sharing system covering the greater San Francisco Bay Area. The data set includes information about individual rides made.

Data Wrangling

Following the data wrangling method, I started by gathering the dataset from its source. I limited the dataset to San Francisco, though data for other cities is also available. The dataset is available here.

In the Assess stage of the data wrangling steps, I noted rows and columns where the data lacked accuracy, completeness, consistency, or reliability, all of which a quality dataset should have.

Some of which are:

The method I used to clean the data is Define, Code, and Test. That is, I start by stating the error in the dataset, then I code the correction, and finally I test whether it has been corrected.

This method works best for me because it tracks every error and the progress made in correcting it.
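As an illustration, one Define, Code, and Test pass could look like the sketch below. It assumes pandas and uses a tiny made-up sample; the column names `start_time` and `member_birth_year` and the two issues shown are assumptions for the example, not the actual issues found in the project.

```python
import pandas as pd

# Hypothetical sample standing in for the raw data; the values are made up.
rides = pd.DataFrame({
    "start_time": ["2019-02-28 17:32:10", "2019-02-28 08:05:44", "2019-02-27 23:50:01"],
    "member_birth_year": [1984.0, None, 1990.0],
})

# Define: start_time is stored as text, and some rows have no birth year.

# Code: parse the timestamps, drop rows without a birth year, store years as integers.
clean = rides.copy()
clean["start_time"] = pd.to_datetime(clean["start_time"])
clean = clean.dropna(subset=["member_birth_year"])
clean["member_birth_year"] = clean["member_birth_year"].astype(int)

# Test: confirm that each fix actually took effect.
assert pd.api.types.is_datetime64_any_dtype(clean["start_time"])
assert clean["member_birth_year"].isna().sum() == 0
```

Working on a copy keeps the raw data untouched, so each fix can be checked against the original.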

Exploration

After the cleaning process, the next step was to explore the data and see what relationships and insights I could generate from it. Some of the insights required additional columns, such as the time of day at which most rides were taken.
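A time-of-day column like this can be derived from the ride start timestamp. The sketch below assumes pandas and a `start_time` column; the helper `time_of_day`, the new column names, and the hour boundaries are all assumptions made for the example, not necessarily the ones used in the project.

```python
import pandas as pd

def time_of_day(hour: int) -> str:
    """Map an hour (0-23) to a period of the day. The boundaries are assumed."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"

# Hypothetical sample of ride start times; the values are made up.
rides = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-02-28 08:05:44",
        "2019-02-28 17:32:10",
        "2019-02-27 23:50:01",
        "2019-02-27 13:15:00",
    ])
})

# Extract the hour, then bucket it into a named period.
rides["start_hour"] = rides["start_time"].dt.hour
rides["time_of_day"] = rides["start_hour"].apply(time_of_day)
```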

The plots of my exploration are split into Univariate, Bivariate, and Multivariate plots.

For the univariate plots, I looked at:

  1. Which gender made use of the service most
  2. What time of day the rides were mostly taken, and so on
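The numbers behind a univariate plot such as the gender one are plain category counts. Here is a minimal sketch, assuming pandas and a `member_gender` column; the sample values are made up and are not results from the project.

```python
import pandas as pd

# Hypothetical sample; the values are made up for illustration.
rides = pd.DataFrame({
    "member_gender": ["Male", "Female", "Male", "Other", "Male", "Female"]
})

# Absolute counts and proportions per category.
gender_counts = rides["member_gender"].value_counts()
gender_share = rides["member_gender"].value_counts(normalize=True)

# The category with the highest count.
most_common_gender = gender_counts.idxmax()
```

These counts are what a bar chart of the column would display.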

For the bivariate plots, I looked at:

  1. The relationship between members_age and ride duration
  2. Which stations are used most and at what time of day they are mostly used
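Alongside a scatter plot, a correlation coefficient gives a quick numeric summary of a relationship like age versus ride duration. The sketch below assumes pandas; the column names `member_age` and `duration_sec` and the values are assumptions for illustration, not figures from the project.

```python
import pandas as pd

# Hypothetical sample; ages and durations are made up.
rides = pd.DataFrame({
    "member_age": [22, 29, 35, 41, 58],
    "duration_sec": [900, 810, 640, 720, 500],
})

# Pearson correlation between age and ride duration (always between -1 and 1).
age_duration_corr = rides["member_age"].corr(rides["duration_sec"])
```

A value near zero would suggest little linear relationship, while a value closer to -1 or 1 would suggest a stronger one.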

For the multivariate plots, I looked at:

  1. The rides for the top 10 stations using a FacetGrid
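Before faceting by station, the data has to be narrowed to the busiest stations. The sketch below shows one way to do that step with pandas; the helper `keep_top_stations`, the column name `start_station_name`, and the station names are assumptions made for the example.

```python
import pandas as pd

def keep_top_stations(rides: pd.DataFrame, n: int = 10,
                      column: str = "start_station_name") -> pd.DataFrame:
    """Keep only the rides that start at the n most-used stations."""
    top = rides[column].value_counts().nlargest(n).index
    return rides[rides[column].isin(top)]

# Hypothetical sample with three stations; the names are made up.
sample = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B", "B", "C"]
})

# Keep the two busiest stations in the sample (the article uses the top 10).
top_rides = keep_top_stations(sample, n=2)
```

The filtered frame could then be passed to seaborn's `FacetGrid` with the station column as the facet variable.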

In total, I made over 18 exploration plots to see the relationships that exist and generate insights from this data. I saved the cleaned and explored data after I was satisfied with all the exploration made. Next is the Explanation!

Explanation

The Explanation analysis was done in a different notebook, and only selected insights were shared, some of which are:

  1. The gender that made use of the service most
  2. What time of day has the most rides
  3. What particular hour of the day has the most rides

Other insights were covered as well.

The insights generated are shown below:

What hour of the day had the most rides?

As seen in the visuals, male riders make use of the cycling service the most.

Also, the morning and the evening are when people use the service most. This could hint that the majority of riders are employed or are students.

To add to that, we see the hours in which most of these rides are made.
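The hourly view comes down to counting rides per start hour. Here is a minimal sketch, assuming pandas and a `start_time` column; the sample timestamps are made up, so the peak hour it finds is not a result from the project.

```python
import pandas as pd

# Hypothetical sample of ride start times; the values are made up.
rides = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-02-28 08:05:44",
        "2019-02-28 08:40:12",
        "2019-02-28 13:15:00",
        "2019-02-28 17:02:31",
        "2019-02-28 17:32:10",
        "2019-02-28 17:55:09",
    ])
})

# Number of rides starting in each hour, ordered by hour of day.
rides_per_hour = rides["start_time"].dt.hour.value_counts().sort_index()

# The hour with the most rides in this sample.
peak_hour = rides_per_hour.idxmax()
```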

In conclusion, this project is the capstone of my Data Analytics course with ALX-T Udacity. It’s been a beautiful learning experience.

You can check my GitHub for the whole analysis

And you can follow me on LinkedIn
