Ford GoBike System Data project

Aug 20, 2022

This is my final project for the ALX-T Udacity program. In it, I share my exploratory and explanatory analysis of a bike-sharing system covering the greater San Francisco Bay Area. The dataset contains information about individual rides made in the system.

Data Wrangling

Following the data wrangling process, I started by gathering the dataset from its source. I limited the dataset to San Francisco, although data for other cities is also available. The dataset can be obtained from here.

In the Assess stage of data wrangling, I noted rows and columns where the data lacked accuracy, completeness, consistency, or reliability, since the dataset needs to be of good quality before any analysis.

Some of the issues are:

I cleaned the data using the Define, Code, and Test method. That is, I start by stating the issue in the dataset, then I write the code to correct it, and finally I test that it has been corrected.

This method works best for me because it keeps track of every issue and the steps taken to correct it.
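As an illustration of one Define, Code, and Test cycle, here is a minimal sketch in pandas. The sample rows are made up, and the column names `start_time` and `member_birth_year` are assumptions about how the ride data is laid out, not a record of the exact issues fixed in the project.

```python
import pandas as pd

# Made-up sample standing in for the real ride data.
# Column names are assumptions for illustration only.
rides = pd.DataFrame({
    "start_time": ["2019-02-28 17:32:10", "2019-02-28 08:05:43"],
    "member_birth_year": [1984.0, None],
})

# Define: start_time is stored as text instead of a datetime, and
# rows with a missing birth year cannot be used for age analysis.

# Code: convert the column type and drop the incomplete rows.
clean = rides.copy()
clean["start_time"] = pd.to_datetime(clean["start_time"])
clean = clean.dropna(subset=["member_birth_year"])
clean["member_birth_year"] = clean["member_birth_year"].astype(int)

# Test: confirm that both issues are gone.
assert pd.api.types.is_datetime64_any_dtype(clean["start_time"])
assert clean["member_birth_year"].isna().sum() == 0
```

Working on a copy keeps the original data untouched, so each fix can be compared against the raw values if a test fails.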

Exploration

After the cleaning process, the next step was to explore the data and see what relationships and insights I could draw from it. Some of the insights required additional columns, such as the time of day at which rides were taken.
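A time-of-day column like the one mentioned above could be derived roughly as follows. This is only a sketch: the column names `start_time`, `start_hour`, and `time_of_day`, the helper `part_of_day`, and the hour boundaries are all assumptions chosen for illustration.

```python
import pandas as pd

def part_of_day(hour: int) -> str:
    """Map an hour (0-23) to a coarse label. The cut-offs are assumptions."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"

# Made-up sample rides.
rides = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-02-28 08:05:43",
        "2019-02-28 13:30:00",
        "2019-02-28 17:32:10",
        "2019-02-28 23:10:05",
    ])
})

# Extract the hour, then label each ride with its part of the day.
rides["start_hour"] = rides["start_time"].dt.hour
rides["time_of_day"] = rides["start_hour"].apply(part_of_day)
print(rides[["start_hour", "time_of_day"]])
```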

The plots of my exploration are split into Univariate, Bivariate, and Multivariate plots.

For the univariate plots, I looked at:

Which gender used the service most
What time of day most rides were taken, and so on
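Univariate questions like these come down to counting the values of a single column. Here is a small sketch on made-up rows; the column names `member_gender` and `time_of_day` are assumptions.

```python
import pandas as pd

# Made-up sample; column names are assumptions for illustration.
rides = pd.DataFrame({
    "member_gender": ["Male", "Female", "Male", "Other", "Male"],
    "time_of_day": ["Morning", "Evening", "Evening", "Evening", "Night"],
})

# Count how often each category occurs in a single column.
gender_counts = rides["member_gender"].value_counts()
time_counts = rides["time_of_day"].value_counts()

# The most frequent category answers the "most" question directly.
print("Most common gender:", gender_counts.idxmax())
print("Busiest time of day:", time_counts.idxmax())
```

The same counts can be drawn as a bar chart, for example with seaborn's `countplot`.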

For the bivariate plots, I looked at:

What relationship exists between members_age and the duration of the ride
Which stations are used most and at what time of day they are mostly used
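Before plotting, both bivariate questions can be checked numerically. The sketch below uses made-up rows and assumes the columns `members_age`, `duration_sec`, `start_station_name`, and `time_of_day`; a correlation for the two numeric columns and a cross-tabulation for the two categorical ones are just one reasonable way to summarise them.

```python
import pandas as pd

# Made-up sample; column names are assumptions for illustration.
rides = pd.DataFrame({
    "members_age": [25, 32, 41, 58, 29],
    "duration_sec": [900, 600, 480, 300, 720],
    "start_station_name": ["A St", "B St", "A St", "A St", "B St"],
    "time_of_day": ["Morning", "Evening", "Morning", "Evening", "Evening"],
})

# Numeric vs numeric: Pearson correlation between age and ride duration.
age_duration_corr = rides["members_age"].corr(rides["duration_sec"])

# Categorical vs categorical: ride counts per station and time of day.
station_by_time = pd.crosstab(rides["start_station_name"], rides["time_of_day"])

print(round(age_duration_corr, 2))
print(station_by_time)
```

A scatter plot suits the first pair, and a clustered bar chart or heatmap of the cross-tabulation suits the second.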

For the multivariate plots, I looked at:

The rides for the top 10 stations, using a FacetGrid
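A faceted plot of the busiest stations could be put together along these lines. This is a sketch under assumptions: the helpers `top_station_rides` and `plot_station_facets` are hypothetical names, the columns `start_station_name` and `start_hour` are assumed, and the sample rows are made up. The plotting helper needs seaborn 0.11 or later for `histplot`.

```python
import pandas as pd

def top_station_rides(df, n=10, station_col="start_station_name"):
    """Keep only the rides that start at the n busiest stations."""
    top = df[station_col].value_counts().nlargest(n).index
    return df[df[station_col].isin(top)]

def plot_station_facets(df, station_col="start_station_name",
                        value_col="start_hour"):
    """Draw one histogram of value_col per station."""
    # Imported here so the filtering helper works without plotting libraries.
    import seaborn as sns
    grid = sns.FacetGrid(df, col=station_col, col_wrap=5)
    grid.map(sns.histplot, value_col)
    return grid

# Made-up sample with three stations; keep the two busiest.
rides = pd.DataFrame({
    "start_station_name": ["A", "A", "A", "B", "B", "C"],
    "start_hour": [8, 9, 17, 8, 18, 12],
})
subset = top_station_rides(rides, n=2)
print(subset["start_station_name"].value_counts())

# To draw the facets (requires seaborn and matplotlib):
# plot_station_facets(subset)
```

Filtering first keeps the grid to a readable number of panels, with one panel per station.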

In total, I made more than 18 exploration plots to examine the relationships in the data and generate insights. Once I was satisfied with the exploration, I saved the cleaned and explored data. Next is the Explanation!