Quick Data Analysis with Python: Working With Dates
I’ll be honest with you, maybe this is another one of those Python Data Analysis blog posts. Perhaps there’s one difference, I want to do this quick and “easy”, also I’ll be working with dates.
I’ll be analyzing AirBnB listings data for Seattle, WA 2016. The data shows which listings are available and filled on each day.
First, import necessary libraries. Libraries are a bunch of tools that very nice people have kindly put together and made available to us.
import numpy as np
import pandas as pd
import seaborn as sns
The second tasks is to import your file, mine is in csv format. Here’s a popular way to do that.
calendar = pd.read_csv('./calendar.csv')
The third task is to check the format or data types in your file and maybe convert them to the appropriate data type. Here’s a quick way to do that.
calendar.dtypes
The date column is not in my preferred type, here’s a way to convert it.
calendar['date'] = pd.to_datetime(calendar['date'], format='%Y-%m-%d')
Here’s a way to take a quick look at my data. I’ll look at the first 5 rows.
calendar.head()
I’m interested in the day of the week, month and year of each date. Here’s a way to extract them.
calendar['Month'] = calendar['date'].dt.month
calendar['Year'] = calendar['date'].dt.year
calendar['Day'] = calendar['date'].dt.weekday
I’ll prefer if months and days are in words, makes for better interpretation. Here’s a way to do that using a dictionary and the “replace” function.
Months = {1:'Jan',2:'Feb',3:'Mar', 4:'Apr', 5:'May', 6:'Jun', 7:'Jul', 8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'}
Weekday = {0:"Mon", 1:"Tue", 2:"Wed", 3:"Thu", 4:"Fri", 5:"Sat", 6:"Sun"}
calendar['Day'].replace(Weekday, inplace=True)
calendar['Month'].replace(Months, inplace=True)
Great! Let’s start analyzing our data, there’s a few simple ways to do that.
First question is to check if total number of listings are similar for weekdays and months. The data is for a year (365 days) so we don’t need to look at that.
sns.countplot(calendar.Day)
sns.countplot(calendar.Month)
Fig. 6 and Fig. 7 shows number of listings is similar for each day at about 200,000 and for months at about 120,000.
We can go a step deeper, let’s analyze the “available” column which indicates whether a listing for a specific day is already occupied or not. We want to check which weekdays or months have the highest utilization. Utilization defined as ratio of Occupied listing versus Total listings.
Question: Is Utilization the same for each week day?
First we create a new table where we group entries by “Day” and “available” columns.
WkDayUtil = calendar.groupby(['Day','available']).size().unstack()
I’ll replace “f” and “t” with “NotAvail” and “Avail” to represent number of entries not available and available respectively. I’ll also add the Utilization column calculated as shown in the formula below, representing percentage utilization.
WkDayUtil.columns = ['NotAvail','Avail']
WkDayUtil.reset_index(inplace=True)
WkDayUtil['Utilization'] = round((WkDayUtil['NotAvail']/(WkDayUtil['NotAvail']+WkDayUtil['Avail']))*100,1)
WkDayUtil
For easy interpretation a plot is nice.
sns.barplot(x="Day", y="Utilization",data=WkDayUtil)
Fig. 10 shows Utilization is similar for each day of the week, meaning similar ratio of listings get filled on each day of the week.
Finally, I check Utilization for each month of the year. Similar steps as before, reproduced as shown in the code block below.
MonthUtil = calendar.groupby(['Month','available']).size().unstack()
MonthUtil.columns = ['NotAvail','Avail']
MonthUtil.reset_index(inplace=True)
MonthUtil['Utilization'] = round((MonthUtil['NotAvail']/(MonthUtil['NotAvail']+MonthUtil['Avail']))*100,1)
MonthUtil.head()
Here’s a code to plot each month and arrange the months accordingly.
Months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
MonthUtil['Month'] = pd.Categorical(MonthUtil['Month'], categories = Months, ordered= True)
MonthUtil.sort_values(by='Month', inplace=True)
sns.barplot(x="Month", y="Utilization",data=MonthUtil)
Fig. 12 shows January has highest Utilization, meaning more listings are filled in January than any other month of the year. Notice the steady drop in Utilization from July to December.
I hope I have been able to help with your Python data analysis. I hope medium’s algorithm has kindly added a few other helpful links at the bottom of this page. Otherwise, you can always google your way through any further help you need.