Quick Data Analysis with Python: Working With Dates

Mujeeb Lawal
The Startup
Published in
4 min readFeb 2, 2021

I’ll be honest with you, maybe this is another one of those Python Data Analysis blog posts. Perhaps there’s one difference, I want to do this quick and “easy”, also I’ll be working with dates.

I’ll be analyzing AirBnB listings data for Seattle, WA 2016. The data shows which listings are available and filled on each day.

First, import necessary libraries. Libraries are a bunch of tools that very nice people have kindly put together and made available to us.

import numpy as np
import pandas as pd
import seaborn as sns

The second tasks is to import your file, mine is in csv format. Here’s a popular way to do that.

calendar = pd.read_csv('./calendar.csv')

The third task is to check the format or data types in your file and maybe convert them to the appropriate data type. Here’s a quick way to do that.

calendar.dtypes
Fig 2: Calendar Data Types

The date column is not in my preferred type, here’s a way to convert it.

calendar['date'] =  pd.to_datetime(calendar['date'], format='%Y-%m-%d')

Here’s a way to take a quick look at my data. I’ll look at the first 5 rows.

calendar.head()
Fig 3: First 5 columns in calendar data frame

I’m interested in the day of the week, month and year of each date. Here’s a way to extract them.

calendar['Month'] = calendar['date'].dt.month
calendar['Year'] = calendar['date'].dt.year
calendar['Day'] = calendar['date'].dt.weekday
Fig 4: New columns for Month, Year and Weekday

I’ll prefer if months and days are in words, makes for better interpretation. Here’s a way to do that using a dictionary and the “replace” function.

Months = {1:'Jan',2:'Feb',3:'Mar', 4:'Apr', 5:'May', 6:'Jun', 7:'Jul', 8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'}
Weekday = {0:"Mon", 1:"Tue", 2:"Wed", 3:"Thu", 4:"Fri", 5:"Sat", 6:"Sun"}
calendar['Day'].replace(Weekday, inplace=True)
calendar['Month'].replace(Months, inplace=True)
Fig 5: Here’s how the table looks now, nice!

Great! Let’s start analyzing our data, there’s a few simple ways to do that.

First question is to check if total number of listings are similar for weekdays and months. The data is for a year (365 days) so we don’t need to look at that.

sns.countplot(calendar.Day)
Fig 6: About 200,000 entries for each week day
sns.countplot(calendar.Month)
Fig 7: About 120,000 entries for each month

Fig. 6 and Fig. 7 shows number of listings is similar for each day at about 200,000 and for months at about 120,000.

We can go a step deeper, let’s analyze the “available” column which indicates whether a listing for a specific day is already occupied or not. We want to check which weekdays or months have the highest utilization. Utilization defined as ratio of Occupied listing versus Total listings.

Question: Is Utilization the same for each week day?

First we create a new table where we group entries by “Day” and “available” columns.

WkDayUtil = calendar.groupby(['Day','available']).size().unstack()
Fig 8: Here’s what you get

I’ll replace “f” and “t” with “NotAvail” and “Avail” to represent number of entries not available and available respectively. I’ll also add the Utilization column calculated as shown in the formula below, representing percentage utilization.

WkDayUtil.columns = ['NotAvail','Avail']
WkDayUtil.reset_index(inplace=True)
WkDayUtil['Utilization'] = round((WkDayUtil['NotAvail']/(WkDayUtil['NotAvail']+WkDayUtil['Avail']))*100,1)
WkDayUtil
Fig 9: Here’s what we get

For easy interpretation a plot is nice.

sns.barplot(x="Day", y="Utilization",data=WkDayUtil)
Fig 10: Utilization is similar for each day of the week

Fig. 10 shows Utilization is similar for each day of the week, meaning similar ratio of listings get filled on each day of the week.

Finally, I check Utilization for each month of the year. Similar steps as before, reproduced as shown in the code block below.

MonthUtil = calendar.groupby(['Month','available']).size().unstack()
MonthUtil.columns = ['NotAvail','Avail']
MonthUtil.reset_index(inplace=True)
MonthUtil['Utilization'] = round((MonthUtil['NotAvail']/(MonthUtil['NotAvail']+MonthUtil['Avail']))*100,1)
MonthUtil.head()
Fig 11 : Looks like Utilization is less similar for each month

Here’s a code to plot each month and arrange the months accordingly.

Months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
MonthUtil['Month'] = pd.Categorical(MonthUtil['Month'], categories = Months, ordered= True)
MonthUtil.sort_values(by='Month', inplace=True)
sns.barplot(x="Month", y="Utilization",data=MonthUtil)
Fig 12: January has highest Utilization

Fig. 12 shows January has highest Utilization, meaning more listings are filled in January than any other month of the year. Notice the steady drop in Utilization from July to December.

I hope I have been able to help with your Python data analysis. I hope medium’s algorithm has kindly added a few other helpful links at the bottom of this page. Otherwise, you can always google your way through any further help you need.

--

--