Q#105: Foggiest months
Understanding weather patterns matters for many applications, from agriculture to event planning. In this blog post, we’ll calculate the percentage of time it was raining in each month of a weather dataset, using Python with pandas for the data manipulation and Matplotlib for the chart.
Dataset Overview
The dataset we are using contains weather information for the year 2012. It includes various weather metrics recorded at regular intervals. Here’s how we can import and preview the dataset:
# Import libraries
# %matplotlib inline is a Jupyter magic that renders plots in the notebook; omit it in a plain .py script
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
# Import data, using the 'Date/Time' column as a parsed datetime index
weather_2012 = pd.read_csv(
    'https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/weather_2012.csv',
    parse_dates=True,
    index_col='Date/Time',
)
# Preview the first five rows
weather_2012.head()
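Loading the CSV from GitHub needs network access. The sketch below checks offline what parse_dates=True combined with index_col='Date/Time' does. The rows are hypothetical and only the Date/Time and Weather columns are reproduced (the real file has more), but the index should come back as a DatetimeIndex, which the monthly resampling later relies on.

```python
import io

import pandas as pd

# Hypothetical rows shaped like weather_2012.csv (only two of its columns).
# The comma inside "Freezing Drizzle,Fog" requires the field to be quoted.
csv_text = """Date/Time,Weather
2012-01-01 00:00:00,Fog
2012-01-01 01:00:00,"Freezing Drizzle,Fog"
2012-01-01 02:00:00,Mostly Cloudy
"""

# Same arguments as the tutorial: use 'Date/Time' as the index and parse it as dates.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col='Date/Time')

print(type(sample.index).__name__)  # DatetimeIndex
print(sample)
```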
Data Structure
The dataset contains several columns, but for our purposes, we are primarily interested in columns that can help us determine whether it was raining.
Step-by-Step Solution
Step 1: Data Preparation
First, let’s inspect the dataset to understand its structure and identify the relevant columns.
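# Inspect the data: column names, dtypes and non-null counts
# info() prints its summary itself and returns None, so it is not wrapped in print()
weather_2012.info()
print(weather_2012.head())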
From the initial inspection, we see a Weather column that describes the conditions in text. We will use this column to classify whether it was raining.
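Before choosing keywords, it helps to see which descriptions actually occur. A minimal sketch on a hypothetical sample, assuming (as in this sample) that combined conditions are stored as comma-separated text such as 'Rain,Fog': value_counts lists whole descriptions, while splitting on commas first counts the individual conditions.

```python
import pandas as pd

# Hypothetical Weather descriptions; the real column has thousands of rows.
weather = pd.Series(['Fog', 'Rain', 'Mostly Cloudy', 'Rain,Fog', 'Rain', 'Snow', None])

# Count each distinct description; dropna=False keeps missing values visible.
counts = weather.value_counts(dropna=False)
print(counts)

# Split combined descriptions into single conditions, one per row, then count them.
conditions = weather.dropna().str.split(',').explode().value_counts()
print(conditions)
```

Here 'Rain' appears twice as a whole description but three times as a condition, because 'Rain,Fog' also contains it.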
Step 2: Classify Rainy Periods
We’ll create a new column that flags whether it was raining, based on the descriptions in the Weather column. Descriptions containing keywords such as "Rain" or "Drizzle" indicate rainy periods; the match ignores case, and missing descriptions are treated as not raining.
# Classify rainy periods
weather_2012['is_raining'] = weather_2012['Weather'].str.contains('Rain|Drizzle', case=False, na=False)
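The classification rule can be checked on a handful of strings. This sketch uses hypothetical descriptions to show how the arguments interact: the pattern is a regular expression, so | means "either word"; case=False lets 'rain showers' match; and na=False guarantees that a missing description becomes False.

```python
import numpy as np
import pandas as pd

# Hypothetical descriptions, including mixed case and a missing value.
weather = pd.Series(['Rain', 'Freezing Drizzle,Fog', 'rain showers', 'Mostly Cloudy', np.nan])

# Substring match on either keyword, case-insensitive, missing -> False.
is_raining = weather.str.contains('Rain|Drizzle', case=False, na=False)
print(is_raining.tolist())  # [True, True, True, False, False]
```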
Step 3: Resample Data to Monthly Frequency
To calculate the percentage of time it was raining each month, we need to aggregate the data by month.
# Resample data to monthly frequency
monthly_rain = weather_2012['is_raining'].resample('M').mean() * 100
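Here, resampling with ‘M’ groups the rows by calendar month and labels each group with its month-end date. Taking the mean of the boolean is_raining column gives the share of readings in each month that were flagged as rainy; since readings are recorded at regular intervals, this approximates the fraction of time it was raining, which we multiply by 100 to get a percentage. (In pandas 2.2 and later, ‘M’ is deprecated in favour of ‘ME’ and emits a FutureWarning; the result is the same.)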
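To see why the mean gives a percentage, here is a sketch with six hypothetical hourly readings spread over two months. Each reading counts equally, so the result is the share of readings flagged as rainy; it tracks the share of time only when readings are evenly spaced. The version check that picks 'ME' or 'M' is an addition of this sketch, not part of the original code.

```python
import pandas as pd

# Choose the month-end alias for the installed pandas: 'ME' from 2.2 onwards, 'M' before.
major, minor = (int(part) for part in pd.__version__.split('.')[:2])
MONTH_END = 'ME' if (major, minor) >= (2, 2) else 'M'

# Hypothetical readings: 4 in January (1 rainy) and 2 in February (both rainy).
index = pd.to_datetime([
    '2012-01-01 00:00', '2012-01-01 01:00', '2012-01-01 02:00', '2012-01-01 03:00',
    '2012-02-01 00:00', '2012-02-01 01:00',
])
is_raining = pd.Series([True, False, False, False, True, True], index=index)

# The mean of a boolean Series is the share of True values; x100 gives a percentage.
monthly_rain = is_raining.resample(MONTH_END).mean() * 100
print(monthly_rain.tolist())  # [25.0, 100.0]
```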
Step 4: Visualize the Results
Finally, let’s visualize the monthly rain percentages to better understand the data.
# Plot the percentage of time it was raining each month
ax = monthly_rain.plot(kind='bar', figsize=(10, 6), color='skyblue')
# Label the bars with short month names instead of full month-end timestamps
ax.set_xticklabels(monthly_rain.index.strftime('%b'))
plt.title('Percentage of Time it was Raining Each Month in 2012')
plt.xlabel('Month')
plt.ylabel('Percentage of Time Raining')
plt.xticks(rotation=45)
plt.show()
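The chart makes the pattern visible; to get the ranking as text instead, the monthly series can be relabelled with short month names and sorted. The percentages below are hypothetical placeholders standing in for monthly_rain.

```python
import pandas as pd

# Hypothetical monthly percentages, indexed by month-end dates like monthly_rain.
monthly_rain = pd.Series(
    [12.5, 30.0, 8.0],
    index=pd.to_datetime(['2012-01-31', '2012-02-29', '2012-03-31']),
)

# Work on a copy so the original month-end index is left untouched.
by_name = monthly_rain.copy()
by_name.index = by_name.index.strftime('%b')

# Sort descending so the rainiest months come first.
ranking = by_name.sort_values(ascending=False)
print(ranking.round(1))
print('Rainiest month:', ranking.index[0])  # Rainiest month: Feb
```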
Complete Code
Here is the complete code, including all steps from data preparation to visualization:
# Import libraries
# %matplotlib inline is a Jupyter magic that renders plots in the notebook; omit it in a plain .py script
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
# Import data, using the 'Date/Time' column as a parsed datetime index
weather_2012 = pd.read_csv(
    'https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/weather_2012.csv',
    parse_dates=True,
    index_col='Date/Time',
)
# Classify rainy periods
weather_2012['is_raining'] = weather_2012['Weather'].str.contains('Rain|Drizzle', case=False, na=False)
# Resample data to monthly frequency
monthly_rain = weather_2012['is_raining'].resample('M').mean() * 100
# Plot the percentage of time it was raining each month
ax = monthly_rain.plot(kind='bar', figsize=(10, 6), color='skyblue')
# Label the bars with short month names instead of full month-end timestamps
ax.set_xticklabels(monthly_rain.index.strftime('%b'))
plt.title('Percentage of Time it was Raining Each Month in 2012')
plt.xlabel('Month')
plt.ylabel('Percentage of Time Raining')
plt.xticks(rotation=45)
plt.show()
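The title asks about the foggiest months, and the same recipe works for any condition by changing the keyword. A sketch with a hypothetical helper, monthly_percentage, and made-up readings; it assumes that foggy readings have the word 'Fog' in their Weather description. It groups by month number rather than resampling, which sidesteps the 'M'/'ME' alias change. For a single year the two give the same percentages, except that resample also emits empty months (as NaN) while groupby omits them.

```python
import pandas as pd


def monthly_percentage(weather: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical helper: % of readings in each calendar month matching `pattern`.

    Assumes `weather` has a DatetimeIndex covering a single year.
    """
    matches = weather.str.contains(pattern, case=False, na=False)
    # Group by month number (1-12) instead of resampling, so no frequency alias is needed.
    return matches.groupby(matches.index.month).mean() * 100


# Hypothetical stand-in for weather_2012['Weather'].
weather = pd.Series(
    ['Fog', 'Fog', 'Rain,Fog', 'Clear', 'Snow', 'Clear'],
    index=pd.to_datetime([
        '2012-01-05 06:00', '2012-01-05 07:00', '2012-02-10 06:00',
        '2012-02-10 07:00', '2012-03-15 06:00', '2012-03-15 07:00',
    ]),
)

monthly_fog = monthly_percentage(weather, 'Fog')
print(monthly_fog.tolist())  # [100.0, 50.0, 0.0]
print('Foggiest month number:', monthly_fog.idxmax())  # Foggiest month number: 1
```

On the real data, monthly_percentage(weather_2012['Weather'], 'Fog').idxmax() should return the number of the foggiest month, and passing 'Rain|Drizzle' reproduces the rain percentages.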