Q#105: Foggiest months

Understanding weather patterns is crucial for a variety of applications, from agriculture to event planning. In this blog post, we’ll explore how to calculate the percentage of time it was raining each month using a weather dataset. We will use Python and its powerful data manipulation libraries to achieve this.

Dataset Overview

The dataset we are using contains weather information for the year 2012. It includes various weather metrics recorded at regular intervals. Here’s how we can import and preview the dataset:

# Import libraries
# The next line is a Jupyter/IPython magic; delete it when running as a plain .py script
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

# Import data
weather_2012 = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/weather_2012.csv', parse_dates=True, index_col='Date/Time')
# Preview data
weather_2012.head()

Data Structure

The dataset contains several columns, but only one matters for our purposes: Weather, a text description of the conditions at each timestamp, which lets us determine whether it was raining.

Step-by-Step Solution

Step 1: Data Preparation

First, let’s inspect the dataset to understand its structure and identify the relevant columns.

# Inspect the data
# info() prints its summary itself and returns None, so it needs no print()
weather_2012.info()
print(weather_2012.head())

From the initial inspection, we see columns like Weather that provide descriptions of the weather conditions. We will use this column to classify if it was raining.
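Before settling on keywords, it helps to see which labels the Weather column actually contains. The sketch below counts each distinct label with value_counts(). It runs on a small hand-made frame whose values are illustrative assumptions, not rows from the real file; on the real data the same call would be applied to weather_2012['Weather'].

```python
import pandas as pd

# Illustrative stand-in for the real data; these labels are made up for the demo.
sample = pd.DataFrame(
    {"Weather": ["Fog", "Rain", "Fog", "Drizzle,Fog", "Clear", "Rain"]}
)

# Count how often each distinct description appears.
label_counts = sample["Weather"].value_counts()
print(label_counts)
```

Scanning this list is a quick way to spot compound labels (several conditions joined by a comma) that a keyword match needs to handle.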

Step 2: Classify Rainy Periods

We’ll create a new column to classify whether it was raining based on the descriptions in the Weather column. Descriptions containing keywords like "Rain" or "Drizzle" indicate rainy periods.

# Classify rainy periods
weather_2012['is_raining'] = weather_2012['Weather'].str.contains('Rain|Drizzle', case=False, na=False)
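To see what this match does and does not catch, here is a minimal sketch on a handful of made-up labels (assumptions for illustration, with None standing in for a missing reading). It uses the same str.contains call as above.

```python
import pandas as pd

# Hand-made labels for illustration; None stands in for a missing reading.
labels = pd.Series(
    ["Rain", "Freezing Drizzle,Fog", "Snow", "rain showers", "Clear", None]
)

# Same call as in the post: regex alternation, case-insensitive, missing -> False.
is_raining = labels.str.contains("Rain|Drizzle", case=False, na=False)
print(is_raining.tolist())
```

Because this is a substring match, a compound label such as "Freezing Drizzle,Fog" is flagged as rainy, and case=False lets "rain showers" through as well. Whether freezing precipitation should count as rain is a judgment call that depends on the question being asked.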

Step 3: Resample Data to Monthly Frequency

To calculate the percentage of time it was raining each month, we need to aggregate the data by month.

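# Resample data to monthly frequency
# 'ME' (month end) is the alias in pandas 2.2 and later; older versions use 'M'
monthly_rain = weather_2012['is_raining'].resample('ME').mean() * 100

Here, resampling with 'ME' groups the rows by calendar month. Because True counts as 1 and False as 0, the mean of the is_raining column is the fraction of readings in that month that reported rain. Since the readings are taken at regular intervals, this equals the fraction of time it was raining, which we then multiply by 100 to get a percentage.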
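The arithmetic is easy to check by hand on a tiny synthetic series. The sketch below uses eight made-up hourly readings and groups by calendar month number with groupby rather than resample. That swap is an assumption made only to keep the example independent of the pandas version; for a single-year dataset it gives the same percentages, just labeled 1, 2, ... instead of month-end dates.

```python
import pandas as pd

# Four hourly readings in January and four in February (synthetic data).
index = pd.to_datetime(
    [
        "2012-01-01 00:00", "2012-01-01 01:00", "2012-01-01 02:00", "2012-01-01 03:00",
        "2012-02-01 00:00", "2012-02-01 01:00", "2012-02-01 02:00", "2012-02-01 03:00",
    ]
)
is_raining = pd.Series(
    [True, False, False, False, True, True, True, False], index=index
)

# True counts as 1 and False as 0, so the mean is the share of rainy readings.
monthly_pct = is_raining.groupby(is_raining.index.month).mean() * 100
print(monthly_pct)
```

January has one rainy reading out of four (25%) and February has three out of four (75%), which is what the grouped mean returns.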

Step 4: Visualize the Results

Finally, let’s visualize the monthly rain percentages to better understand the data.

# Plot the percentage of time it was raining each month
monthly_rain.plot(kind='bar', figsize=(10, 6), color='skyblue')
plt.title('Percentage of Time it was Raining Each Month in 2012')
plt.xlabel('Month')
plt.ylabel('Percentage of Time Raining')
plt.xticks(rotation=45)
plt.show()
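A chart shows the overall shape, but a sorted table makes the ranking explicit. The sketch below is one way to do that. It uses made-up percentages standing in for monthly_rain (the numbers are assumptions, not results from the dataset), swaps the timestamps for short month names, and sorts from wettest to driest.

```python
import pandas as pd

# Made-up percentages for three months, standing in for monthly_rain.
monthly_rain = pd.Series(
    [12.5, 30.0, 8.0],
    index=pd.to_datetime(["2012-01-31", "2012-02-29", "2012-03-31"]),
)

# Replace timestamps with three-letter month names, then sort wettest first.
ranked = monthly_rain.copy()
ranked.index = ranked.index.month_name().str[:3]
ranked = ranked.sort_values(ascending=False)
print(ranked)
```

The same relabeled series can be passed to plot(kind='bar') if shorter tick labels are preferred over full timestamps.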

Complete Code

Here is the complete code, including all steps from data preparation to visualization:

# Import libraries
# The next line is a Jupyter/IPython magic; delete it when running as a plain .py script
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

# Import data
weather_2012 = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/weather_2012.csv', parse_dates=True, index_col='Date/Time')
# Classify rainy periods
weather_2012['is_raining'] = weather_2012['Weather'].str.contains('Rain|Drizzle', case=False, na=False)
# Resample data to monthly frequency
# 'ME' (month end) is the alias in pandas 2.2 and later; older versions use 'M'
monthly_rain = weather_2012['is_raining'].resample('ME').mean() * 100
# Plot the percentage of time it was raining each month
monthly_rain.plot(kind='bar', figsize=(10, 6), color='skyblue')
plt.title('Percentage of Time it was Raining Each Month in 2012')
plt.xlabel('Month')
plt.ylabel('Percentage of Time Raining')
plt.xticks(rotation=45)
plt.show()
