Q#105: Foggiest months

Understanding weather patterns is crucial for a variety of applications, from agriculture to event planning. In this blog post, we’ll explore how to calculate the percentage of time it was raining each month using a weather dataset. We will use Python and its powerful data manipulation libraries to achieve this.

Dataset Overview

The dataset we are using contains weather information for the year 2012. It includes various weather metrics recorded at regular intervals. Here’s how we can import and preview the dataset:

```python
# Import libraries
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import data
weather_2012 = pd.read_csv(
    'https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/weather_2012.csv',
    parse_dates=True,
    index_col='Date/Time',
)

# Preview data
weather_2012.head()
```

Data Structure

The dataset contains several columns, but for our purposes, we are primarily interested in columns that can help us determine whether it was raining.

Step 1: Data Preparation

First, let’s inspect the dataset to understand its structure and identify the relevant columns.

```python
# Inspect the data
# info() prints its summary directly and returns None, so it is not wrapped in print()
weather_2012.info()
print(weather_2012.head())
```

From the initial inspection, we see columns like `Weather` that provide descriptions of the weather conditions. We will use this column to classify if it was raining.

Step 2: Classify Rainy Periods

We’ll create a new column to classify whether it was raining based on the descriptions in the `Weather` column. Descriptions containing keywords like "Rain" or "Drizzle" indicate rainy periods.

```python
# Classify rainy periods
weather_2012['is_raining'] = weather_2012['Weather'].str.contains('Rain|Drizzle', case=False, na=False)
```
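To see what this pattern matches, here is a minimal plain-Python sketch of the same test using the standard `re` module. The sample descriptions and the helper name `is_raining` are made up for illustration and are not taken from the dataset. It assumes the default behavior of `str.contains`, which treats the pattern as a regular expression and searches anywhere in the string.

```python
import re

# Hypothetical weather descriptions, invented for illustration only
samples = ["Rain", "Light drizzle", "Fog", "Mostly Cloudy", "Rain Showers,Fog", None]

# Same pattern as above; IGNORECASE plays the role of case=False
pattern = re.compile(r"Rain|Drizzle", flags=re.IGNORECASE)

def is_raining(description):
    """Return True if the description mentions rain or drizzle."""
    # Missing values count as "not raining", mirroring na=False
    if description is None:
        return False
    return pattern.search(description) is not None

flags = [is_raining(s) for s in samples]
print(flags)
```

Because this is a substring search, any description that merely contains "rain" or "drizzle" is flagged, so it is worth checking the distinct values in the `Weather` column before relying on the result.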

Step 3: Resample Data to Monthly Frequency

To calculate the percentage of time it was raining each month, we need to aggregate the data by month.

```python
# Resample data to monthly frequency
monthly_rain = weather_2012['is_raining'].resample('M').mean() * 100
```

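Here, resampling with `'M'` groups the data by calendar month. Because `is_raining` is a boolean column, `True` counts as 1 and `False` as 0, so its mean is the fraction of observations in that month where it was raining; multiplying by 100 converts that to a percentage. Since the readings are recorded at regular intervals, the fraction of observations is also the fraction of time. Note that pandas 2.2 and later deprecate the `'M'` alias in favor of `'ME'` (month end).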
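The arithmetic behind this step can be sketched without pandas. The example below is only an illustration under stated assumptions: the observations are hypothetical, and the names `observations`, `by_month`, and `monthly_pct` are invented here. It groups boolean flags by month and computes the share of `True` values, which is the calculation the monthly mean performs.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly observations: (timestamp, is_raining)
observations = [
    (datetime(2012, 1, 1, 0), True),
    (datetime(2012, 1, 1, 1), False),
    (datetime(2012, 1, 1, 2), False),
    (datetime(2012, 1, 1, 3), True),
    (datetime(2012, 2, 1, 0), False),
    (datetime(2012, 2, 1, 1), True),
    (datetime(2012, 2, 1, 2), False),
    (datetime(2012, 2, 1, 3), False),
]

# Collect the flags for each (year, month) pair
by_month = defaultdict(list)
for timestamp, raining in observations:
    by_month[(timestamp.year, timestamp.month)].append(raining)

# sum() counts True as 1, so sum / len is the rainy fraction
monthly_pct = {
    month: 100 * sum(flags) / len(flags)
    for month, flags in by_month.items()
}

for month, pct in sorted(monthly_pct.items()):
    print(month, pct)
```

In this made-up sample, January has 2 rainy readings out of 4 and February has 1 out of 4. The result is a percentage of time only when the readings are evenly spaced; with irregular intervals, each reading would need to be weighted by its duration.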

Step 4: Visualize the Results

Finally, let’s visualize the monthly rain percentages to better understand the data.

```python
# Plot the percentage of time it was raining each month
monthly_rain.plot(kind='bar', figsize=(10, 6), color='skyblue')
plt.title('Percentage of Time it was Raining Each Month in 2012')
plt.xlabel('Month')
plt.ylabel('Percentage of Time Raining')
plt.xticks(rotation=45)
plt.show()
```

Complete Code

Here is the complete code, including all steps from data preparation to visualization:

```python
# Import libraries
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import data
weather_2012 = pd.read_csv(
    'https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/weather_2012.csv',
    parse_dates=True,
    index_col='Date/Time',
)

# Classify rainy periods
weather_2012['is_raining'] = weather_2012['Weather'].str.contains('Rain|Drizzle', case=False, na=False)

# Resample data to monthly frequency
monthly_rain = weather_2012['is_raining'].resample('M').mean() * 100

# Plot the percentage of time it was raining each month
monthly_rain.plot(kind='bar', figsize=(10, 6), color='skyblue')
plt.title('Percentage of Time it was Raining Each Month in 2012')
plt.xlabel('Month')
plt.ylabel('Percentage of Time Raining')
plt.xticks(rotation=45)
plt.show()
```
