Q#71: The Weather Report

Weather forecast

Suppose you have the following dataset, which contains information about a year’s worth of weather. Using Python (Pandas), create some quick plots to show the following:

  • The median temperature by month
  • The median wind speed by month
  • The snowiest months (Hint: this one will require manipulating and classifying the existing data.)

TRY IT YOURSELF

https://colab.research.google.com/drive/1PpfIw1SdZeppauQfUdIZlGNJBDwr2BWc?usp=sharing

ANSWER

This question tests your familiarity with Pandas in Python, along with some simple DataFrame manipulation and plotting.

For additional pizzazz, we are going to answer each portion with a single line of code, using Pandas method chaining and its built-in plotting capabilities.

The first step, as usual, is to get the data into a dataframe using the .read_csv() function.

import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/weather_2012.csv')

Next, we will do a few preprocessing steps: converting the 'Date/Time' column to Pandas datetime values with pd.to_datetime(), and then creating a 'Month' column from it using the .dt.month attribute.

# Convert Columns to DateTime
data['Date/Time'] = pd.to_datetime(data['Date/Time'])

# Create Month Column
data['Month'] = data['Date/Time'].dt.month
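
To see what these two preprocessing steps do without downloading the full file, here is a minimal sketch on a few made-up rows. The sample values are hypothetical; only the column names 'Date/Time' and 'Temp (C)' come from the dataset used above.

```python
import pandas as pd

# Hypothetical sample rows (not taken from the real file); timestamps start out as strings.
sample = pd.DataFrame({
    "Date/Time": ["2012-01-01 00:00:00", "2012-02-15 13:00:00", "2012-12-31 23:00:00"],
    "Temp (C)": [-1.8, -5.0, 0.0],
})

# Parse the strings into datetime values
sample["Date/Time"] = pd.to_datetime(sample["Date/Time"])

# The .dt accessor exposes datetime parts; .dt.month gives the month number (1-12)
sample["Month"] = sample["Date/Time"].dt.month

months = sample["Month"].tolist()
print(months)
```

Once 'Month' exists as a plain integer column, it can be used directly as a .groupby() key.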

Now we are ready to answer the first question. To get the median temperature by month, we will use the .groupby() method on the 'Month' column, select the temperature column, and chain the .median() function. Finally, to fully appreciate the power of chaining in Pandas, we can call .plot() at the end.

# The median temperature by month
data.groupby('Month')['Temp (C)'].median().plot()

Note: We can make the plot fancier by passing arguments to .plot() to set the title, and by setting the x and y labels on the axes object it returns.
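
As a rough illustration of that note, the sketch below adds a title, markers, and axis labels. It runs on a small made-up DataFrame (the values are hypothetical) so that it is self-contained; with the real data you would chain the same calls onto data.groupby('Month')['Temp (C)'].median().

```python
import pandas as pd

# Hypothetical values standing in for the real dataset
sample = pd.DataFrame({
    "Month": [1, 1, 1, 2, 2, 2],
    "Temp (C)": [-5.0, -3.0, -1.0, 0.0, 2.0, 4.0],
})

median_temp = sample.groupby("Month")["Temp (C)"].median()

# .plot() returns a matplotlib Axes, so labels can be set on it afterwards.
# Extra keyword arguments such as marker are passed through to matplotlib.
ax = median_temp.plot(title="Median temperature by month", marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Temp (C)")
```

Setting the labels on the returned Axes breaks the one-line chain, but it keeps the plotting call short and works the same way for any plot kind.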

Alright, now onto question two, which is very similar to the first, except that it asks us to plot a different column. We will use the same .groupby() and .median() structure, but for .plot() we will add the extra argument kind='bar' to draw a bar chart instead of a line.

# The median wind speed by month
data.groupby('Month')['Wind Spd (km/h)'].median().plot(kind='bar')

And finally the more complicated question: a plot of the snowiest months. This one is open to interpretation, but let's treat it as a count of the number of times in a month that the weather event 'Snow' occurs. Unfortunately for us, there are many different forms of this event, and it can be combined with other weather events, so we cannot just use a straightforward equality filter. Instead, we will use the string method built into Pandas, .str.contains(), which allows us to look for any value containing 'Snow'. Recall that to filter rows in Pandas with a conditional statement, we can use the .loc[] indexer. After the filter, we can use .groupby() on the 'Month' column, select the 'Weather' column, and take a .count() to get the number of snow events by month. Finally, let's plot this as a horizontal bar chart with .plot(kind='barh').

# The snowiest months
data.loc[data['Weather'].str.contains('Snow')].groupby('Month').Weather.count().plot(kind = 'barh')
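
To make the filtering step concrete, here is a small sketch of how .str.contains() treats combined weather labels. The labels and months below are hypothetical examples, not rows copied from the dataset.

```python
import pandas as pd

# Hypothetical weather labels, including a combined event
sample = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3],
    "Weather": ["Snow", "Freezing Drizzle,Snow", "Snow Showers", "Fog", "Rain"],
})

# Boolean mask: True wherever the label contains the substring 'Snow'
is_snow = sample["Weather"].str.contains("Snow")

# Keep only the matching rows, then count them per month
snow_counts = sample.loc[is_snow].groupby("Month")["Weather"].count()
print(snow_counts)
```

Note that months with no matching rows (month 3 here) do not appear in the result at all, since they are filtered out before the groupby.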

Note: We could have also sorted the results, with the simple .sort_values() chained function before the .plot() method.
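
A minimal sketch of the sorted variant, assuming we already have per-month snow counts. The numbers here are invented for illustration.

```python
import pandas as pd

# Hypothetical per-month snow counts (month number -> count)
snow_counts = pd.Series({1: 120, 2: 95, 3: 60, 12: 140}, name="Weather")
snow_counts.index.name = "Month"

# Ascending sort; in a horizontal bar chart the first row is drawn at the bottom,
# so the snowiest month ends up at the top.
ranked = snow_counts.sort_values()
ax = ranked.plot(kind="barh")
```

With the real data, the same idea is just .sort_values() inserted between .count() and .plot(kind='barh') in the chain above.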
