Data Visualization using Python Matplotlib | Towards AI

Tutorial on Data Visualization: Weather Data

Weather data analysis and visualization using Python’s Matplotlib

Benjamin Obi Tayo Ph.D.
Jul 20 · 2 min read

Data visualization is more of an art than a science. A good visualization usually comes from combining several pieces of code: loading the data, preparing it, and plotting the result. This tutorial walks through those steps to build a visualization from weather data.

This code performs the following:

  1. Plots a line graph of the record high and record low temperatures by day of the year over the period 2005–2014, and shades the area between the two lines.
  2. Overlays a scatter plot of the 2015 data points (highs and lows) that broke the ten-year (2005–2014) record high or record low.
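The "record broken" test in the second step comes down to an element-wise comparison between two arrays. Here is a minimal sketch of that idea; the five-day values and the variable names are made up for illustration and are not taken from the NOAA dataset.

```python
import numpy as np

# Hypothetical values for five consecutive days, in degrees C (not real data).
record_high = np.array([10.0, 12.0, 11.0, 15.0, 14.0])  # 2005-2014 record highs
record_low = np.array([-5.0, -3.0, -4.0, 0.0, -1.0])    # 2005-2014 record lows
highs_2015 = np.array([9.0, 13.5, 11.0, 16.0, 10.0])
lows_2015 = np.array([-6.0, -2.0, -4.0, 1.0, -1.5])

# A record is broken only by a strictly higher high or a strictly lower low;
# a tie leaves the old record standing.
broke_high = highs_2015 > record_high
broke_low = lows_2015 < record_low

# Day indices (x positions for a scatter plot) where a record was broken.
high_days = np.flatnonzero(broke_high)
low_days = np.flatnonzero(broke_low)

print(high_days, highs_2015[high_days])
print(low_days, lows_2015[low_days])
```

The index arrays serve as x coordinates and the masked values as y coordinates, which is the shape of input a scatter plot expects.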

Dataset: The NOAA dataset used for this project is stored in the file weather_data.csv. It is a subset of the National Centers for Environmental Information (NCEI) Daily Global Historical Climatology Network (GHCN-Daily), which comprises daily climate records from thousands of land surface stations across the globe. The records used here were collected at stations near Ann Arbor, Michigan, United States, and the temperatures are stored in tenths of a degree Celsius.

The complete code for this article can be downloaded from this repository:

1. Import necessary libraries and dataset

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# load the weather dataset described above
df = pd.read_csv('weather_data.csv')

2. Data preparation and analysis

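#convert temperature from tenths of degree C to degree C (assumes the readings are in a 'Data_Value' column)
df['Data_Value'] = 0.1*df.Data_Value
df['Days'] = list(map(lambda x: x.split('-')[-2]+'-'+x.split('-')[-1], df.Date))  # 'MM-DD' key
df['Years'] = list(map(lambda x: x.split('-')[0], df.Date))  # 'YYYY'
df_2005_to_2014 = df[(df.Days != '02-29') & (df.Years != '2015')]  # baseline decade, leap days dropped (assumed filter)
df_2015 = df[(df.Days != '02-29') & (df.Years == '2015')]
df_max = df_2005_to_2014.groupby(['Element','Days']).max()
df_min = df_2005_to_2014.groupby(['Element','Days']).min()
df_2015_max = df_2015.groupby(['Element','Days']).max()
df_2015_min = df_2015.groupby(['Element','Days']).min()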
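The plotting step needs four per-day series: record_max, record_min, record_2015_max and record_2015_min. The sketch below shows one way they could be derived, using a tiny synthetic table rather than the real file. The column names Date, Element and Data_Value, the element codes TMAX and TMIN, and the helper daily_extreme are assumptions made for illustration. It also filters by element before grouping, which is a variant of the two-key groupby used above and not the original code.

```python
import pandas as pd

# Tiny synthetic sample (NOT real NOAA data); values are in tenths of a degree C.
rows = [
    ("2005-01-01", "TMAX", 50), ("2005-01-01", "TMIN", -100),
    ("2010-01-01", "TMAX", 80), ("2010-01-01", "TMIN", -60),
    ("2005-01-02", "TMAX", 30), ("2005-01-02", "TMIN", -120),
    ("2008-02-29", "TMAX", 200), ("2008-02-29", "TMIN", -300),  # leap day
    ("2015-01-01", "TMAX", 90), ("2015-01-01", "TMIN", -70),
    ("2015-01-02", "TMAX", 20), ("2015-01-02", "TMIN", -150),
]
df = pd.DataFrame(rows, columns=["Date", "Element", "Data_Value"])

df["Data_Value"] = 0.1 * df["Data_Value"]  # tenths of a degree -> degrees C
df["Days"] = df["Date"].str[5:]            # 'MM-DD'
df["Years"] = df["Date"].str[:4]           # 'YYYY'
df = df[df["Days"] != "02-29"]             # drop leap days so every year has 365 keys

is_2015 = df["Years"] == "2015"
baseline, latest = df[~is_2015], df[is_2015]


def daily_extreme(frame, element, how):
    """Per-day 'max' or 'min' of one element, indexed by 'MM-DD' (hypothetical helper)."""
    subset = frame[frame["Element"] == element]
    return subset.groupby("Days")["Data_Value"].agg(how)


record_max = daily_extreme(baseline, "TMAX", "max")
record_min = daily_extreme(baseline, "TMIN", "min")
# Align the 2015 series to the baseline index so the comparisons line up day by day.
record_2015_max = daily_extreme(latest, "TMAX", "max").reindex(record_max.index)
record_2015_min = daily_extreme(latest, "TMIN", "min").reindex(record_min.index)

broke_high = record_2015_max[record_2015_max > record_max]
broke_low = record_2015_min[record_2015_min < record_min]
print(broke_high)
print(broke_low)
```

In this sample, 1 January 2015 beats the earlier high and 2 January 2015 beats the earlier low, so each result holds a single day.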

3. Generate Data Visualization

# record_max, record_min, record_2015_max and record_2015_min are the per-day temperature Series from step 2
plt.figure(figsize=(10,7))
plt.plot(np.arange(len(record_max)), record_max, '--k', label="record high")
plt.plot(np.arange(len(record_max)), record_min, '-k', label="record low")
# np.where returns a tuple, so take its first element to get the day indices
plt.scatter(np.where(record_2015_min < record_min.values)[0],
            record_2015_min[record_2015_min < record_min].values, c='b', label='2015 break low')
plt.scatter(np.where(record_2015_max > record_max.values)[0],
            record_2015_max[record_2015_max > record_max].values, c='r', label='2015 break high')
plt.xlabel('month', size=14)
# raw string so that \c is not treated as an escape sequence
plt.ylabel(r'temperature ($^\circ$C)', size=14)
plt.xticks(np.arange(0,365,31), ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])
ax = plt.gca()
ax.axis([0,365,-40,40])
# shade the band between the record low and record high
ax.fill_between(np.arange(0,365), record_min, record_max, facecolor='blue', alpha=0.25)
plt.title('Record temperatures for different months between 2005-2014', size=14)
plt.legend(loc=0)
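The tick positions np.arange(0,365,31) place a label every 31 days, so they drift away from the true month boundaries (by December the tick sits 7 days late). If exact month starts matter, the positions can be computed from the calendar. This is an optional refinement, not part of the original code, and the names month_starts and month_labels are illustrative.

```python
import calendar

# Days in each month of a non-leap year (2015 is used only because it is not a leap year).
days_in_month = [calendar.monthrange(2015, month)[1] for month in range(1, 13)]

# Zero-based day-of-year index of the first day of each month.
month_starts = [sum(days_in_month[:m]) for m in range(12)]
month_labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

for label, start in zip(month_labels, month_starts):
    print(label, start)
```

These lists can be passed straight to the plot with plt.xticks(month_starts, month_labels) in place of the evenly spaced ticks.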

In summary, we’ve shown how a simple data visualization plot can be generated using Python’s Matplotlib library.

Written by Benjamin Obi Tayo Ph.D.

Physicist, Data Scientist, Educator, Writer. Interests: Data Science, Machine Learning, AI, Python & R, Predictive Analytics, Materials Science, Bioinformatics
