Meteorological Data Analysis using Python


Data analysis can be described as a process of several steps in which raw data are transformed and processed to produce visualizations and make predictions. It is often schematized as a chain of sequential stages.

The null hypothesis is that the average apparent temperature for a given month (say, April) from 2006 to 2016, and the average humidity over the same period, have not increased; the task is to determine whether they have.
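
The article evaluates this hypothesis by inspecting plots. As a hedged sketch of how it could also be checked numerically, the function below runs a one-sided permutation test on the least-squares slope of yearly values against year. The permutation test is a swapped-in technique, not part of the original analysis, and `permutation_trend_test` is a hypothetical helper name.

```python
import numpy as np


def permutation_trend_test(years, values, n_permutations=10_000, seed=0):
    """One-sided permutation test for an increasing trend (hypothetical helper).

    H0: the values are unrelated to the year, so there is no increase.
    Returns the observed least-squares slope and a p-value for slope > 0.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    x_centred = x - x.mean()

    def slope(v):
        # Ordinary least-squares slope of v regressed on the years.
        return x_centred @ (v - v.mean()) / (x_centred @ x_centred)

    observed = slope(y)
    rng = np.random.default_rng(seed)
    # Count shuffles of the values whose slope is at least as large as observed.
    count = sum(slope(rng.permutation(y)) >= observed for _ in range(n_permutations))
    # The +1 terms count the observed ordering itself, so p is never exactly 0.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

With the article's variables, a call such as `permutation_trend_test(data.index.year, data['Apparent Temperature (C)'])` would test the April temperature series; a small p-value (for example below 0.05) would count as evidence against the null hypothesis for that variable.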

Dataset: https://www.kaggle.com/muthuj7/weather-dataset

Code:

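import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Jupyter magic: display plots inside the notebook.
%matplotlib inline
# Load the dataset and inspect its first rows, size, column types and summary statistics.
df = pd.read_csv('weatherHistory.csv')
df.head()
df.shape
df.dtypes
df.describe()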
# Parse the timestamps; utc=True normalises differing UTC offsets to a single tz-aware UTC column.
df['Formatted Date'] = pd.to_datetime(df['Formatted Date'], utc=True)
df['Formatted Date']
# Check whether any NaN values are present.
df.isnull().sum()
# Set 'Formatted Date' as the index, because we need the data for a specific month in every year.
df = df.set_index('Formatted Date')
df.head()
# We only need two columns from the dataset, so select just those.
cols = df[['Apparent Temperature (C)','Humidity']]
df_avg = cols.resample('MS').mean()
# 'MS' = month start: each monthly average is labelled with the first day of its month.
# mean() averages the apparent temperature (C) and humidity within each month.
df_avg.head()
# Plot both monthly averages against the datetime index to see how they vary over time.
plt.figure(figsize=(12,7))
plt.title("Changes in Apparent Temperature (C) and Humidity with time")
sns.lineplot(data=df_avg);
# Now select the month asked about in the question: April.
# index.month == 4 keeps only the April rows, one per year.
data = df_avg[df_avg.index.month==4]
data
# Plot the two April series against the index (date) values.
import matplotlib.dates as dts

fig, st = plt.subplots(figsize = (13,7))

st.plot(data.loc['2006-04-01':'2016-04-01', 'Apparent Temperature (C)'],marker='o', linestyle='-',label='Apparent Temperature (C)');
st.plot(data.loc['2006-04-01':'2016-04-01', 'Humidity'],marker='o', linestyle='-',label='Humidity');
# Put one tick on each April 1st, taken from the index itself rather than from
# hand-typed 'MM-DD-YYYY' strings, which matplotlib's date axis may fail to convert.
st.set_xticks(data.loc['2006-04-01':'2016-04-01'].index)
st.xaxis.set_major_formatter(dts.DateFormatter('%d %m %Y'))
st.set_xlabel('Month of April')
st.legend(loc = 'center right')
plt.title("Yearwise Variation in Apparent Temperature (C) and Humidity for the Month of April");

Fig. 1: Screenshot of the plot of changes in apparent temperature and humidity over time

Fig. 2: Screenshot of the plot of year-wise variation in apparent temperature and humidity for April
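
To make the resample-then-filter step concrete, here is a small self-contained sketch on synthetic data (the values are random and do not come from the Kaggle dataset). It also illustrates why `utc=True` matters: the two timestamp strings below, whose format is assumed to mirror the 'Formatted Date' column, carry different UTC offsets, and parsing with `utc=True` places both on a single timezone.

```python
import numpy as np
import pandas as pd

# Two timestamp strings with different UTC offsets (format assumed to mirror
# the dataset's 'Formatted Date' column). utc=True converts both to UTC.
raw = pd.Series(["2006-04-01 00:00:00.000 +0200", "2006-01-01 00:00:00.000 +0100"])
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S.%f %z", utc=True)

# Synthetic daily readings for two years on a tz-aware UTC index.
index = pd.date_range("2006-01-01", "2007-12-31", freq="D", tz="UTC")
rng = np.random.default_rng(0)
weather = pd.DataFrame(
    {
        "Apparent Temperature (C)": rng.normal(10.0, 5.0, len(index)),
        "Humidity": rng.uniform(0.5, 1.0, len(index)),
    },
    index=index,
)

# 'MS' labels each monthly mean with the first day of its month.
monthly = weather.resample("MS").mean()
# Keep only the April rows: one row per year.
april = monthly[monthly.index.month == 4]
print(april)
```

Each row of `april` is the mean of that year's April readings, labelled with April 1st, which is the same shape as the `data` frame plotted in Fig. 2.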

Final Conclusion:

The average apparent temperature increased over 2008–09, then fell back to its average level over 2009–10. It rose slightly over 2010–11 and dropped again over 2011–12. Over 2014–15 it decreased sharply, before returning to its average level over 2015–16.
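
The year-by-year reading above compares each April average with the previous year's. A minimal sketch of making that comparison programmatically with `Series.diff()` follows; the helper name `describe_year_changes` and the example numbers are hypothetical, not values from the dataset.

```python
import pandas as pd


def describe_year_changes(april_means: pd.Series) -> list[str]:
    """Label each consecutive pair of years as an increase, decrease or no change.

    april_means is assumed to be indexed by integer year.
    """
    diffs = april_means.diff().dropna()  # change relative to the previous year
    labels = []
    for year, delta in diffs.items():
        if delta > 0:
            trend = "increase"
        elif delta < 0:
            trend = "decrease"
        else:
            trend = "no change"
        labels.append(f"{year - 1}-{year}: {trend} ({delta:+.2f})")
    return labels


# Hypothetical April averages, not taken from the dataset.
example = pd.Series([11.0, 12.5, 12.0, 12.0], index=[2006, 2007, 2008, 2009])
for line in describe_year_changes(example):
    print(line)
```

With the article's data, the input could be built as `data['Apparent Temperature (C)'].set_axis(data.index.year)`.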

The April average apparent temperature was highest in 2009 (14.26 °C) and lowest in 2015 (10.63 °C).
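
Such extremes can be read off with `Series.idxmax()` and `Series.idxmin()`. In this sketch the helper `extreme_years` and the sample values are hypothetical; applied to the April series indexed by year, it would return the highest and lowest years in that series together with their values.

```python
import pandas as pd


def extreme_years(april_means: pd.Series) -> tuple[tuple[int, float], tuple[int, float]]:
    """Return (year, value) for the highest and the lowest April averages."""
    hi = april_means.idxmax()  # index label (year) of the maximum
    lo = april_means.idxmin()  # index label (year) of the minimum
    return (int(hi), float(april_means[hi])), (int(lo), float(april_means[lo]))


# Hypothetical values indexed by year, not taken from the dataset.
sample = pd.Series([12.0, 14.0, 10.0], index=[2006, 2007, 2008])
print(extreme_years(sample))
```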

The average humidity shows no noticeable change over 2006–2016: its line in Fig. 2 is approximately parallel to the x-axis.
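
One caveat when judging the humidity line: Humidity in this dataset appears to be stored as a fraction between 0 and 1 (worth confirming with `df.describe()`), while apparent temperature is in degrees Celsius, so on a shared y-axis any humidity variation is visually compressed. A hedged sketch that gives humidity its own y-axis with `Axes.twinx()` follows; `plot_on_two_axes` is a hypothetical helper and the example values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; omit this in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_on_two_axes(april: pd.DataFrame):
    """Plot temperature and humidity on a shared x-axis, each with its own y-axis."""
    fig, ax_temp = plt.subplots(figsize=(13, 7))
    ax_hum = ax_temp.twinx()  # second y-axis sharing ax_temp's x-axis
    ax_temp.plot(april.index, april["Apparent Temperature (C)"],
                 marker="o", color="tab:blue")
    ax_hum.plot(april.index, april["Humidity"], marker="o", color="tab:orange")
    ax_temp.set_ylabel("Apparent Temperature (C)", color="tab:blue")
    ax_hum.set_ylabel("Humidity (fraction)", color="tab:orange")
    ax_temp.set_title("April averages: temperature and humidity on separate scales")
    return fig, ax_temp, ax_hum


# Hypothetical April averages indexed by year, not taken from the dataset.
example = pd.DataFrame(
    {"Apparent Temperature (C)": [12.0, 14.0, 11.0],
     "Humidity": [0.70, 0.72, 0.71]},
    index=[2006, 2007, 2008],
)
fig, ax_temp, ax_hum = plot_on_two_axes(example)
```

Because the humidity axis autoscales to the humidity values alone, small year-to-year differences become visible; whether they matter is a separate question that a numerical test would have to answer.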


Final Note:

Thanks for reading!
