Performing Analysis of Meteorological Data

Abdullah Abdul Wahid
6 min read · Mar 25, 2022

The Hypothesis

The null hypothesis presented is: “Do the Apparent Temperature and Humidity, compared monthly across 10 years of data, indicate an increase due to global warming?” In other words, the question is whether the collected data shows an increase in temperature due to global warming.

We will investigate this hypothesis using the following dataset:

The dataset “Weather Data.csv” contains hourly temperature readings recorded over the last 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe.

Data Analysis

Displaying the data shows the following:

Displaying the initial data

Below are some more characteristics of the data.

Description of data
More information of the data
Data sorted by date
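The inspection steps above can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the tiny DataFrame stands in for “Weather Data.csv”, its values are invented, and the column name “Apparent Temperature (C)” is an assumption about how the column is labelled in the file.

```python
import pandas as pd

# Tiny stand-in for "Weather Data.csv"; the values are invented for illustration.
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 02:00:00.000 +0200",
        "2006-04-01 00:00:00.000 +0200",
        "2006-04-01 01:00:00.000 +0200",
    ],
    "Summary": ["Mostly Cloudy", "Partly Cloudy", "Partly Cloudy"],
    "Apparent Temperature (C)": [9.4, 7.4, 7.2],  # assumed column name
    "Humidity": [0.89, 0.89, 0.86],
})

print(df.head())      # first rows of the data
print(df.describe())  # summary statistics of the numeric columns
df.info()             # column dtypes and non-null counts

# Sort chronologically. Sorting the raw strings works here only because
# every timestamp has the same fixed-width format and the same UTC offset.
df_sorted = df.sort_values("Formatted Date").reset_index(drop=True)
print(df_sorted)
```

With the real file, the DataFrame would instead come from something like `pd.read_csv("Weather Data.csv")`.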

Data Cleaning & Preparation

From the initial analysis, it is clear that some columns are not useful for further detailed analysis.

Displaying the initial data

I decided to drop the columns “Daily Summary” and “Loud Cover”.

The “Summary” column already provides enough information about the weather of the day, so the “Daily Summary” column is redundant and can be dropped.

As for the “Loud Cover” column, all of its values are 0, so it adds nothing meaningful to the data and will be dropped as well.

Data after dropping the desired columns
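A minimal sketch of this step, assuming the data is held in a pandas DataFrame (the sample rows below are invented for illustration):

```python
import pandas as pd

# Invented sample rows that mimic the relevant columns.
df = pd.DataFrame({
    "Summary": ["Partly Cloudy", "Overcast"],
    "Humidity": [0.89, 0.86],
    "Loud Cover": [0.0, 0.0],
    "Daily Summary": ["Partly cloudy throughout the day."] * 2,
})

# Confirm the column really is constant before discarding it.
assert (df["Loud Cover"] == 0).all()

# Drop the redundant and the constant column.
df = df.drop(columns=["Daily Summary", "Loud Cover"])
print(df.columns.tolist())
```

Checking that “Loud Cover” is all zeros first guards against silently throwing away a column that carries information in some other version of the file.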

Next, the data is checked for missing values.

As you can see below, the column “Precip Type” has 95936 entries, whereas all other columns have 96453 entries.

Searching for columns with missing values
Entries with missing values

As seen above, the column “Precip Type” has missing values for 517 entries.
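The count follows directly from the two totals: 96453 − 95936 = 517. A check like this could be written as below; it is a sketch on invented sample data, not the original notebook code.

```python
import numpy as np
import pandas as pd

# Invented sample with two missing "Precip Type" values.
df = pd.DataFrame({
    "Precip Type": ["rain", np.nan, "snow", np.nan],
    "Humidity": [0.89, 0.86, 0.80, 0.75],
})

# Number of missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# The rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```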

We need to fill the missing values with an appropriate string. For this purpose, I have used the string ‘Not Defined’.

After filling missing values with the string “Not Defined”.

Now we can see that our data is completely filled, with a shape of (96453 x 10), and has no columns with null (missing) values.
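One way to do the fill, shown as a sketch on invented sample data, is pandas’ `fillna`:

```python
import numpy as np
import pandas as pd

# Invented sample with one missing "Precip Type" value.
df = pd.DataFrame({
    "Precip Type": ["rain", np.nan, "snow"],
    "Humidity": [0.89, 0.86, 0.80],
})

# Replace missing precipitation types with a placeholder string.
df["Precip Type"] = df["Precip Type"].fillna("Not Defined")

# Total number of missing values left in the whole DataFrame.
remaining = int(df.isnull().sum().sum())
print(remaining)
```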

Now we will resample the data from hourly to monthly entries.

Initial state of “Formatted Date” column.

As can be seen below, the data is converted into monthly entries (133 entries corresponding to all months of this dataset).

Notice that two columns, “Summary” and “Precip Type”, are now missing from the data. Both hold text rather than numbers, so they presumably cannot be aggregated when the hourly rows are combined into monthly values.
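A minimal sketch of the resampling step, under a few assumptions: the monthly value is the mean of the hourly values, the timestamps are converted to UTC first, and the temperature column is named “Apparent Temperature (C)”. The sample rows are invented.

```python
import pandas as pd

# Invented hourly rows spanning two months.
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-15 00:00:00.000 +0200",
        "2006-04-15 01:00:00.000 +0200",
        "2006-05-15 00:00:00.000 +0200",
        "2006-05-15 01:00:00.000 +0200",
    ],
    "Summary": ["Clear", "Clear", "Overcast", "Overcast"],
    "Apparent Temperature (C)": [6.0, 8.0, 12.0, 14.0],  # assumed column name
    "Humidity": [0.80, 0.90, 0.60, 0.70],
})

# Parse the strings into timezone-aware UTC timestamps and use them as the index.
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)
df = df.set_index("Formatted Date")

# Keep only the numeric columns of interest and average them per calendar month.
# "MS" labels each bin with the first day of the month.
monthly = df[["Apparent Temperature (C)", "Humidity"]].resample("MS").mean()
print(monthly)
```

Note that converting to UTC shifts every timestamp by its offset, so readings taken in the first hours of a month can land in the previous month’s bin.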

Data Visualization

Time for the detailed analysis phase, with the help of visualization!

I used the Seaborn package for most of my visualizations, mostly because I find it offers more convenient customization options than plain Matplotlib (on which it is built), and personally, it feels easier to implement various types of plots with it.

Plot 1: Comparison of Apparent Temperature vs. Humidity, using a relational plot with the kind “scatter”.

This plot shows that as the “Apparent Temperature” values increase, the “Humidity” values tend to decrease.
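A plot of this kind can be sketched with Seaborn’s `relplot`. The data below is invented, the column name “Apparent Temperature (C)” is an assumption, and the correlation coefficient is an extra check added here for illustration rather than something taken from the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import pandas as pd
import seaborn as sns

# Invented monthly averages in which humidity falls as temperature rises.
monthly = pd.DataFrame({
    "Apparent Temperature (C)": [-3.0, 2.5, 9.0, 15.5, 21.0],  # assumed column name
    "Humidity": [0.88, 0.82, 0.71, 0.66, 0.60],
})

# Relational plot with kind="scatter".
grid = sns.relplot(
    data=monthly,
    x="Apparent Temperature (C)",
    y="Humidity",
    kind="scatter",
)

# A negative Pearson correlation quantifies the downward trend seen in the plot.
r = monthly["Apparent Temperature (C)"].corr(monthly["Humidity"])
print(r)
```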

Plot 2: Comparison of Humidity vs. Wind Speed (km/h) using a relational plot with the kind “line”.

Higher wind speeds are generally expected to go along with lower humidity, and vice versa. The plot above appears consistent with this trend.

Plot 3: Comparison of Apparent Temperature vs. Humidity with respect to the “Summary” of the weather on a given day, using pair plots.

This graph shows how the relationship varies depending on the summary of the weather on any given day.

Plot 4: Comparison of Apparent Temperature vs. Humidity with respect to the “Summary” of the weather on a given day, using relational plots with one column per “Summary” value.

This graph shows how the relationship varies depending on the summary of the weather on any given day, with each summary shown in its own column.

Plot 5: Comparison of Apparent Temperature vs. Humidity for All Months of 10 Years

The above graph shows the variation of Apparent Temperature and Humidity over the course of 10 years.

It can be seen that Apparent Temperature has varied a lot within each year, following the seasons, but its peak has stayed at almost the same level throughout the years.

Humidity, by contrast, has stayed almost constant throughout the years.

This will be helpful in drawing a conclusion towards the end.

Next, we isolate the values for one specific month, April:

Plot 6: Apparent Temperature and Humidity for the month of April

This plot shows how Apparent Temperature and Humidity in April vary from year to year across all 10 years.
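Selecting a single calendar month from monthly data can be sketched as follows, assuming the monthly DataFrame has a datetime index (the values here are invented):

```python
import pandas as pd

# Invented monthly series covering three years, indexed by month start.
index = pd.date_range("2006-01-01", periods=36, freq="MS", tz="UTC")
monthly = pd.DataFrame(
    {"Humidity": [0.70 + 0.01 * (i % 12) for i in range(36)]},
    index=index,
)

# Keep only the rows whose month is April (month number 4).
april = monthly[monthly.index.month == 4]
print(april)
```

The result has one row per year, which is what makes a year-over-year comparison of the same month possible.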

Next, Humidity is broken down month by month:

Plot 7: Variation of Humidity for all Months of the Year

This graph shows the variation of Humidity across all months of all 10 years.

Apparent Temperature is broken down month by month in the same way:

Plot 8: Variation of Apparent Temperature for all Months of the Year

This graph shows the variation of Apparent Temperature across all months of all 10 years.
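One possible way to arrange monthly data for such a month-wise view is a year-by-month table, so that each column tracks one calendar month across the years. This is a sketch on invented values with an assumed column name, not necessarily how the plots above were produced.

```python
import pandas as pd

# Invented monthly values for two years, indexed by month start.
index = pd.date_range("2006-01-01", periods=24, freq="MS", tz="UTC")
monthly = pd.DataFrame(
    {"Apparent Temperature (C)": [float(i % 12) for i in range(24)]},  # assumed column name
    index=index,
)

# One row per year, one column per calendar month.
by_month = (
    monthly
    .assign(year=monthly.index.year, month=monthly.index.month)
    .pivot(index="year", columns="month", values="Apparent Temperature (C)")
)
print(by_month)
```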

Drawing a Conclusion

According to our analysis, there has been very little variation in Humidity during the last ten years, notably for the month of April (2006–2016).

As for the Apparent Temperature, it has varied considerably over these years, but its peak is roughly the same after 10 years, or slightly higher in some cases.

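Also, over the ten-year span there seems to be no real connection between the trends of Apparent Temperature and Humidity, even though Plot 1 suggests that Humidity tends to fall as Apparent Temperature rises.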

Overall, both parameters have stayed roughly the same over the course of 10 years, so global warming does not seem to have affected either of these values in this dataset.

That is it, signing off!

Note: Click here for the solution and dataset on my GitHub, and kindly visit my LinkedIn profile.
