Working With Temperatures

Published in

The Startup

10 min read · Aug 14, 2020

One hypothesis the scientific community is investigating is that the SARS-CoV-2 coronavirus is less transmissible in a warm and humid climate, which could reduce the incidence of COVID-19 as spring progresses and summer approaches. For the time being, this is only a hypothesis: although preliminary studies point in that direction, there is still not enough scientific evidence to say that the virus survives worse in heat or that the pandemic could be attenuated by higher temperatures or a more humid climate.

Some respiratory viruses, such as influenza, are known to spread more during the cold months, and other known coronaviruses generally survive worse at higher temperatures and greater humidity than in colder or drier environments. There are several explanations for the seasonality of viruses in temperate regions, but there is not yet enough information to know whether they apply to the new coronavirus.

Data Overview :

The rising average temperature of Earth’s climate system, called global warming, is driving changes in rainfall patterns, extreme weather, arrival of seasons, and more. Collectively, global warming and its effects are known as climate change. While there have been prehistoric periods of global warming, observed changes since the mid-20th century have been unprecedented in rate and scale. A dataset of temperatures in major cities around the world helps to analyze these changes. Weather information is also useful for many data science tasks, such as sales forecasting and logistics. The data is available for research and non-commercial purposes only.

license :

Temperature Data Archive

This site contains files of daily average temperatures for 157 U.S. and 167 international cities. The files are updated…

academic.udayton.edu

Content : Daily average temperature values are in the city_temperature.csv file

Acknowledgements :

University of Dayton for making this dataset available in the first place!

The data contributor :

SRK | Kaggle

Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data…

www.kaggle.com

Data Preparation

1. Importing the required libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

# !pip install plotly
# !pip install chart_studio

import plotly.tools as tls
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from chart_studio import plotly as py  # chart_studio.plotly replaces the old plotly.plotly module

%matplotlib inline

2. Loading the data into the data frame + Exploring The Data

Brief description of weather data and sources

This archive contains files of average daily temperatures for 157 U.S. and 167 international cities. Source data for these files are from the Global Summary of the Day (GSOD) database archived by the National Climatic Data Center (NCDC). The average daily temperatures posted on this site are computed from 24 hourly temperature readings in the Global Summary of the Day (GSOD) data.

The data fields in each file posted on this site are: month, day, year, average daily temperature (F). We use “-99” as a no-data flag when data are not available.
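
Because the archive marks missing readings with -99, it can help to turn that flag into NaN before converting units or averaging; otherwise the flag is treated as a real temperature. The following is a minimal sketch, assuming the column is called AvgTemperature as in city_temperature.csv. The small frame `sample` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same layout as city_temperature.csv (values in °F).
sample = pd.DataFrame({
    "City": ["Algiers", "Algiers", "Algiers"],
    "AvgTemperature": [64.2, -99.0, 50.0],
})

# Replace the -99 no-data flag with NaN so mean(), plots, etc. ignore it.
sample["AvgTemperature"] = sample["AvgTemperature"].replace(-99, np.nan)

# Convert the remaining readings from Fahrenheit to Celsius.
sample["AvgTemperature"] = ((sample["AvgTemperature"] - 32) * 5 / 9).round(2)

print(sample)
print("Missing readings:", sample["AvgTemperature"].isna().sum())
```

The replacement is done before the conversion on purpose: once the values are in Celsius, the flag is no longer the round number -99 and is harder to spot.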

! rm -f daily-temperature-of-major-cities.zip
! rm -f city_temperature.csv
! kaggle datasets download -d sudalairajkumar/daily-temperature-of-major-cities
! unzip daily-temperature-of-major-cities.zip

Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/oscar/.kaggle/kaggle.json'
Downloading daily-temperature-of-major-cities.zip to /home/oscar/Documentos/PYTHON/world_temperature
100%|██████████████████████████████████████| 12.9M/12.9M [00:00<00:00, 15.5MB/s]
100%|██████████████████████████████████████| 12.9M/12.9M [00:00<00:00, 22.8MB/s]
Archive:  daily-temperature-of-major-cities.zip
  inflating: city_temperature.csv

df = pd.read_csv("city_temperature.csv", low_memory=False)
df.head()

Convert Fahrenheit to Celsius

def fahr_to_celsius(temp_fahr):
    """Convert Fahrenheit to Celsius.

    Return the Celsius conversion of the input."""
    temp_celsius = (temp_fahr - 32) * 5 / 9
    return temp_celsius

df["AvgTemperature"] = round(fahr_to_celsius(df["AvgTemperature"]), 2)
df.head()

len(df.Country.unique())

125

# df.tail()
# df.shape
# df.info()

3. Dropping the duplicate rows

df = df.drop_duplicates()
df.shape

(2885612, 8)

df.count()

Region            2885612
Country           2885612
State             1436807
City              2885612
Month             2885612
Day               2885612
Year              2885612
AvgTemperature    2885612
dtype: int64

4. Dealing with the missing or null values

Check for missing values (NaN) in every column

for col in df.columns:
    print("The " + col + " contains Nan" + ":" + str((df[col].isna().any())))

The Region contains Nan:False
The Country contains Nan:False
The State contains Nan:True
The City contains Nan:False
The Month contains Nan:False
The Day contains Nan:False
The Year contains Nan:False
The AvgTemperature contains Nan:False

Check for zeros in every column

for col in df.columns:
    print("The " + col + " contains 0" + ":" + str((df[col] == 0).any()))

# Day == 0 is not a valid calendar day, so drop those rows
df = df[df.Day != 0]
df.head()

The Region contains 0:False
The Country contains 0:False
The State contains 0:False
The City contains 0:False
The Month contains 0:False
The Day contains 0:True
The Year contains 0:False
The AvgTemperature contains 0:True

# Drop rows whose Year is 200 or 201 (outside the 1995-2020 range of the data)
df = df[(df.Year != 200) & (df.Year != 201)]
# df.head()

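Only the State column contains NaN values, and the rows with an invalid Day or Year have been removed. Note that these checks do not catch readings that carry the “-99” no-data flag described earlier; any such readings are still in AvgTemperature, now converted to Celsius. With that caveat, our data is ready.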
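
The two ad-hoc filters above (Day equal to 0, Year equal to 200 or 201) can also be written as one range check per column. This is only a sketch on made-up rows: the frame `sample` is hypothetical, and the 1995 to 2020 year range is taken from the period this article covers.

```python
import pandas as pd

# Hypothetical rows that mimic the problems found above.
sample = pd.DataFrame({
    "Year": [1995, 1995, 200, 201, 2020],
    "Month": [1, 3, 12, 12, 5],
    "Day": [1, 0, 1, 2, 13],
    "AvgTemperature": [17.9, 12.3, 10.0, 9.0, 8.5],
})

# A row is kept only if every date field lies in a plausible range.
valid = (
    sample["Year"].between(1995, 2020)
    & sample["Month"].between(1, 12)
    & sample["Day"].between(1, 31)
)

print("Rows rejected:", (~valid).sum())
clean = sample[valid]
print(clean)
```

A range check like this also catches bad values that were not spotted by eye, which the explicit `!= 200` style of filter cannot do.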

Exploratory Data Analysis : EDA

1. Average Temperature in every region

Average_Temperture_in_every_region = (
    df.groupby("Region")["AvgTemperature"].mean().sort_values(ascending=False)
)
# The keys below must match the spelling used in the dataset
Average_Temperture_in_every_region = Average_Temperture_in_every_region.rename({
    "South/Central America & Carribean": "South America",
    "Australia/South Pacific": "Australia"})
Average_Temperture_in_every_region

Region
Middle East      20.213581
Asia             16.982514
South America    16.772604
Australia        16.211585
North America    12.926734
Africa           12.012884
Europe            8.273087
Name: AvgTemperature, dtype: float64

plt.figure(figsize=(12, 8))
plt.bar(Average_Temperture_in_every_region.index, Average_Temperture_in_every_region.values)
plt.xticks(rotation=10, size=15)
plt.yticks(size=15)
plt.ylabel("Average Temperature", size=15)
plt.title("Average Temperature in every region", size=20)
plt.show()

2. Growth of the average Temperature in every region over time

Change the index to date

datetime_series = pd.to_datetime(df[['Year', 'Month', 'Day']])
df['date'] = datetime_series
df = df.set_index('date')
df = df.drop(["Month", "Day", "Year"], axis=1)
# df.head()

# Group by region and calendar year ('Y' is the year-end alias; pandas 2.2+ prefers 'YE')
region_year = ['Region', pd.Grouper(freq='Y')]
# Select the numeric column explicitly so the text columns are not averaged
df_region = df.groupby(region_year)[["AvgTemperature"]].mean()
# df_region.head()

plt.figure(figsize=(15, 8))
for region in df["Region"].unique():
    plt.plot(df_region.loc[region].index, df_region.loc[region]["AvgTemperature"], label=region)

plt.legend()
plt.title("Growth of the average Temperature in every region over time", size=20)
plt.xticks(size=15)
plt.yticks(size=15)
plt.show()

3. Growth of the average Temperature (Earth)

# Yearly mean over all cities; only the numeric column is averaged
df_earth = df.groupby(pd.Grouper(freq="Y"))[["AvgTemperature"]].mean()
# df_earth.head()

plt.figure(figsize=(12, 8))
plt.plot(df_earth.index, df_earth.values, marker="o")
plt.xticks(size=15)
plt.ylabel("Average Temperature", size=15)
plt.yticks(size=15)
plt.title("Growth of the average Temperature (Earth)", size=20)
plt.show()
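
A line plot shows the shape of the yearly means, but a fitted straight line puts a number on the "growth". The sketch below uses NumPy's least-squares polynomial fit on a made-up series; `yearly` and its values are hypothetical and stand in for the real yearly means indexed by calendar year.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly means in °C, indexed by calendar year (made-up values).
yearly = pd.Series([15.0, 15.1, 15.2, 15.3, 15.4],
                   index=[1995, 1996, 1997, 1998, 1999])

# Centre the years so the least-squares fit is numerically well behaved.
years = yearly.index.to_numpy(dtype=float)
x = years - years.mean()

# Degree-1 polynomial fit: the slope is the average change in °C per year.
slope, intercept = np.polyfit(x, yearly.to_numpy(), deg=1)

print(f"Trend: {slope:.3f} °C per year ({slope * 10:.2f} °C per decade)")
```

A linear fit is only a rough summary; it says nothing about why the values change, and it is sensitive to incomplete first or last years.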

4. The hottest cities in the world

top_10_hotest_Cities_in_The_world = (
    df.groupby("City")[["AvgTemperature"]].mean()
    .sort_values(by="AvgTemperature", ascending=False)
    .head(10)
)
top_10_hotest_Cities_in_The_world

plt.figure(figsize=(12, 8))
plt.barh(top_10_hotest_Cities_in_The_world.index, top_10_hotest_Cities_in_The_world.AvgTemperature)
plt.show()

5. The growth of the temperature in the hottest cities in the world

city_year = ['City', pd.Grouper(freq='Y')]
# Yearly mean per city; only the numeric column is averaged
df_city = df.groupby(city_year)[["AvgTemperature"]].mean()
# df_city.head()

plt.figure(figsize=(12, 8))
for city in top_10_hotest_Cities_in_The_world.index:
    plt.plot(df_city.loc[city].index, df_city.loc[city].AvgTemperature, label=city)
plt.legend()
plt.yticks(size=15)
plt.xticks(size=15)
plt.ylabel("Average Temperature", size=15)
plt.title("The growth of the temperature in the hottest cities in the world", size=20)
plt.show()

6. The hottest countries in the world

# Mean per country, sorted from coldest to hottest
hotest_Countries_in_The_world = df.groupby("Country")[["AvgTemperature"]].mean().sort_values(by="AvgTemperature")
# hotest_Countries_in_The_world.tail()

plt.figure(figsize=(12, 8))
# Plot the 32 hottest countries, hottest first
plt.bar(hotest_Countries_in_The_world.index[-1:-33:-1], hotest_Countries_in_The_world.AvgTemperature[-1:-33:-1])
plt.yticks(size=15)
plt.ylabel("Average Temperature", size=15)
plt.xticks(rotation=90, size=12)
plt.title("The hottest countries in the world", size=20)
plt.show()

7. The Average Temperature around the world

To draw a world map with plotly we need the ISO codes of the countries. The codes come from this dataset:

Countries ISO Codes

List of countries of the world with their ISO codes

www.kaggle.com

! rm -f wikipedia-iso-country-codes.csv 
! rm -f countries-iso-codes.zip
! kaggle datasets download -d juanumusic/countries-iso-codes
! unzip countries-iso-codes.zip

Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/oscar/.kaggle/kaggle.json'
Downloading countries-iso-codes.zip to /home/oscar/Documentos/PYTHON/world_temperature
  0%|                                               | 0.00/4.16k [00:00<?, ?B/s]
100%|███████████████████████████████████████| 4.16k/4.16k [00:00<00:00, 879kB/s]
Archive:  countries-iso-codes.zip
  inflating: wikipedia-iso-country-codes.csv

code = pd.read_csv("wikipedia-iso-country-codes.csv")  # ISO country codes
code = code.set_index("English short name lower case")
# code.head()

I renamed some countries in the code data frame so that they match the index of our main data frame. This is important when merging the two data frames.

code = code.rename(index={"United States Of America": "US", "Côte d'Ivoire": "Ivory Coast",
                          "Korea, Republic of (South Korea)": "South Korea", "Netherlands": "The Netherlands",
                          "Syrian Arab Republic": "Syria", "Myanmar": "Myanmar (Burma)",
                          "Korea, Democratic People's Republic of": "North Korea",
                          "Macedonia, the former Yugoslav Republic of": "Macedonia",
                          "Ecuador": "Equador", "Tanzania, United Republic of": "Tanzania",
                          "Serbia": "Serbia-Montenegro"})
# code.head()

Now we merge the code data frame with our data

hott = pd.merge(hotest_Countries_in_The_world,code,left_index = True , right_index = True , how = "left")
hott.head()
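
With a left merge, any country whose name has no match in the code table silently gets an empty ISO code and disappears from the map. One way to list those countries is the `indicator` option of `pd.merge`. The sketch below uses two tiny made-up frames (`temps`, `codes`, and the country "Atlantis" are hypothetical), not the real data.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames being merged.
temps = pd.DataFrame({"AvgTemperature": [27.5, 12.0, 9.3]},
                     index=["United Arab Emirates", "US", "Atlantis"])
codes = pd.DataFrame({"Alpha-3 code": ["ARE", "USA"]},
                     index=["United Arab Emirates", "US"])

# indicator=True adds a "_merge" column saying where each row was found.
merged = pd.merge(temps, codes, left_index=True, right_index=True,
                  how="left", indicator=True)

# Rows flagged "left_only" found no ISO code and would be missing from the map.
unmatched = merged.index[merged["_merge"] == "left_only"].tolist()
print(unmatched)
```

Running the same check on the real frames would show which names still need an entry in the rename dictionary.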

data = [dict(type="choropleth", autocolorscale=False, locations=hott["Alpha-3 code"], z=hott["AvgTemperature"],
             text=hott.index, colorscale="reds", colorbar=dict(title="Temperature"))]

layout = dict(title="The Average Temperature around the world",
              geo=dict(scope="world",
                       projection=dict(type="equirectangular"),
                       showlakes=True, lakecolor="rgb(66,165,245)",),)

fig = dict(data=data, layout=layout)
# iplot(fig, filename="d3-choropleth-map")

8. Variation of the mean Temperature over the 12 months around the world

# Mean per calendar month; only the numeric column is averaged
Variation_world = df.groupby(df.index.month)[["AvgTemperature"]].mean()
Variation_world = Variation_world.rename(index={1: "January", 2: "February", 3: "March", 4: "April", 5: "May",
                                                6: "June", 7: "July", 8: "August", 9: "September",
                                                10: "October", 11: "November", 12: "December"})

plt.figure(figsize=(12, 8))
sns.barplot(x=Variation_world.index, y='AvgTemperature', data=Variation_world, palette='Set2')
plt.title('AVERAGE MEAN TEMPERATURE OF THE WORLD', size=15)
plt.xticks(size=10)
plt.yticks(size=12)
plt.xlabel("Month", size=12)
plt.ylabel("AVERAGE MEAN TEMPERATURE", size=10)
plt.show()
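
The month-number-to-name dictionary is typed out by hand each time it is needed. As a possible shortcut, the standard library's `calendar.month_name` can build the same mapping. This is a sketch on a made-up frame; `monthly` and its values are hypothetical.

```python
import calendar
import pandas as pd

# Hypothetical monthly means indexed by month number (made-up values).
monthly = pd.DataFrame({"AvgTemperature": [5.0, 6.5, 9.0]}, index=[1, 2, 3])

# calendar.month_name[1] is "January" ... calendar.month_name[12] is "December"
# (names follow the active locale, which is English by default).
month_names = {i: calendar.month_name[i] for i in range(1, 13)}

monthly = monthly.rename(index=month_names)
print(monthly.index.tolist())
```

Defining `month_names` once and reusing it avoids repeating the dictionary in every section.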

9. Variation of the mean Temperature over the 12 months in the hottest country in the world: United Arab Emirates

uae = df.loc[df["Country"] == "United Arab Emirates"]
# Mean per calendar month; only the numeric column is averaged
Variation_UAE = uae.groupby(uae.index.month)[["AvgTemperature"]].mean()
Variation_UAE = Variation_UAE.rename(index={1: "January", 2: "February", 3: "March", 4: "April", 5: "May",
                                            6: "June", 7: "July", 8: "August", 9: "September",
                                            10: "October", 11: "November", 12: "December"})

plt.figure(figsize=(12, 8))
sns.barplot(x=Variation_UAE.index, y='AvgTemperature', data=Variation_UAE, palette='Set2')
plt.title('Variation of the mean Temperature over the 12 months in the United Arab Emirates', size=20)
plt.xticks(size=10)
plt.yticks(size=12)
plt.xlabel("Month", size=12)
plt.ylabel("AVERAGE MEAN TEMPERATURE", size=12)
plt.show()

10. Variation of mean Temperature over the months for each region

plt.figure(figsize=(12, 18))
i = 1  # subplot position
for region in df.Region.unique():  # one subplot per region keeps the code short

    region_data = df[df['Region'] == region]
    # Mean temperature per calendar month (index 1-12, already in month order)
    final_data = region_data.groupby(region_data.index.month)['AvgTemperature'].mean()
    final_data = pd.DataFrame(final_data)

    final_data = final_data.rename(index={1: "January", 2: "February", 3: "March", 4: "April", 5: "May",
                                          6: "June", 7: "July", 8: "August", 9: "September",
                                          10: "October", 11: "November", 12: "December"})
    plt.subplot(4, 2, i)
    sns.barplot(x=final_data.index, y='AvgTemperature',
                data=final_data, palette='Paired')
    plt.title(region, size=10)
    plt.xlabel(None)
    plt.xticks(rotation=90, size=9)
    plt.ylabel("Mean Temperature", size=11)
    i += 1

plt.tight_layout()
plt.show()
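
The per-region monthly means computed inside the loop can also be obtained as one table, with months as rows and regions as columns. The sketch below uses `pivot_table` on a made-up frame; `sample`, its values, and the helper column `Month` are hypothetical.

```python
import pandas as pd

# Hypothetical sample with a DatetimeIndex, like df after the index change.
sample = pd.DataFrame(
    {"Region": ["Europe", "Europe", "Africa", "Africa"],
     "AvgTemperature": [2.0, 4.0, 20.0, 24.0]},
    index=pd.to_datetime(["2000-01-10", "2001-01-12", "2000-01-10", "2000-07-15"]),
)

# Helper column holding the calendar month of each reading.
sample["Month"] = sample.index.month

# One row per month, one column per region, mean temperature in each cell.
table = sample.pivot_table(index="Month", columns="Region",
                           values="AvgTemperature", aggfunc="mean")
print(table)
```

A table in this shape makes it easy to compare regions month by month, and cells with no readings show up as NaN instead of being skipped silently.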

11. The Average Temperature in Spain

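# Mean per Spanish city; only the numeric column is averaged
Average_Temperature_Spain = df.loc[df["Country"] == "Spain"].groupby("City")[["AvgTemperature"]].mean()
Average_Temperature_Spain.head()

# Mean per US state, without the "Additional Territories" entry
Average_Temperature_USA = (
    df.loc[df["Country"] == "US"].groupby("State")[["AvgTemperature"]].mean()
    .drop(["Additional Territories"], axis=0)
)
# Average_Temperature_USA.head()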

We need to add the state codes to this data for the visualization

USstates Dataset

analysis with pandas

www.kaggle.com

! rm -f state-areas.csv
! rm -f state-population.csv
! rm -f state-abbrevs.csv
! rm -f usstates-dataset.zip
! kaggle datasets download -d giodev11/usstates-dataset
! unzip usstates-dataset.zip

Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/oscar/.kaggle/kaggle.json'
Downloading usstates-dataset.zip to /home/oscar/Documentos/PYTHON/world_temperature
  0%|                                               | 0.00/18.4k [00:00<?, ?B/s]
100%|██████████████████████████████████████| 18.4k/18.4k [00:00<00:00, 2.38MB/s]
Archive:  usstates-dataset.zip
  inflating: state-abbrevs.csv       
  inflating: state-areas.csv         
  inflating: state-population.csv

usa_codes = pd.read_csv('state-abbrevs.csv')
# Duplicate the state column so it is still available after becoming the index
usa_codes['State'] = usa_codes['state']
# Set new index
usa_codes = usa_codes.set_index("State")
# Rename columns
usa_codes.rename(columns={'abbreviation': 'Code', 'state': 'State'}, inplace=True)

Average_Temperature_USA = pd.merge(Average_Temperature_USA,
                                   usa_codes, how="left", right_index=True, left_index=True)
Average_Temperature_USA.head()

data_usa = [dict(type="choropleth", autocolorscale=False, locations=Average_Temperature_USA["Code"],
                 z=Average_Temperature_USA["AvgTemperature"],
                 locationmode="USA-states",
                 text=Average_Temperature_USA.index, colorscale="reds", colorbar=dict(title="Temperature"))]

layout_usa = dict(title="The Average Temperature in the USA states",
                  geo=dict(scope="usa", projection=dict(type="albers usa"),
                           showlakes=True, lakecolor="rgb(66,165,245)",),)

fig_usa = dict(data=data_usa, layout=layout_usa)
# iplot(fig_usa, filename="d3-choropleth-map")

12. Average Temperature in USA from 1995 to 2020

# Yearly mean for the US; only the numeric column is averaged
Temperature_USA_year = df.loc[df["Country"] == "US"].groupby(pd.Grouper(freq="Y"))[["AvgTemperature"]].mean()
# Temperature_USA_year.head()

plt.figure(figsize=(12, 8))
sns.barplot(x=Temperature_USA_year.index.year, y="AvgTemperature", data=Temperature_USA_year)
plt.yticks(size=12)
plt.xticks(size=12, rotation=90)
plt.xlabel(None)
plt.ylabel("Average Temperature", size=12)
plt.title("Average Temperature in USA from 1995 to 2020", size=15)
plt.show()
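
Yearly means are only comparable when each year covers the same days. A quick coverage check, sketched below on a made-up series (the dates and values are hypothetical), counts the readings per year so that a partial year stands out before it is plotted next to complete ones.

```python
import pandas as pd

# Hypothetical daily readings with a DatetimeIndex (made-up values).
sample = pd.Series(
    [10.0, 11.0, 12.0, 30.0],
    index=pd.to_datetime(["2019-01-01", "2019-07-01", "2019-12-31", "2020-01-01"]),
)

# Number of readings available in each calendar year.
days_per_year = sample.groupby(sample.index.year).count()
print(days_per_year)

# Years with far fewer readings than the others deserve a closer look.
sparse_years = days_per_year[days_per_year < days_per_year.max()].index.tolist()
print("Years with fewer readings:", sparse_years)
```

Whether the real file has such a partial year is something to verify with this kind of count, not something assumed here.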

Conclusion

Displaying the data via plots can be an effective way to quickly present the data.

I hope this helps you in your own learning.

No matter what books, blogs, courses or videos you learn from, when it comes to implementation everything might look “Out of Syllabus”.

Best way to learn is by doing!

Best way to learn is by teaching what you have learned!

Never give up!

See you on LinkedIn!

Oscar Rojo Martín - Student at Universidad de Deusto - San Sebastián, Basque Country, Spain | LinkedIn

View Oscar Rojo Martín's profile on LinkedIn, the world's largest professional community.

www.linkedin.com

References:

* https://www.kaggle.com/giodev11/usstates-dataset
* https://www.kaggle.com/khalilbrick/daily-temperature-of-major-cities/data
* https://www.kaggle.com/sudalairajkumar/daily-temperature-of-major-cities/data#

Written by Oscar Rojo