Listing Live Weather Conditions of World’s Most Populous Cities

A Bite-Size Python Project using OpenWeatherMap API and BeautifulSoup

M. Makkawi
9 min read · Apr 13, 2020

One of the big challenges facing aspiring data scientists and data analysts today is acquiring reliable data for their analyses and projects. Regularly updated data in particular is hard to come by online, as many publicly available datasets are static or only updated occasionally. In this post, we'll work through a project that uses public APIs and web scraping, two effective avenues for acquiring the most up-to-date data available online.

For this project, the overall objective is to tabulate the current weather conditions of the 200 most populous cities in the world. We'll tackle it in two parts: first, web scraping to list the world's most populous cities, and second, sending requests to the OpenWeatherMap API to get the weather data for each city in the list. The coding environment we'll be using throughout is a Jupyter Notebook, and the language is Python.

Quick Intro to APIs and Web-scraping

APIs

Application Programming Interfaces, in simple terms, are mediums that connect a user with a web application (in this case OpenWeatherMap). They allow the user to send requests to the service and deliver the resulting response back to the user. Requests come in a few forms that can be categorized at a high level into read (GET) or write (POST, PUT, DELETE) requests. For this exercise, we'll only be using GET requests, where the response from the API will contain the live weather information of the city we seek.

Public APIs are generally a very reliable way to acquire data, as they are usually operated by organizations that oversee the data quality and ensure that it is current and correct. APIs are also consistent, which means we know the format of the data we are receiving and how its contents are stored every single time we send a request and receive a response. This simplifies the process of using and parsing the data downstream to feed into our dataset or project.

The drawback to APIs is that they aren't always free, and the ones that are free usually limit how many requests can be sent in a given time frame. For this project, we'll be sending a relatively small number of requests, but in case your application needs to send more requests than allowed, APIs usually offer subscription plans that raise the limit for a price. Make sure to go through the documentation of the public APIs you plan to use to find a set-up that works best for you.

Web Scraping

On the other hand, web scraping takes advantage of the text rendered on a webpage in HTML and extracts it according to a set of instructions that we write. Compared to APIs, it can be considered an unreliable way to acquire data, but it can be a very valuable tool if there isn't a free API available. Odds are that most websites you come across render their data in a format that is extractable by scraping. One issue, though, is that navigating raw HTML is usually a troublesome affair. To make things easier, there are tools that can get a webpage's content into a searchable format and make the process of scraping much simpler. More on this when we get to scraping below.

Live Weather Conditions of Most Populous Cities

Sending GET requests to the OpenWeatherMap API

One API that OpenWeatherMap has made public and free is the Current Weather API, which lets you get the current weather conditions for any city in the world. The only required parameters are the name of the city and your API key (make sure to get it here). You can find all the information about the Current Weather API, including other optional parameters you can send, in their API docs.

api.openweathermap.org/data/2.5/weather?q=city&appid=key

To check out how this API sends back its response, try pasting the above link into your browser, filling in city with whichever city you'd like (e.g. paris) and key with the API key you received by email. Pretty nifty, isn't it? More on this in a second.

Working with Python, there are some libraries that make working with APIs and web requests a lot easier. My library of choice is the requests library, which makes sending HTTP requests and accessing the response a simple task.

To use the requests library to send an HTTP request to the OpenWeatherMap API, you need to specify a variable for the API URL (weather) and another for a dictionary containing the parameters (params).

import requests

weather = 'http://api.openweathermap.org/data/2.5/weather'
params = {'q': 'London',
          'APPID': 'Your API key',
          'units': 'metric'}
response = requests.get(weather, params)
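To confirm that requests assembled the same URL we pasted into the browser earlier, you can inspect the url attribute of the response (a small optional check):

# Inspect the final URL that requests built from the base URL and params
print(response.url)
# e.g. http://api.openweathermap.org/data/2.5/weather?q=London&APPID=...&units=metric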

To make sure that our GET request was successful, we can check the status code of the response:

response.status_code
>>> 200

A status code of 200 indicates that the request was successful and that information was received from OpenWeatherMap. A failed request returns an error code instead, such as 404 (not found), which you'll get if you send an incorrect city name, for example.
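If you'd rather handle failures gracefully than assume success, a simple guard on the status code works. A minimal sketch, reusing the weather URL and params from above with a deliberately misspelled city name (a hypothetical example):

# Copy the params from above but with a misspelled city name
bad_params = dict(params, q='Lndon')
bad_response = requests.get(weather, bad_params)

if bad_response.status_code != 200:
    # OpenWeatherMap also includes an error message in the JSON body, e.g. 'city not found'
    print(bad_response.status_code, bad_response.json().get('message'))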

So, where is the weather report? To check the information in the response, we’ll get the JSON object of the result:

response.json()
>>>
{'base': 'stations',
'clouds': {'all': 40},
'cod': 200,
'coord': {'lat': 51.51, 'lon': -0.13},
'dt': 1585436814,
'id': 2643743,
'main': {'feels_like': -0.27,
'humidity': 70,
'pressure': 1034,
'temp': 4.64,
'temp_max': 6,
'temp_min': 3.33},
'name': 'London',
'sys': {'country': 'GB',
'id': 1414,
'sunrise': 1585374241,
'sunset': 1585420001,
'type': 1},
'timezone': 0,
'visibility': 10000,
'weather': [{'description': 'scattered clouds',
'icon': '03n',
'id': 802,
'main': 'Clouds'}],
'wind': {'deg': 50, 'speed': 4.1}}

Alright, so there's quite a bit to work with here. OpenWeatherMap has been generous enough to provide us with a lot of data about the city we requested, in the form of a dictionary. This information includes the city's coordinates, wind speed, sunset/sunrise times, pressure and humidity, and the expected high/low temperatures for the day. We'll extract the information that's important to us from this dictionary for each city we send a request for.
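For example, pulling a few of these fields out of the dictionary for the London response above looks like this:

city_info = response.json()

country = city_info['sys']['country']                   # 'GB'
feels_like = city_info['main']['feels_like']            # -0.27 (°C, since we requested metric units)
description = city_info['weather'][0]['description']    # 'scattered clouds'

print(f"{city_info['name']}, {country}: {description}, feels like {feels_like}°C")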

Web-scraping to get the world's most populous cities

For the next step, we want to get a list of the world’s 200 most populous cities. To do this we’re going to look at the World Population Review website and scrape the names and populations for the top 200 cities and store them in a list.

World’s Most Populous Cities Table at worldpopulationreview.com

Two notes here:

  1. If you open the link you'll notice that there's an option to download the table as a CSV or JSON file. We won't use either of those options for this exercise, in order to introduce the concept of web scraping. However, when working on similar projects in the future, I'd advise going with the simpler option of just downloading the data and extracting the columns and rows you need, mostly for clarity and code readability, since web scraping can get a bit messy, but also because some data does not need to be updated regularly. The list of the world's 200 most populous cities will change very slowly over the next 20 or so years, so up-to-the-minute data is not crucial. Unfortunately, downloading data tables isn't always an option, so scraping might be necessary.
  2. As mentioned in the first note, web scraping can be a messy affair. I don't believe there's a best practice when it comes to scraping web pages (if there is, please let me know), but my mantra is: whatever gets the job done in a reliable manner is good enough. I stress reliable here because you might think you've grabbed the data correctly, but in actuality you haven't accounted for an anomaly in row 114 or some other obscure location in the rendered HTML file, and this causes an unforeseen error downstream. Always double check your data. Finally, web pages tend to change, adding and removing items every once in a while, so make sure your code still works and grabs the correct data.

For web-scraping, there are a few tools out there that can help, but I've chosen BeautifulSoup, a Python package that can parse HTML and XML webpages and makes it easy to navigate and search through the content.

from bs4 import BeautifulSoup

world_cities = requests.get('https://worldpopulationreview.com/world-cities/')
cities = BeautifulSoup(world_cities.text, 'lxml')

To extract the city names, we'll loop through all instances of an 'a' tag, which references a hyperlink. This is a small hack: if you check the WorldPopulationReview webpage, you'll notice that within the table only the city names are links. This makes it easy to find them using findAll and then loop through the result to extract the top 200:

city_names = []
for i in cities.findAll('a')[41:241]:
    city_names.append(i.getText())

city_names[:5]
>>> ['Tokyo', 'Delhi', 'Shanghai', 'Sao Paulo', 'Mexico City']

For the populations (not as important, but still cool to have), it's a bit more difficult, as they are just normal text in the table. Table entries are identified with the 'td' tag, so we'll step through the first 1000 'td' entries five at a time (one row per city) and grab the cell holding the population value (a cleaner alternative is sketched right after this snippet):

# Please don't ever do this
city_populations = []
for i in range(1000):
    if i % 5 == 0:
        city_populations.append(cities.findAll('td')[i+2].getText())

city_populations[:5]
>>> ['37,393,129', '30,290,936', '27,058,479', '22,043,028', '21,782,378']
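As the comment admits, this works but it's fragile and re-runs findAll on every iteration. A slightly cleaner sketch that walks the table row by row instead of counting flat 'td' indices, assuming the same layout the loop above relies on (five cells per row, with the city name in the second cell and the population in the third):

# Keep only rows that actually contain data cells, then take the top 200
rows = [row for row in cities.findAll('tr') if len(row.findAll('td')) >= 3][:200]

city_names_alt = [row.findAll('td')[1].getText() for row in rows]        # assumed: name in the 2nd cell
city_populations_alt = [row.findAll('td')[2].getText() for row in rows]  # population in the 3rd cell, as above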

Putting it all together

Alright, now we’ve got our list of 200 city names. Let’s go ahead and get the weather conditions for all of them.

We write a function called getCurrentWeather that simplifies sending a request: it takes the params dictionary, which will contain the city name, as its argument. The only parameter that differs between requests is the city name, since the other parameters, such as the API key and the desired units, are common across all requests.

import os

params = {}

def getCurrentWeather(params):
    weather = 'http://api.openweathermap.org/data/2.5/weather'
    params['APPID'] = os.environ['APPID']
    params['units'] = 'metric'
    return requests.get(weather, params)
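As a quick sanity check, calling the function for a single city (reusing London) should return the same kind of response we saw earlier:

# Only the city name needs to be passed in; the function fills in the API key and units
london = getCurrentWeather({'q': 'London'})
print(london.status_code, london.json()['main']['temp'])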

Next, we loop 200 times: we extract the city name from our list city_names, send a request to the API, extract the information we need from each response, and store it in a dictionary called 'city_data'. Each dictionary contains all the information about one city. We then append each city's dictionary to a list called 'world_temperatures'.

world_temperatures = []

for i in range(200):
    # Extracting the city name and assigning it to 'q'
    params['q'] = city_names[i]
    # Sending the request to OWM using our function
    response = getCurrentWeather(params)

    if response.status_code != 200:
        continue

    city_info = response.json()

    # Extracting information from response
    country = city_info['sys']['country']
    latitude = city_info['coord']['lat']
    longitude = city_info['coord']['lon']
    weather_condition = city_info['weather'][0]['description']
    temperature = city_info['main']['feels_like']
    max_temp = city_info['main']['temp_max']
    min_temp = city_info['main']['temp_min']

    # For each city, compile data in a dictionary
    city_data = {'City': city_names[i],
                 'Country': country,
                 'Latitude': latitude,
                 'Longitude': longitude,
                 'Population': city_populations[i],
                 'Weather': weather_condition,
                 'Temperature': temperature,
                 'Temperature Max (C)': max_temp,
                 'Temperature Min (C)': min_temp}

    # Append each city's info to the list
    world_temperatures.append(city_data)

Note the if response.status_code != 200: continue. If, for whatever reason, the API did not return weather information for a particular city (which could happen for many reasons, including an incorrect spelling or OpenWeatherMap simply not having weather info for that city), we move on to the next city instead of trying to extract information from a non-existent weather report, which would raise an error.

Finally, we create a pandas dataframe using the ‘world_temperatures’ list and extract the column names from the dictionary keys.

import pandas as pd

pd.DataFrame(data = world_temperatures, columns = city_data.keys())
DataFrame containing city weather information for top 25 most populous cities

And voila! We've got a dataframe with a little under 200 rows (OpenWeatherMap didn't have weather data for some of the cities) containing the city name, its coordinates, population, weather conditions, and min and max temperatures for the world's most populous cities.
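Since the plan is to feed this data into other tools, it can be handy to keep a snapshot on disk. A small optional step, not part of the original notebook, with a placeholder filename:

df = pd.DataFrame(data=world_temperatures, columns=city_data.keys())
df.to_csv('world_weather.csv', index=False)  # 'world_weather.csv' is just an assumed filename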

For the next post, we'll look at how to plot these on a map using the Leaflet package in R and make an R-Shiny application out of it. We'll also try to create a scheduled task that calls the OpenWeatherMap API on a regular basis so we can get daily (potentially hourly) updates to the data that is fed into the application, and create a live world weather map.

The link to the notebook can be found here.
