Obtaining Data Through an API in Python

Agustín Haro León
6 min read · Apr 11, 2023


In the information age, more and more companies are turning to data collection and analysis to make informed business decisions. However, before you can start analyzing data, you must first acquire it. Fortunately, the internet is full of data providers, and there are two main ways to obtain their data: through web scraping or through APIs.

Web scraping involves extracting data from a website by analyzing its HTML structure. While it can be effective, it can also be illegal and ethically questionable in some cases.

APIs (Application Programming Interfaces), on the other hand, are sets of software tools that allow developers to create applications that interact with other applications. An API lets you send a data request to an application and receive a response in a structured format, typically JSON. Some APIs already have Python wrappers, which means you don’t need to write the HTTP request yourself. If no wrapper is available, you will need to write the request yourself.

Making an API Call

The requests library is the de facto standard for making HTTP requests in Python. It hides the complexity of making requests behind a simple API, so you can focus on interacting with services and consuming data in your application. Every request returns a requests.Response object, which contains the server’s response to the HTTP request. The response exposes several attributes, such as content, text, and status_code. The status code is a number that tells us whether we received the information we asked for. You will most likely receive one of these status codes:

200: Success.
401: Unauthorized. The request lacks valid authentication credentials.
403: Forbidden. The server understood the request but refuses to authorize it.

For example, to make a call to the New York Times website, you should use the following code in your Python notebook:
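The original snippet is not reproduced here, so the following is only a minimal sketch of such a call. The URL and the helper names describe_status and fetch are assumptions chosen for illustration, not the article's own code.

```python
import requests

# Assumption: an illustrative URL, since the original snippet is not shown.
NYT_URL = "https://www.nytimes.com"


def describe_status(status_code: int) -> str:
    """Translate the status codes listed above into a short message."""
    messages = {
        200: "Success",
        401: "Unauthorized: missing or invalid credentials",
        403: "Forbidden: the server refuses to authorize the request",
    }
    return messages.get(status_code, f"Other status: {status_code}")


def fetch(url: str, timeout: float = 10.0) -> requests.Response:
    """Send a GET request and return the requests.Response object."""
    return requests.get(url, timeout=timeout)
```

In a notebook, you could run response = fetch(NYT_URL) and then print(response.status_code, describe_status(response.status_code)) to see how the request went.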

An interesting API that we can use is the OpenWeatherMap API (https://openweathermap.org/api), which provides weather information for cities around the world at different times, free of charge.

To make the API call:

1. Select the city or cities for which you want weather information (for example, Berlin) and set your API key (you can find it at https://home.openweathermap.org/api_keys):

2. Combine the different parts from above to create one URL:

3. Check the response status code to confirm that the request succeeded:

4. Inspect the response text to confirm that it contains the data we need:
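The four steps above can be sketched as follows. This assumes the 5-day forecast endpoint of OpenWeatherMap (the article later refers to the list and city keys, which that endpoint returns); the placeholder key and the function names build_url and get_forecast are hypothetical.

```python
from urllib.parse import urlencode

import requests

# Assumption: the 5-day forecast endpoint; replace the key with your own.
BASE_URL = "https://api.openweathermap.org/data/2.5/forecast"
API_KEY = "YOUR_API_KEY"  # placeholder, not a real key
CITY = "Berlin"


def build_url(city: str, api_key: str) -> str:
    """Steps 1 and 2: combine the endpoint, city and API key into one URL."""
    query = urlencode({"q": city, "appid": api_key})
    return f"{BASE_URL}?{query}"


def get_forecast(city: str, api_key: str) -> requests.Response:
    """Steps 3 and 4: send the request, then inspect status code and text."""
    response = requests.get(build_url(city, api_key), timeout=10)
    print(response.status_code)  # 200 means the request succeeded
    print(response.text[:200])   # first characters of the raw response body
    return response
```

Calling get_forecast(CITY, API_KEY) with a valid key should print 200 followed by the beginning of the raw response text.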

Using JSON

Before starting to store data, it’s important to determine the logical structure of the database. What tables will you need? How will these tables relate to each other? Only after answering these questions (and more) can you create an effective database.

One of the most popular ways of structuring data is JavaScript Object Notation (JSON). JSON is a standard text format based on the syntax of JavaScript objects. It’s commonly used for transmitting data in web applications, such as sending data from the server to the client or vice versa, and it has become the de facto standard for information exchange. JSON supports primitive types, such as strings and numbers, as well as lists and nested objects. It looks like nested Python dictionaries:
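As a small illustration of that resemblance, the sketch below parses a JSON string with Python's built-in json module. The values are made up and only loosely modeled on a weather response.

```python
import json

# Made-up sample text; a real API response would be much longer.
raw = """
{
  "city": {"name": "Berlin", "country": "DE"},
  "cnt": 2,
  "list": [
    {"dt_txt": "2023-04-11 12:00:00", "main": {"temp": 284.5}},
    {"dt_txt": "2023-04-11 15:00:00", "main": {"temp": 285.1}}
  ]
}
"""

# json.loads turns JSON text into nested Python dictionaries and lists.
data = json.loads(raw)

print(type(data))            # the top-level JSON object becomes a dict
print(data["city"]["name"])  # nested objects are nested dicts
print(len(data["list"]))     # JSON arrays become Python lists
```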

1. Call .json() on the response to get the data in a more organized form than the raw string in the .text attribute. Take a look at the OpenWeather API example:

2. Store the parsed JSON in a new variable to work with it, and verify again that the information is correct:

3. Check the keys of the data and the values inside the keys that we are interested in:

4. Select the variables you want to consult. Using what the previous step showed about the key we’re interested in, we select each variable according to its location within the data.
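A sketch of these steps follows. To keep it runnable without an API key, a small hand-written dictionary stands in for the result of response.json(); its field names (list, city, dt_txt, main, temp) are assumed to match the forecast response, and its values are invented.

```python
# Stand-in for: weather_c_json = response.json()
# Assumption: a trimmed, made-up sample shaped like a forecast response.
weather_c_json = {
    "cod": "200",
    "cnt": 2,
    "list": [
        {"dt_txt": "2023-04-11 12:00:00", "main": {"temp": 284.5}},
        {"dt_txt": "2023-04-11 15:00:00", "main": {"temp": 285.1}},
    ],
    "city": {"name": "Berlin", "country": "DE"},
}

# Step 3: look at the top-level keys, then at the keys one level down.
print(weather_c_json.keys())
print(weather_c_json["city"].keys())
print(weather_c_json["list"][0].keys())

# Step 4: select a variable by its location within the nested data.
city_name = weather_c_json["city"]["name"]
first_temp = weather_c_json["list"][0]["main"]["temp"]
print(city_name, first_temp)
```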

Using JSON to create a Pandas DataFrame

Option 1: Using pd.DataFrame()

The simplest way to transform parsed JSON data into a Pandas DataFrame is the pd.DataFrame() constructor, which builds the DataFrame directly from the parsed object. Here is example code that uses this method:

In this code, the pd.DataFrame() function is used to create a DataFrame from the parsed JSON object weather_c_json. The resulting DataFrame will contain two columns, list and city, and each row in the DataFrame represents a value in the JSON data.
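As a hedged illustration of the constructor, the sketch below passes only the list of forecast entries, because a list of dictionaries maps cleanly to one row per entry. The sample data is invented and is not the article's original snippet.

```python
import pandas as pd

# Assumption: a trimmed, made-up sample shaped like a forecast response.
weather_c_json = {
    "list": [
        {"dt_txt": "2023-04-11 12:00:00", "main": {"temp": 284.5}},
        {"dt_txt": "2023-04-11 15:00:00", "main": {"temp": 285.1}},
    ],
    "city": {"name": "Berlin", "country": "DE"},
}

# Each dictionary in the list becomes one row; its keys become columns.
forecast_df = pd.DataFrame(weather_c_json["list"])
print(forecast_df)
```

Note that the main column still holds nested dictionaries here, which is the limitation the next option addresses.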

Option 2: Using pd.json_normalize()

Another way to transform parsed JSON data into a Pandas DataFrame is the pd.json_normalize() function. This method can handle nested JSON structures and create a flattened DataFrame. Here is example code that uses this method:

In this code, the pd.json_normalize() function is used to create a flattened DataFrame from the parsed JSON object weather_c_json. The resulting DataFrame will contain three columns, list.main.temp, city.name, and _id, and each row in the DataFrame represents a value in the JSON data.
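One possible way to flatten such a response is sketched below, using the record_path and meta parameters of pd.json_normalize(). The sample data is invented, and the resulting column names differ from the ones described above because this is an alternative illustration, not the original snippet.

```python
import pandas as pd

# Assumption: a trimmed, made-up sample shaped like a forecast response.
weather_c_json = {
    "list": [
        {"dt_txt": "2023-04-11 12:00:00", "main": {"temp": 284.5}},
        {"dt_txt": "2023-04-11 15:00:00", "main": {"temp": 285.1}},
    ],
    "city": {"name": "Berlin", "country": "DE"},
}

# record_path: the list whose items become rows (nested keys get flattened).
# meta: values from outside that list to repeat on every row.
flat_df = pd.json_normalize(
    weather_c_json,
    record_path="list",
    meta=[["city", "name"]],
)
print(flat_df.columns.tolist())
print(flat_df)
```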

Option 3: Creating DataFrames using for loops

A more involved way to create a DataFrame from parsed JSON data is to use for loops to extract specific values and store them in a dictionary of lists. The dictionary is then converted into a Pandas DataFrame. Here is example code that uses this method:
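A minimal single-city sketch of this approach is shown below. The sample data and the chosen column names (datetime, temperature) are assumptions for illustration.

```python
import pandas as pd

# Assumption: a trimmed, made-up sample shaped like a forecast response.
weather_c_json = {
    "list": [
        {"dt_txt": "2023-04-11 12:00:00", "main": {"temp": 284.5}},
        {"dt_txt": "2023-04-11 15:00:00", "main": {"temp": 285.1}},
    ],
    "city": {"name": "Berlin", "country": "DE"},
}

# One list per future column; every loop iteration appends one row's values.
weather_dict = {"datetime": [], "temperature": []}
for forecast in weather_c_json["list"]:
    weather_dict["datetime"].append(forecast["dt_txt"])
    weather_dict["temperature"].append(forecast["main"]["temp"])

# Lists of equal length convert directly into DataFrame columns.
weather_df = pd.DataFrame(weather_dict)
print(weather_df)
```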

You can also loop over several cities at the same time:

The multi-city code first defines an API key and creates an empty dictionary called weather_dict. Then, a for loop iterates through each city in the list of cities. Within this loop, the API is called to retrieve weather information for the current city, which is stored in a parsed JSON object called weather_c_json. A second, nested for loop then iterates through each item in the list of weather forecasts for the current city, extracting specific values and appending them to the corresponding lists in weather_dict. Finally, the dictionary is converted to a Pandas DataFrame and returned as the result of the function.
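The description above can be sketched roughly as follows. The function names, the placeholder API key, the forecast endpoint and the selected fields are assumptions; the fetch parameter is added only so the loop can be tried with sample data instead of live requests.

```python
import pandas as pd
import requests

API_KEY = "YOUR_API_KEY"  # placeholder, not a real key
# Assumption: the 5-day forecast endpoint of OpenWeatherMap.
FORECAST_URL = "https://api.openweathermap.org/data/2.5/forecast"


def fetch_forecast_json(city):
    """Request the forecast for one city and return the parsed JSON."""
    response = requests.get(
        FORECAST_URL, params={"q": city, "appid": API_KEY}, timeout=10
    )
    response.raise_for_status()  # raise an error for 401, 403 and similar
    return response.json()


def build_weather_dataframe(cities, fetch=fetch_forecast_json):
    """Collect selected forecast values for several cities in one DataFrame."""
    weather_dict = {"city": [], "datetime": [], "temperature": []}
    for city in cities:                         # outer loop: one call per city
        weather_c_json = fetch(city)
        for forecast in weather_c_json["list"]:  # inner loop: one row each
            weather_dict["city"].append(weather_c_json["city"]["name"])
            weather_dict["datetime"].append(forecast["dt_txt"])
            weather_dict["temperature"].append(forecast["main"]["temp"])
    return pd.DataFrame(weather_dict)
```

With a valid key, build_weather_dataframe(["Berlin", "Paris"]) would return one row per forecast entry and city.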

In conclusion

Once you have the data, it’s important to store it in a database so that it’s easily accessible to the entire company. Although Python objects, such as dictionaries or Pandas DataFrames, are useful for working with data in memory, they are not well suited to storing large amounts of data. Instead, the use of relational databases is recommended.

Data collection is a crucial step in business data analysis. As we have seen in this article, APIs and the Python requests library let you collect data efficiently, and JSON can be an effective way to structure the data before storing it.
