Two different ways to download data from websites using Python

If you are looking for a way to download data from the web, this post will help you!

Aarongranich
MCD-UNISON
3 min read · Dec 14, 2022

In this post, I would like to share how to download data from the INEGI website and from the DATA México API using different Python libraries.

First, I will explain how to download data from the DATA México API using the Requests library.

The following steps will be covered:

· Step 1: Import the necessary libraries

· Step 2: Obtain the API / URL

· Step 3: Use Requests to download the data

· Step 4: Generate and save the DataFrame

Let’s go through the steps to download the data!

Step 1: Import the Necessary Libraries

# Standard library
import os
import json
import datetime
import zipfile
import urllib.request

# Third-party packages (pip install numpy pandas requests wget)
import numpy
import pandas as pd
import requests
import wget

Step 2: Obtain the API / URL

We will download the information from DATA México.

In this exercise we will use the data for the Net Trade Balance (2021) of the state of Sonora. On that chart, click the data frame icon at the top right of the screen.
Image 1. Sonora, México Net Trade Balance (2021).

Click "Copy" to obtain the link.
Image 2. Copy the API endpoint.
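
The copied endpoint is a regular URL whose query string carries the filters chosen on the page. As a minimal sketch, the link used in the next step can be assembled from a dictionary with the standard library. The helper name `build_url` is hypothetical, and the comments on what each parameter means are my reading of the link, not something taken from the API documentation:

```python
from urllib.parse import urlencode

BASE_URL = "https://datamexico.org/api/data"

# Query parameters exactly as they appear in the copied link.
params = {
    "Date Year": 2021,                    # year of the data
    "Product Level": 2,
    "State": 26,                          # assumed to be the code for Sonora
    "cube": "economy_foreign_trade_mun",
    "drilldowns": "Flow,Municipality",
    "measures": "Trade Value",
    "parents": "true",
    "locale": "en",
}


def build_url(base, query):
    """Join a base URL and a dict of query parameters (hypothetical helper)."""
    # safe="," keeps the comma in "Flow,Municipality" unescaped, as in the copied link.
    return base + "?" + urlencode(query, safe=",")


endpoint = build_url(BASE_URL, params)
print(endpoint)
```

Keeping the filters in a dictionary makes it easier to change one value, such as the year, without editing a long string by hand.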

Step 3: Use Requests to Download the Data

url = requests.get('https://datamexico.org/api/data?Date+Year=2021&Product+Level=2&State=26&cube=economy_foreign_trade_mun&drilldowns=Flow,Municipality&measures=Trade+Value&parents=true&locale=en')  # Send a GET request; the variable holds the response object
print(url)  # Shows the status of the response, e.g. <Response [200]>
Image 3. The HTTP 200 OK status code indicates that the request succeeded.
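
Printing the response only shows the status. In a script it can help to fail loudly when the status is not a success code. A minimal sketch, assuming a hypothetical helper named `fetch_json` and an arbitrary 30-second timeout (neither is part of the original exercise):

```python
import requests


def fetch_json(url, timeout=30):
    """Download `url` and return its JSON body as Python objects (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)  # give up if the server does not answer in time
    response.raise_for_status()                    # raise requests.HTTPError on 4xx/5xx responses
    return response.json()                         # decode the JSON payload
```

Called with the endpoint copied in Step 2, `fetch_json` should return the same dictionary that `url.json()` produces in Step 4.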

Step 4: Generate and Save the DataFrame

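data = url.json()  # Parse the response body as JSON (a name other than `json` avoids shadowing the json module)
data.keys()  # List the top-level keys of the response
df = pd.DataFrame(data['data'])  # Build a DataFrame from the records under the 'data' key
df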
Image 4. The DataFrame is ready.
df.to_csv("FileName.csv")
# With only a file name, the CSV file is saved in the current working directory
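
By default `to_csv` also writes the DataFrame index as an extra first column. The small sketch below uses made-up records (the column names and values are illustrative, not real DATA México output) to show the list-of-dictionaries shape that `pd.DataFrame` accepts and how `index=False` leaves the index out of the file:

```python
import os
import tempfile

import pandas as pd

# Made-up payload with the same general shape as the API response:
# a "data" key holding a list of records.
payload = {
    "data": [
        {"Flow": "Exports", "Municipality": "A", "Trade Value": 100},
        {"Flow": "Imports", "Municipality": "A", "Trade Value": 40},
    ]
}

df_demo = pd.DataFrame(payload["data"])  # one row per record, one column per key

csv_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
df_demo.to_csv(csv_path, index=False)    # index=False: do not write the row index

df_back = pd.read_csv(csv_path)          # read the file back to check its columns
print(df_back.columns.tolist())
```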

This is the end of the first way to download data, using Requests.

Now, I will explain how to download data from a link to a CSV file on the INEGI website.

The following steps will be covered:

· Step 1: Import the necessary libraries (same as in the first exercise)

· Step 2: Obtain the link from the INEGI website

· Step 3: Use wget to download the file

Step 2: Obtain the Link from the INEGI Website

We will copy the link from the CSV icon.
Image 5. INEGI Open Data website.

Step 3: Use wget to Download the File

url = "https://www.inegi.org.mx/contenidos/programas/ccpv/2020/datosabiertos/iter/"
nombreArchivo = "iter_26_cpv2020_csv.zip"
subdir = "./datosCENSO2020Sonora"

if not os.path.exists(subdir):  # Only do the work if the data folder does not exist yet
    if not os.path.exists(nombreArchivo):  # Skip the download if the zip file is already here
        wget.download(url + nombreArchivo)  # Download the file from the complete URL
    with zipfile.ZipFile(nombreArchivo, 'r') as zip_ref:  # Open the zip archive
        zip_ref.extractall(subdir)  # Extract everything into the folder
    os.remove(nombreArchivo)  # Delete the zip file and keep only the extracted data

The zip file is downloaded to the working directory and its contents are extracted into the datosCENSO2020Sonora folder.

folder = "./datosCENSO2020Sonora/iter_26_cpv2020/conjunto_de_datos/conjunto_de_datos_iter_26CSV20.csv"
df_censo2020 = pd.read_csv(folder)  # Read the CSV file
df_censo2020  # Display the DataFrame
Image 6. DataFrame.
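
The `wget` package is a third-party dependency. If it is not available, a similar download-and-unzip step can be sketched with the standard library only. This is an alternative to the code above, not the method used in this post, and the function name `download_and_extract` is hypothetical:

```python
import os
import urllib.request
import zipfile


def download_and_extract(url, dest_dir):
    """Download a zip file from `url` and extract it into `dest_dir` (hypothetical helper).

    Returns False without downloading if `dest_dir` already exists, True otherwise.
    """
    if os.path.exists(dest_dir):
        return False                               # already extracted on an earlier run
    zip_name = os.path.basename(url)               # file name taken from the end of the URL
    urllib.request.urlretrieve(url, zip_name)      # save the zip in the working directory
    with zipfile.ZipFile(zip_name, "r") as zip_ref:
        zip_ref.extractall(dest_dir)               # extract every member under dest_dir
    os.remove(zip_name)                            # keep only the extracted files
    return True
```

For this exercise the call would be `download_and_extract("https://www.inegi.org.mx/contenidos/programas/ccpv/2020/datosabiertos/iter/iter_26_cpv2020_csv.zip", "./datosCENSO2020Sonora")`.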

This is the end of the second way to download data.

Thank you very much for reading this post. I hope it has helped you!
