Two different ways to download data from websites using Python
If you are looking for a way to download data from the web, this post will help you!
In this post, I would like to share how to download data from the DATA México API and from the INEGI website using different Python libraries.
First, I will explain how to download data from the DATA México API using the Requests library.
The following steps will be covered:
· Step 1: Import the necessary libraries
· Step 2: Obtain the API / URL
· Step 3: Use Requests to download the data
· Step 4: Generate and save the DataFrame
Let's go through the steps to download the data!
Step 1: Import the necessary libraries
import os  # Check whether files and folders exist, delete files
import zipfile  # Unzip the downloaded file
import requests  # Send HTTP requests to the API
import pandas as pd  # Build and read DataFrames
import wget  # Download a file from a URL
Step 2: Obtain the API / URL
We will download the information from DATA México.
In this exercise we will use the net trade balance data (2021) for the state of Sonora.
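The query string of the DATA México URL is a set of key/value parameters, so it can also be assembled from a dictionary instead of being typed by hand. Here is a minimal sketch using only the standard library. The names BASE_URL, params and build_url are my own, and the parameter values are copied from the URL used in Step 3:

```python
from urllib.parse import urlencode

# Endpoint taken from the URL used in this post
BASE_URL = "https://datamexico.org/api/data"

# Parameters copied from the URL used in this post
# (State=26 is the code this post uses for Sonora)
params = {
    "Date Year": 2021,
    "Product Level": 2,
    "State": 26,
    "cube": "economy_foreign_trade_mun",
    "drilldowns": "Flow,Municipality",
    "measures": "Trade Value",
    "parents": "true",
    "locale": "en",
}


def build_url(base_url, query):
    """Join the base URL and the encoded query string."""
    # safe="," keeps the comma in "Flow,Municipality" unescaped,
    # so the result matches the URL written by hand
    return base_url + "?" + urlencode(query, safe=",")


full_url = build_url(BASE_URL, params)
print(full_url)
```

Changing the year or the state then only means editing one value in the dictionary.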
Step 3: Use Requests to download the data
response = requests.get('https://datamexico.org/api/data?Date+Year=2021&Product+Level=2&State=26&cube=economy_foreign_trade_mun&drilldowns=Flow,Municipality&measures=Trade+Value&parents=true&locale=en')  # Send the GET request and store the response
print(response)  # <Response [200]> means the request succeeded
Step 4: Generate and save the DataFrame
payload = response.json()  # Parse the response body as JSON
payload.keys()  # Look at the top-level keys of the response
df = pd.DataFrame(payload['data'])  # Build a DataFrame from the records under 'data'
df
df.to_csv("FileName.csv")
# If you pass only a file name, the CSV file is saved in the working directory
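Before building the DataFrame, it can help to confirm that the response really contains a 'data' key holding a list of records. The sketch below is one possible check, not part of the original workflow. The function name extract_records is my own, and the sample payload is a made-up stand-in for a real API response:

```python
import json


def extract_records(payload):
    """Return the list of records stored under 'data', or raise a clear error."""
    if "data" not in payload:
        raise KeyError("No 'data' key in the response; keys found: %s" % list(payload))
    records = payload["data"]
    if not isinstance(records, list):
        raise TypeError("'data' should be a list of records")
    return records


# Hypothetical sample text, NOT real DATA México output
sample_text = '{"data": [{"Flow": "Exports", "Trade Value": 100}, {"Flow": "Imports", "Trade Value": 80}]}'
sample_payload = json.loads(sample_text)

records = extract_records(sample_payload)
print(len(records), "records with columns", sorted(records[0]))
```

With a real response, the same function could be called on the parsed JSON before passing the records to pd.DataFrame.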
This is the end of the first way to download data, using Requests.
Now, I will explain how to download data from a link to a CSV file on the INEGI website.
The following steps will be covered:
· Step 1: Import the necessary libraries (same as the first exercise)
· Step 2: Obtain the link from the INEGI website
· Step 3: Use wget to download the file
Step 2: Obtain the link from the INEGI website
Step 3: Use wget to download the file
url = "https://www.inegi.org.mx/contenidos/programas/ccpv/2020/datosabiertos/iter/"
nombreArchivo = "iter_26_cpv2020_csv.zip"
subdir = "./datosCENSO2020Sonora"
# Download only if neither the zip file nor the extracted folder already exists
if not os.path.exists(nombreArchivo) and not os.path.exists(subdir):
    url = url + nombreArchivo  # Build the complete URL
    wget.download(url)  # Download the zip file into the working directory
    with zipfile.ZipFile(nombreArchivo, 'r') as zip_ref:  # Open the zip file
        zip_ref.extractall(subdir)  # Extract everything into the subfolder
    os.remove(nombreArchivo)  # Delete the zip file and keep only the extracted files
The zip file is downloaded to the working directory and its contents are extracted into the subfolder.
csv_path = "./datosCENSO2020Sonora/iter_26_cpv2020/conjunto_de_datos/conjunto_de_datos_iter_26CSV20.csv"
df_censo2020 = pd.read_csv(csv_path)  # Read the CSV file
df_censo2020  # Look at the data
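The path to the CSV file inside the extracted folder is long and easy to mistype. As an alternative, the extracted folder can be searched for CSV files. This is a minimal sketch using only the standard library. The function name extract_and_find_csvs is my own, and the demo builds a tiny throwaway zip file rather than using the INEGI download:

```python
import os
import tempfile
import zipfile
from pathlib import Path


def extract_and_find_csvs(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return every .csv file found, sorted."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(dest_dir)
    # rglob searches dest_dir and all of its subfolders
    return sorted(Path(dest_dir).rglob("*.csv"))


# Demo with a throwaway zip (hypothetical content, not the INEGI file)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("demo/conjunto_de_datos/demo.csv", "a,b\n1,2\n")
    found = extract_and_find_csvs(demo_zip, os.path.join(tmp, "out"))
    found_names = [p.name for p in found]

print(found_names)
```

With the census download, the same function could be pointed at the zip file and the subfolder, and one of the returned paths passed to pd.read_csv.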
This is the end of the second way to download data.
Thank you very much for reading this post. I hope it has helped you!