Stata-Python API for UN Comtrade bulk data

Steve Ka Lok Wong
The Stata Gallery
Published in
7 min readAug 5, 2022
Merchandise trade is the engine of modern growth, yet it is not an easy task to pack the information into organized numbers
Photo by Erik Odiin on Unsplash

UN Comtrade Database is one of the best sources when it comes to bilateral trade data by product code. As of early 2022, it covers more country-year observations than WTO and ITC. This blogpost aims to guide you through step-by-step to fetch large amount of data from Comtrade via Stata-Python function.

*I used to be a Python dummy (to a certain extent, still am), it was my buddy, Satyam Anand, at Graduate Institute (now at George Town) introduced the API magic to me. All credit goes to him.

Too long didn’t read: jump to the final do.file here

Step 0 — Preamble: Is your Stata ready for this?

  1. Check if Python is already integrated with your Stata, you can simply type “python search” in the Stata command box. If there is no Python installation found, follow this
. python search
----------------------------------------------------------------------------------------------------------------------
Python environments found:
/usr/local/bin/python3
----------------------------------------------------------------------------------------------------------------------

2. Check if you have the necessary python packages for the task. Follow this guide to install the missing package(s)

. python which json
<module 'json' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py
> '>
. python which numpy
<module 'numpy' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/n
> umpy/__init__.py'>
. python which pandas
<module 'pandas' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/
> pandas/__init__.py'>
. python which requests
<module 'requests' from '/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-package
> s/requests/__init__.py'>

Step 1 — Identify your data of interest

As a working example, let’s say we want to get:

  • Data: country i export to China, for all countries in the world
  • Frequency: Annual
  • Time period: 2000 to 2021
  • Classification: All 2-digit SITC Rev. 2

Once, you put the according info on Comtrade — Get data you should see the below page with “View API call” in the bottom

! The requested data is therefore accessible via: http://comtrade.un.org/api/get?max=1000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2

Note that, I changed max=502 to max=1000, I will explain below what does that mean and why.

Step 2 — Understand the API call

/api/get?max=502&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2 (What are we looking at ??)

See more in SITC Revision 2, UN numerical country code and API documentation

Some of you might already notice the pattern:
/api/get?[API code 1]&[API code 2]&….
while each block of [API code] is in a format of
[x] = [choice]

We will make use of this pattern for the Python code below.

! Now, recall our objective is to fetch all bilateral export to China from 2000 to 2021, that means the above API setting is applicable to all years once we introduce a simple loop on ps = [2000, 2001, 2002, …, 2021]

Step 3 — Check the API link

We have this link generated by the click and select interface from UN Comtrade
http://comtrade.un.org/api/get?max=1000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2

We better make sure the API link is valid before put it in work

  1. Install JSON formatter on Chrome extension
  2. Copy and paste the API link
  3. You should see the below:

! Note that all our data of interest begins from

“dataset” : (red box)

Step 4 — Test version

Jump to code

  1. Open do-file, set up the destination folder
local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours
*country i export to china
cd `filepath'/output/i_X_CHN

2. Initiate python in Stata do-file

python:[python code]end

3. [python code part] Add necessary packages

import json
import numpy as np
import pandas as pd
import requests

4. Copy and paste the valid API link we have AND fetch the JSON file

url       = f'http://comtrade.un.org/api/get?max=10000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2'result    = requests.get(url).json()

5. Recall that we found our data of interest from “dataset” : in Step 3
*Indent is essential

if 'dataset' in result: 
df = pd.DataFrame(result['dataset'])
df = df.replace({None: np.nan})
df.columns= [i[:32] for i in df.columns]

6. Store the data in Stata format (.dta)

df.to_stata(f'i_X_China_2021.dta')end 
*exiting python environment

7. Check your work

use ./i_X_China_2021, clear
^All countries export to China in year 2021

Test Version:

local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours
*country i export to china
cd `filepath'/output/i_X_CHN
python:
import json
import numpy as np
import pandas as pd
import requests
url = f'http://comtrade.un.org/api/get?max=10000&type=C&freq=A&px=S2&ps=2021&r=all&p=156&rg=2&cc=AG2'result = requests.get(url).json()
if 'dataset' in result:
df = pd.DataFrame(result['dataset'])
df = df.replace({None: np.nan})
df.columns= [i[:32] for i in df.columns]
df.to_stata(f'i_X_China_2021.dta')enduse ./i_X_China_2021, clear

Step 5 — Loop it through years

Now that we have a single-year do-file up and running, we can easily extend it with a loop from 2000 to 2021 and storing each year as one .dta file.

  1. Compartmentalize the API link
def Comtrade_Scraper   (ps: int,
type: str= 'C',
freq: str= 'A',
px : str= 'S2',
r : str= 'all',
p : int= 156,
rg : int= 2,
cc : str= 'AG2'):

Note that ps is the only compartment without an equal sign as it is our only varying part and it is set to be loops by the later command

2. Define the ps (refers to year) being the varying input
*again, indent is essential

    """
Wrapper for creating URLs to access the Comtrade API
ARGUMENTS
*********
Required
ps = year
"""

3. Assemble the API link

    base      = 'https://comtrade.un.org/api/get?max=10000'
url = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'

4. JSON to .dta file
a little twist here is that now we name the file as i_X_China_{ps}.dta where {ps} refers to the year [2000, 2021]

    result    = requests.get(url).json()
if 'dataset' in result:
df = pd.DataFrame(result['dataset'])
df = df.replace({None: np.nan})
df.columns= [i[:32] for i in df.columns]
df.to_stata(f'i_X_China_{ps}.dta')
return df

Note that I also added return df, it is just a small trick to return the dataframe right before converting it into .dta file. The practical use is that you can keep an eye on which year has ran and quickly vet if there is any issue from the no. of obs. (red box) and/or no. of columns (blue box).

5. Introduce the loop
Python iterates with increments by 1 and stops before the ending number. In other words, we have to put (2000, 2022) for our period of interest year 2000 to 2021.

for i in range(2000,2022): Comtrade_Scraper(i)

6. Assemble all

Final do-file

local filepath = "/Users/.../blogpost1_api_comtrade" // adjust to yours
*country i export to china
cd `filepath'/output/i_X_CHN
python:
import json
import numpy as np
import pandas as pd
import requests
def Comtrade_Scraper (ps: int,
type: str= 'C',
freq: str= 'A',
px : str= 'S2',
r : str= 'all',
p : int= 156,
rg : int= 2,
cc : str= 'AG2'):
"""
Wrapper for creating URLs to access the Comtrade API
ARGUMENTS
*********
Required
ps = year
"""
base = 'https://comtrade.un.org/api/get?max=10000'
url = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'
result = requests.get(url).json()
if 'dataset' in result:
df = pd.DataFrame(result['dataset'])
df = df.replace({None: np.nan})
df.columns= [i[:32] for i in df.columns]
df.to_stata(f'i_X_China_{ps}.dta')
return df
for i in range(2000,2022): Comtrade_Scraper(i)
end
*Explains below in 8) Final touch
di "`c(pwd)'" // Display path to current folder
local files : dir "`c(pwd)'" files "*.dta"
foreach x of local files {
di "`x'" // Display file name
append using `x'
}
save i_X_China_2000_2021.dta, replace

7. Result

8. Final touch
Appending all the by-year files into long format

di "`c(pwd)'" // Display path to current folder
local files : dir "`c(pwd)'" files "*.dta"
foreach x of local files {
di "`x'" // Display file name
append using `x'
}
save i_X_China_2000_2021.dta, replace

Variant version:

Some final notes:

An obvious shortcoming is that the download data limit restricting researchers access to the most granular data possible (6-digit product code). Some organizations have premium site license subscription, e.g. Geneva Graduate Institute. By connecting via their IP addresses, I could download up to 100,000 obs. per request, which is sufficient to cover all country i-to-country j by 4-digit product code (75,803 country-pair-product obs. for year 2019).

To go further from there, you will need to find an authentication token/ valid account at Comtrade from powerful friend(s), colleague(s) or supervisor(s).

Long page done and you have made it! Hope you all enjoy this guide! Any questions / suggestion, you can message me or find me via my website.

More resources:

Stata-Python integration blogpost by Chuck Huber

About the author

I am a PhD student in economics at Geneva Graduate Institute (IHEID). Stata has been my go-to software due to (too) many path dependencies. There are a lot of user-written guides that have made my research journey much easier. I hope to be one of the helpful guy (guide) you found online. You can also find me on Twitter, LinkedIn and my Website.

--

--