Chronicles of the Shell Wizard: Making sense of API responses — Traded securities example.

Franklin Schram
10 min read · Mar 27, 2023


“Welcome to Chronicle of the Shell Wizard! As fellow students on the path to mastering the terminal, we’ll share tips and tricks to help you process and analyse your investment data with ease.”

The Shell Wizard walking the white ASCII path amidst the dark forest of the terminal.

Well SHELL-O there! (That’s how Shell Dwellers say hello to each other, well… not really, I just made that up.) So in my last post I spoke about APIs, aka “Application Programming Interfaces”, and we played with the NASDAQ Link API to download Yield and Total Return indexes for the BofA Merrill Lynch corporate bond indexes. But what exactly are we talking about here?

1. APIs made easy for Deliveroo users (and who isn’t a Deliveroo user?)

Imagine you want to order food from a restaurant, but you don’t want to physically go to the restaurant. Instead, you use the restaurant’s phone number to call them and place your order over the phone (back in the old days at least).

In this scenario, the restaurant’s phone number is like an API. Just like how the phone number allows you to communicate with the restaurant without physically going there, an API allows different software applications to communicate and exchange information with each other over the internet.

With an API, different software applications can “talk” to each other and share data in a structured and standardized way. So if you want to interact programmatically with financial data providers (ECB, FED, IMF…) or data vendors, you will need to access their API, which is why it is quite important to know how to use one. Some data vendors also abide by the OpenAPI standard, which means that whichever API you query, communication protocols and response formats will be similar (learn it once, works for all).
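In practice, “accessing an API” mostly boils down to requesting a URL over HTTP and reading back the structured text you get in return. A minimal sketch (the endpoint and parameters here are entirely made up for illustration):

# Request a (hypothetical) data-vendor endpoint and print the response to the terminal
wget -qO- "https://api.example-vendor.com/v1/bond-indexes?series=corporate&apiKey=MY_KEY"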

2. The proof is in the pudding (that you just ordered)

So whenever you start playing with APIs, you can generally obtain three different data formats in response to a query (a toy comparison follows this list):

  • CSV — Comma-separated values; the one data format everyone knows and loves, since it can be opened with our best mate Microsoft Excel.
  • XML — Extensible Markup Language; well, this one is… yucky yucky and difficult to parse (at least for me). I’ll show you an example.
  • JSON — JavaScript Object Notation; despite the name, it’s got nothing to do with JavaScript and is de facto the standard response format for most APIs out there. JSON operates on the basis of “key : value” pairs (in Python we call those dictionaries).
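To make the difference concrete, here is the same made-up record expressed as CSV and as JSON (the values are invented purely for illustration):

# CSV: one header line, then one line per record
ticker,name,price
AAPL,Apple Inc.,189.70

# JSON: a dictionary of "key": value pairs
{"ticker": "AAPL", "name": "Apple Inc.", "price": 189.70}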

Whenever you query an API for data, you receive: “A RESPONSE OBJECT”.

Now, if you ended up reading my Hissing the Python way: Getting Started (and got yourself set up on your own machine or on Google Colab), you will pick up JSON quickly, as it is very similar to some Python data structures. JSON resembles the “native” Python data structure called a “dictionary”. Other programming languages call that an “associative array”.

Let’s look at a sample JSON API response that we can obtain from the Polygon API for FX data.

JSON Response
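Schematically, such a response looks roughly like the following (field names modelled on Polygon’s aggregates-style FX responses; the values are invented for illustration):

{
  "ticker": "C:EURUSD",
  "queryCount": 2,
  "resultsCount": 2,
  "adjusted": true,
  "results": [
    {"o": 1.0811, "h": 1.0845, "l": 1.0798, "c": 1.0832, "t": 1679875200000},
    {"o": 1.0832, "h": 1.0860, "l": 1.0815, "c": 1.0841, "t": 1679961600000}
  ],
  "status": "OK",
  "request_id": "..."
}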

The process of mapping a JSON object into a data structure of a programming language is called “deserialisation” (the reverse, turning a data structure back into JSON text, is “serialisation”). For example, in the response object we can see that the value associated with the key “adjusted” is “true”. Let’s try to pass that to Python.

When true isn’t True…

Python doesn’t recognize “true” as a real data structure yet it does recognize “True” as one. When “converting” JSON to Python data structures one task will therefore be to convert ‘true’ to ‘True’: That’s the kind of transformations serialisation is all about.

The JSON key-value pair format is fairly similar to traditional Python data structures.

  • The symbols “{ }” denote the start and end of a “dictionary”. What is a dictionary? An unordered and mutable collection (the content inside the data structure can be changed) of key-value pairs enclosed in curly braces.
  • In Python, the symbols “[ ]” denote the start and end of a list; JSON calls that an array. What is a list? An ordered and mutable sequence of elements enclosed in square brackets.

Typically a JSON response involves a blend of dictionaries and lists/arrays, or “nested dictionaries” with additional lists nested inside. Let’s see an example directly in the IPython console, using the above JSON response object from Polygon’s FX API:

Why do we need to do all that stuff? Well, at some point we might want to do some calculations on the data, right? But, you know what? Sometimes all that JSON stuff feels too technical, doesn’t it? Let’s just drop all that JARGON and download a good old CSV file; it’s just right there…

Come on, HIT ME! Get me CSV data! (Sorry, I blanked my API key, but it’s a free one, you could get one too if you wanted some free stuff.)

HOW GLAD AM I WE’RE GONNA GET ALL THAT DATA IN A NICE TABULAR FORMAT!!!

Wait? WHAT?! Nooooooooooo !

What??!! Now, I wonder why… JSON data is accessible for FREE and not CSV data??? Hmmm… Could it be because… most people don’t understand how to process it? Well, you are now equipped with knowledge that will set you FREE (quite literally). Me? I’m a BASH guy, so my favorite way to process JSON is in the terminal with just a few simple tools.

3. Building a simple API query

So, staying with the Polygon.io API, we’ll take it step by step. Let’s say I want to download a list of security identifiers but don’t know where to start.

Ticker endpoint of the Polygon.io API

What does /v3/reference/tickers mean? Well, that’s the path you append to the base URL to reach this API “endpoint” and get the ticker data; in this case: “https://api.polygon.io/v3/reference/tickers”.

Querying the API is then just a matter of constructing a query string in the URL. So let’s see the kind of things you can ask for.

Some of the parameters available for the Tickers endpoint of the Polygon.io API.

So let’s say I’m looking for an ETF listed in the US and that is currently traded (i.e. not retired). The URL would evolve as shown below and the data stream I would receive would include ALL ETFs meeting those criteria.

Let’s make a simple script that returns “all active securities available”. In this one we don’t specify anything beyond “active=true” and the key, and we store the result in a file called “tickers.json”. (Either copy-paste it straight into the terminal or create a new file and save it as a shell script — don’t forget to have your Polygon API key in “polygon.key”.)

#!/bin/bash

#1. Passing my API key to a variable
API_KEY=$(< polygon.key)

#2. Passing the URL to a variable
POLYGON_URL="https://api.polygon.io/v3/reference/tickers?active=true&apiKey="

#3. Specifying an output file
OTP_FILE="tickers.json"

#4. Sending the query with wget
wget --content-disposition -O "${OTP_FILE}" "${POLYGON_URL}${API_KEY}"

Let’s observe the output of this command in the terminal.
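If you want to look at it yourself, just dump the file straight to the terminal:

# The response arrives as one long, unformatted line of JSON
cat tickers.json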

Oh my! That doesn’t look easy to read or to process at all! How are we even going to start? Something worries me as well: what’s this?!

The API response is limited to 100–1000 records, we have to follow this link if we want the rest of the data!

4. Parsing JSON output with JQ

As a design principle I always try to stick with BASH: it is an underappreciated scripting language, yet it is really capable and fits natively inside the Linux ecosystem. One thing you can’t do (easily) with BASH is parse JSON output. This is where we bring in JQ.

So… this is what we got from the API:

YUMMY!

JQ is a JSON parser, a program used to analyse the structure of a piece of text (in this case a JSON response). A parser typically takes input in the form of a sequence of characters and turns it into a tree-like structure, known as a parse tree. So… remember we called our JSON response file “tickers.json”; let’s try the below:

# Fetching JQ from the Debian repo

sudo apt-get -y install jq

# Switching JQ to raw mode with -r
# We use the '.' syntax to get the whole "prettified response"
# We pipe it into less to stop the terminal from dumping it all in one go

jq -r '.' tickers.json | less

Hmmm! That looks like the JSON output we know! But how do we know what’s inside that long JSON response? What fields are there? What does the tree structure look like?

# Obtain the primary keys from a JSON tree (Level I)
jq '. | keys[]' tickers.json | sort | uniq -c

Wait where is the data?! Do you think it could be in… RESULTS?

# Checking the key-value pairs inside the results dictionary (Level II)
jq '.results[] | keys[]' tickers.json | sort | uniq -c

So here we have the main keys we asked for. Note that some response objects don’t have values associated with every key. In that API call we got 100 response objects, of which only 40 had a primary exchange associated with them, for instance.
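Once you know the key names, pulling actual values out is the same game one level deeper, for instance (key names taken from the listing above):

# Print the first few tickers from the results array
jq -r '.results[].ticker' tickers.json | head

# Count how many results we got of each security type
jq -r '.results[].type' tickers.json | sort | uniq -c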

5. Fooling around: Getting all active securities from polygon.io

Let’s switch up a gear.

  • I indicated that the polygon.io API enables us to get 1,000 response objects in one go, so let’s encode that in the query URL.
  • I indicated that after those initial 1,000 response objects the API response includes a key-value pair with the “next_url” link, so we’ll have to extract that with something akin to:
jq -r '.next_url' tickers.json
  • Finally, we loop until the response from the API no longer includes a “next_url” key.

This should give us the below sample code:

#!/bin/bash

## 1. Assigning variables for building the URL

# Passing API key to a variable
API_KEY=$(< polygon.key)
# Passing URL to a variable
POLYGON_URL="https://api.polygon.io/v3/reference/tickers?"
# Getting only active tickers
ACTIVE='true'
# 1000 response objects per call (max value)
LIMIT=1000
# Dump the API response in the tickers.json file
OPT_FILE='tickers.json'

## 2. Getting wget to download the file

# Wget gets the first page and dumps it into tickers.json
wget --content-disposition -O "$OPT_FILE" "${POLYGON_URL}active=${ACTIVE}&limit=${LIMIT}&apiKey=${API_KEY}"

# We load the file and parse out the next_url with JQ
URL=$(jq -r '.next_url' "$OPT_FILE")

## 3. We loop until the API doesn't send a next_url in its response object
# a. If the URL variable containing the link to the next response object is not
#    null, we keep running wget to query the API.
# b. We store the wget output in memory in the RESPONSE variable.
# c. We parse the RESPONSE variable for the next_url.
# d. We append the response object to the tickers.json file.
# e. The loop stops when next_url takes on the value null (returned by jq).

while [[ $URL != "null" ]];

do
RESPONSE=$(wget --content-disposition -O - "${URL}&active=${ACTIVE}&limit=${LIMIT}&apiKey=${API_KEY}")
URL=$(echo "$RESPONSE" | jq -r '.next_url')
echo "$RESPONSE" >> "$OPT_FILE"
sleep 12

done

echo "Batch download finished"

Wondering why I had to include “sleep 12”? Well… I can only make 5 calls a minute with my free key.

5 API calls / Minute on a free plan — 1 call every 12 seconds, fair enough

6. Transforming the JSON output to CSV.

Although JSON is great for data transfer, you will agree that for interactive analytical work it is not great. So how can we transform that massive JSON file we got into a CSV one?

Let’s say that out of all the keys we have, I just want:

  • ticker
  • name
  • market
  • active
  • locale
  • type
  • composite_figi (more on that in another post)
  • last_updated_utc

To extract just those fields, I run the below:

jq -r '.results[] | [.ticker, .name, .market, .active, .locale, .type, .composite_figi, .last_updated_utc]' tickers.json

Interesting, but not great: I want it in CSV, so I write:

jq -r '.results[] | [.ticker, .name, .market, .active, .locale, .type, .composite_figi, .last_updated_utc] | @csv' tickers.json > json.csv

Note this doesn’t include the headers which you can add separately via different methods. This one has worked for me using jq, other unix program let you do this in an easier way.

jq -r '(["TICKER","NAME","MARKET","ACTIVE","LOCALE","TYPE","COMPOSITE_FIGI", "LAST_UPDATED"] | @csv),(.results[] | [.ticker, .name, .market, .active, .locale, .type, .composite_figi, .last_updated_utc] | @csv)' tickers.json.bak > test.csv
sudo apt-get -y install visidata
visidata test.csv

And here is the prized output in CSV, in all its 40,000 glorious rows.

All this could of course be fully automated in a bash script (a rough sketch follows the list below), letting you go directly from the JSON output to a nicely formatted CSV file, which could then be uploaded into an SQL database and complemented by scripts to:

  • maintain the data using the LAST_UPDATED values
  • automatically fetch time series and/or fundamental data for equity tickers, for instance.
  • make API calls to OpenFigi to find derivative instruments on the security.
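For what it’s worth, here is a rough sketch of how the download and the CSV conversion could be stitched together in one script (same assumptions as before: a Polygon key stored in “polygon.key”, the free-tier rate limit, and the field names we listed above):

#!/bin/bash
# Sketch: page through all active tickers, then flatten the result to CSV.

API_KEY=$(< polygon.key)
URL="https://api.polygon.io/v3/reference/tickers?active=true&limit=1000&apiKey=${API_KEY}"
JSON_FILE="tickers.json"
CSV_FILE="tickers.csv"

> "$JSON_FILE"   # start from an empty file

# 1. Keep requesting pages until the API stops sending a next_url
while [[ -n "$URL" ]]; do
    RESPONSE=$(wget -qO- "$URL")
    echo "$RESPONSE" >> "$JSON_FILE"
    NEXT=$(echo "$RESPONSE" | jq -r '.next_url // empty')
    if [[ -n "$NEXT" ]]; then URL="${NEXT}&apiKey=${API_KEY}"; else URL=""; fi
    sleep 12   # free plan: 5 calls a minute
done

# 2. Write a header line, then turn every results[] entry into a CSV row
echo '"TICKER","NAME","MARKET","ACTIVE","LOCALE","TYPE","COMPOSITE_FIGI","LAST_UPDATED"' > "$CSV_FILE"
jq -r '.results[] | [.ticker, .name, .market, .active, .locale, .type, .composite_figi, .last_updated_utc] | @csv' "$JSON_FILE" >> "$CSV_FILE"

echo "Wrote $(wc -l < "$CSV_FILE") lines to $CSV_FILE"

Storing each page as a separate JSON object in the same file keeps things simple, since jq happily processes a stream of top-level objects one after the other.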

I enjoyed this little project, hope you will find it useful!
