The Penniless Investor: Lesson 2 - Getting free data, a BofA Merrill Lynch Corporate Bond Indexes example.

Franklin Schram
9 min read · Mar 24, 2023

“Welcome to ‘The Penniless Investor’ — a series aimed at helping you make smarter investment choices when money is tight. Through tech-savvy tricks, free data sources, and sample investment strategies, we’ll equip you with actionable tips to make the most of your limited funds and get you started on your investing journey.” — This article is part of a series -> Start Here! <-

Is it a goblin or an internet troll? In any case, never take any advice from it.

Have you ever heard of the NASDAQ Data Link? No? In their own words, "Nasdaq Data Link is a powerful, centralized, cloud-based technology platform providing access to more than 250 trusted data sets, available via API." They also add to their mission statement: "Search, discover and build." How about we take their word for it? First you need an account; can I trust you to get one? When you do, they'll give you an API KEY (keep it in a text document), then you'll end up here.

Splash screen of the NASDAQ Data Link — Hum… most things you have to pay for, so I only click on "Free"
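A quick housekeeping note before we go any further: the snippets later in this lesson read the API key from a small text file called nasdaqlink.key sitting in the working directory. That file name is purely my own convention (NASDAQ doesn't impose it), so name yours whatever you like; just keep it consistent.

# Store the API key NASDAQ gave you in a small text file (the name is my convention, not theirs)
echo 'YOUR_API_KEY_HERE' > nasdaqlink.key

# Make it readable by your user only - no reason to share your key with the whole container
chmod 600 nasdaqlink.key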

I have a confession to make — I… don't enjoy coding in Python very much. It's because of the dots, FAAAAR too many dots; like do_this.with_this_tool(on_this_thing) — Yes, I know, that's called object oriented programming: you define an object and you chisel it to your liking with "methods". Me? I am a brutish (and British) guy: I like to BASH things around. So how about we do that instead of hissing like a Python? Amidst the free datasets, one caught my eye: the BofA Merrill Lynch corporate range of indexes — you get 27 of them for free (okay… it's yield and total return indexes, not spread… but that's still a start). So first things first: when we deal with data WHAT DO WE DO? Huh? We check the meta-data. What is meta-data, you ask? Hum… Data… that describes the data you're interested in. Good meta-data yields robust processes since:

  • Good meta-data gives you a precise description of the attributes of the data you're researching, i.e. you know what to expect (int, float, string, date, etc.)
  • Good meta-data gives you the tools to implement strong data validation pipelines prior to ingestion into databases, i.e. it reduces the risk of having errors in your models (a minimal example follows below).
  • Good meta-data enables you to save time and effort by fetching series at their right release time (hum… provided the organisation is credible and does publish its data on time).

Good meta-data SHOULD BE an important criterion in the selection of your data provider, yet… here we're penniless guys n' gals so we just deal with what is FREE.
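To make the validation point from the list above a bit more concrete, here is a minimal sketch: it assumes a CSV whose fifth column should hold ISO dates (YYYY-MM-DD), with the file name and column number as placeholders, and it prints any row that breaks the pattern before anything gets near a database.

#!/bin/bash
# Minimal validation sketch: flag rows whose 5th column is not an ISO date (YYYY-MM-DD)
# File name and column number are placeholders - adapt them to the meta-data you actually get
awk -F',' 'NR > 1 && $5 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    print "Suspicious date on line " NR ": " $5
}' some_metadata.csv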

1. Getting meta-data from NASDAQ Data Link

So let's take a look at that NASDAQ Data Link meta-data:

#!/bin/bash
# Passing the URL to a variable
ML_METADATA_URL="https://data.nasdaq.com/api/v3/databases/ML/metadata"

# Getting wget to download the file
wget --content-disposition -O ML_META "$ML_METADATA_URL"

# Detecting the type of stuff we received...
file ML_META

How about we pop that in a script? Yes? If you followed my Lesson 1, you should be the proud owner of a Linux container, so boldly open up your terminal, launch your Linux container and run "code".

The TERMINAL where everything… BEGINS
Click the little + and open up your container — I’m using Debian in this case.
Once you type in "code", if installed, Microsoft Visual Studio Code should open. Click on the files icon in the upper left-hand corner and select "Open Folder". This will open a folder from your container's file system.

It should open Microsoft Visual Studio Code and ask you whether you want to open the remote folder and trust the remote machine (in this case, your local Linux container). Now we all like to have a home to rest and unwind, and the same applies in Linux distributions: you have a home folder; select it!

Click “okay” and open that home folder

Create a file with the little "create file" icon; we'll call that file bofa_ml_metadata.sh. Dump the code above in there, save the file and head back to your terminal.
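To actually run it, either make the script executable first or simply hand it to bash (both work; the file name is just the one we picked above):

# One-off: make the script executable, then run it from your home folder
chmod +x bofa_ml_metadata.sh
./bofa_ml_metadata.sh

# Or skip chmod entirely and let bash do the work
bash bofa_ml_metadata.sh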

So what do we observe from the last line printed to the terminal? The NASDAQ Data Link returns a zip file (the one called ML_META), which we promptly unzip with an easy:

unzip ML_META
The file command tells us the nature of a file — so… now we know that we need to unzip it

What do we get this time around… A CSV file called ML_metadata.csv! Now, how in luck are you that I have just the right tool to help you with that — DON'T EVEN SAY MICROSOFT EXCEL — let me show you visidata. (Check my upcoming Shell Wizard series if you want to know more about this better Excel alternative). Type these lines one by one in your terminal.

#1. Install visidata with the below command
#I'm assuming you're running a debian flavored distro

sudo apt-get install -y visidata

#2. Launch visidata with the below command
# (press q to quit the app)
# CTRL H if you need help while in the app
visidata ML_metadata.csv
Meet visidata — your new best friend to view, interact with and manipulate tabular data.

This isn't "great" meta-data. (We'll talk about great meta-data when we set up our fundamental analysis tool kit.) Here we get basic field descriptors (index code, index name, description, refreshed at, from and to dates). Now that we have more information on what we can get, let's download some time-series.
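Before we do, one quick shell-only sanity check for those who would rather not leave the terminal: peek at the header row to confirm the field order (the column positions used below are simply what I observed in my copy of the file; double-check yours).

# Print the header row so we know what each column holds
head -n 1 ML_metadata.csv

# Show the first three index codes with their "from" and "to" dates (columns 1, 5 and 6 here)
awk -F',' 'NR > 1 && NR <= 4 {print $1, $5, $6}' ML_metadata.csv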

2. Getting historical and daily data via API calls: one ticker example

The snippets of code below will enable you to download either the full historical data series for one ticker or just a daily time point for one ticker. Let's dive in!

  • First: We build a query to get historical data. For that, we need the ticker (as defined by NASDAQ Link), and the “from date”.
#!/bin/bash
# Step one, query the API to gather historical data points
# Example with the Emerging Market High Yield Corporate Bond Index

#1. Building our variables to pass the API's URL
BASE_URL='https://data.nasdaq.com/api/v3/datasets'
DATASET_NAME='ML'
TICKER='EMHYY'
FORMAT='json'
API_KEY=$(< nasdaqlink.key)
START_DATE='1998-12-31'
END_DATE="$(date +%Y-%m-%d)"

# Constructing the URL based on the defined variables
curl "$BASE_URL/$DATASET_NAME/$TICKER.$FORMAT?start_date=$START_DATE&end_date=$END_DATE&api_key=$API_KEY"

How about we pop that in a script? Just repeat the steps above:

  • Create an empty file with a .sh extension
  • Dump the code in.
  • Now go to your terminal and make it executable with chmod +x <scriptname>
  • Run the script with ./<scriptname>

Run it and observe the JSON output being pumped out to the screen (we’ll talk about JSON in another article).

In an attempt to get rid of our MS Excel mindset, we’ll see how to deal with JSON in the next article and why JSON is perfect for time series especially as we ingest them into a database for later visualisation.
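If you can't wait for that article and want a quick peek right now, jq (a command-line JSON processor, a sudo apt-get install -y jq away) will do. The field names below assume the usual Nasdaq Data Link response layout, with everything wrapped in a top-level "dataset" object, and I'm calling the script get_emhyy_history.sh here; use whatever name you gave yours.

# Show only the first five data points from the response
./get_emhyy_history.sh | jq '.dataset.data[0:5]'

# Show the column names so you know what each position in a data point means
./get_emhyy_history.sh | jq '.dataset.column_names'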

  • Second: We prepare the same query to get the most recent daily data (we'll use that to update our historical data on a daily basis AUTOMATICALLY; more on that further down in the "Penniless Investor" series). For that we need the ticker, today's date and yesterday's date.
#!/bin/bash
# Step two, query the NASDAQ Link API for a daily data point
# Example with the Emerging Market High Yield Corporate Bond Index

#1. Building our variables to pass the API's URL
BASE_URL='https://data.nasdaq.com/api/v3/datasets'
DATASET_NAME='ML'
TICKER='EMHYY'
FORMAT='json'
API_KEY=$(< nasdaqlink.key)
START_DATE="$(date --date="yesterday" +%Y-%m-%d)" # Yesterday
END_DATE="$(date +%Y-%m-%d)" # Today

# Constructing the URL based on the defined variables
curl "$BASE_URL/$DATASET_NAME/$TICKER.$FORMAT?start_date=$START_DATE&end_date=$END_DATE&api_key=$API_KEY"

One word of caution — remember the "Refreshed at" field in the meta-data file? Make sure you run your script AFTER the data has been refreshed by the provider. Usually you can schedule that fairly easily with CRON (a simple task scheduler for Linux operating systems), where you can specify a frequency and time to run batches of scripts. In case you have money, well… you could get a duplex WebSocket between you and the provider and get real-time feeds — another story… for another day.
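As a small taste of what that scheduling looks like (CRON will get proper coverage later in the series), a crontab entry is a single line: minute, hour, day of month, month, day of week, then the command. The time and path below are purely illustrative; pick a slot comfortably after the provider's refresh time.

# Open your user's crontab for editing
crontab -e

# Then add a line like this one: run the daily update script at 19:30, Monday to Friday
# (path and time are examples - point it at your own script and your provider's refresh schedule)
30 19 * * 1-5 /home/youruser/daily_update.sh >> /home/youruser/daily_update.log 2>&1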

3. Getting historical and daily data via API calls: Multiple tickers example

Okay, so getting ONE data series is a bit useless, alright? How about we download the whole Merrill Lynch dataset?

  • First: We parse through the Meta-Data object we received and extract ALL the tickers of ALL the indexes available.
  • Second: We parse through the Meta-Data object we received and extract ALL of the start dates for each data series.

How? WITH SOME SHELL MAGIC…

#!/bin/bash
# Extract column 1 (index codes) and column 5 (series start dates) from the meta-data CSV
# readarray stores each line of the command's output as one array element
readarray -t ML_TICKERS < <(awk 'BEGIN{FS = ","; OFS=" "} NR>1 {print $1}' ML_metadata.csv)
readarray -t ML_FROM_DATE < <(awk 'BEGIN{FS = ","; OFS=" "} NR>1 {print $5}' ML_metadata.csv)

That's what I call arcane Shell scripting, pure wizardry of sorts which combines GNU tools and a Bash built-in. You might wonder what the syntax <(command) is. It is called process substitution, which allows you to use the output of a command as if it were a "file" feeding into a program. In this case, the output of the awk command is treated as if it were a file, and its contents are passed as input to the readarray command. Want to learn how to do Magic? Stay tuned.
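In the meantime, here is a tiny, self-contained illustration of the same trick, away from our CSV, if you want to convince yourself of what readarray and process substitution actually do:

#!/bin/bash
# Feed the output of a command (three lines from printf) straight into readarray
readarray -t FRUITS < <(printf 'apple\nbanana\ncherry\n')

echo "Number of elements: ${#FRUITS[@]}"   # -> 3
echo "Second element: ${FRUITS[1]}"        # -> banana (Bash arrays are zero-indexed)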

  • Third: We include all of this in a "C-style for loop".

As such we will amend the code slightly. What will need to change?

  • Dynamic access to the tickers; so we’ll “loop” through the ML_TICKERS array to extract them.
  • Dynamic access to the start dates (potentially different for each index); we’ll loop through these as well.
  • For each iteration we’ll update the URL and download the data-series accordingly with our curl request.
#!/bin/bash
# Use: Batch download of historical data
# Provider: NASDAQ Data Link API
# Dataset: BofA Merrill Lynch
# Nature: Corporate Bond TR and Yield Indexes

# 1. Building variables to pass the API's URL

BASE_URL='https://data.nasdaq.com/api/v3/datasets'
DATASET_NAME='ML'
FORMAT='json'
API_KEY=$(< nasdaqlink.key)
END_DATE="$(date +%Y-%m-%d)" # Today

# 2. Parsing Meta Data to extract tickers and start dates of series

readarray -t ML_TICKERS < <(awk 'BEGIN{FS = ","; OFS=" "} NR>1 {print $1}' ML_metadata.csv)
readarray -t ML_FROM_DATE < <(awk 'BEGIN{FS = ","; OFS=" "} NR>1 {print $5}' ML_metadata.csv)


# 3. Constructing the URL based on the defined variables

for (( i = 0; i < ${#ML_TICKERS[@]}; i++ )); do
    # Braces around ${i} matter: "$i_" would be read as a variable named "i_"
    curl "$BASE_URL/$DATASET_NAME/${ML_TICKERS[$i]}.$FORMAT?start_date=${ML_FROM_DATE[$i]}&end_date=$END_DATE&api_key=$API_KEY" > "${i}_${ML_TICKERS[$i]}.$FORMAT"
done
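One caveat on this loop: free API tiers usually come with rate limits (I won't quote a number for Nasdaq Data Link's free plan here; check their docs). If the batch starts failing partway through, slipping a short pause into each iteration is the cheapest fix:

# Same loop as above, with a one-second pause between requests to stay polite with the API
for (( i = 0; i < ${#ML_TICKERS[@]}; i++ )); do
    curl "$BASE_URL/$DATASET_NAME/${ML_TICKERS[$i]}.$FORMAT?start_date=${ML_FROM_DATE[$i]}&end_date=$END_DATE&api_key=$API_KEY" > "${i}_${ML_TICKERS[$i]}.$FORMAT"
    sleep 1
done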

So you might wonder… What's so special about this code? Why use this instead of Python? Well, the short answer is that the code above runs natively on any Linux distribution… out of the box.

  • You don't need to install Python; all the tools ship by default with any distribution, so just boot up a container and run all this from the get-go.
  • You don't need to install the "NASDAQ Link" or the "Requests" Python libraries via pip, or mess with virtual environments and worry about backward compatibility.
  • You don't need to get your head around custom library methods; you just need to learn the basic Unix "tools of the trade", which are transferable to pretty much any task, have been there forever, and are stable and maintained… with pure love.
  • You can adjust the script to another data provider (Polygon, Alpha Vantage…) since API URLs are usually fairly similar and the base skeleton is there (a quick sketch follows below).
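As a quick sketch of that last point, here is what the same skeleton could look like pointed at Alpha Vantage (the function name and parameters are from memory of their public documentation, so treat this as an illustration and check their docs before relying on it):

#!/bin/bash
# Same pattern, different provider (parameters to be verified against Alpha Vantage's docs)
BASE_URL='https://www.alphavantage.co/query'
API_KEY=$(< alphavantage.key)   # same trick as before: the key lives in a local text file
TICKER='IBM'

curl "$BASE_URL?function=TIME_SERIES_DAILY&symbol=$TICKER&apikey=$API_KEY"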

Now, having all our data in flat files is not really optimal, you will agree. In the next article we will design a database schema and create a database container to host all of our newly acquired JSON data. The goal of this exercise will be to:

  • Construct a database which will hold prices of instruments across different asset classes.
  • Complement the database with market indicators (indexes and/or macro data).
  • Blend the two together to construct a fundamental analysis and back-testing investment workbench which will greatly help us form views and identify market anomalies.
  • Ensure the database is updated automatically with the latest relevant data feeds.
  • Plug in a “front-end” for data visualisation.

All of this… For free! Because at the end of the day, we’re only penniless guys.
