Quant Trader’s Toolkit: Free and Fast Fundamental Data from SEC with Python

5 min readMar 7, 2023

Quant Trader’s Toolkit — Full Series Links:

An Overview
Free and Fast Fundamental Data from SEC (you are here)

Your Wait Is Over!

Have you ever struggled to get the fundamental data you need?! Have you ever wanted to use quarterly fundamental data but discovered it was hidden behind a paywall?! Well today, you are being gifted python code to access FREE FUNDAMENTAL DATA from sec filings, QUICKLY AND EASILY, courtesy of your friends here at The Quant Trading Room!

Data Is King

Data will be the bedrock of any strategy you’ll build as a quant. More specifically, obtaining high-quality data that other people do not have access to will ultimately be where you find your most profitable alphas. What do you think your chances are of finding an edge using only OHLC data obtained from yfinance? They’re low…. very low. No need to panic though! In this Quant Trader’s Toolkit series, we will be providing you with the tools that will allow you to separate yourself from the wannabes and discover the edges that will fund your sugar baby’s rent payments ;).

Get All Publicly Traded Companies

This first function enables you to fetch the Central Index Keys (CIK) of ALL of the publicly traded companies that are responsible for submitting reports to the Securities Exchange Commission.


def get_all_companies(headers):
    
    companyTickers = requests.get(
        "https://www.sec.gov/files/company_tickers.json",
        headers=headers)
    company_dct = companyTickers.json()
    cik_lst = [str(company_dct[x]['cik_str']).zfill(10) for x in company_dct]
    
    return cik_lst

Example Output of cik_lst[:5] → The first five CIKs from cik_lst

Get Filing Data By CIK

This function enables you to take one of the CIKs from the list returned by the last function and request the data from all of that companies 10-K (Annual Report) and 10-Q (Quarterly Report) filings dating back to 2009. A dictionary is returned which includes the name of the company, the CIK, and the data for 10-K and 10-Q reports in both separate and combined form for a more comprehensive picture of the fundamentals of the company.

def get_filing_data_by_cik(cik, headers):
    
    companyFacts = requests.get(
        f'https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json', 
        headers=headers)
    cFacts = companyFacts.json()
    facts = cFacts['facts']
    
    date_dict = {'Q1': '03-30-', 
                 'Q2': '06-30-', 
                 'Q3': '09-30-', 
                 'FY': '12-31-'}

    Q10K10_tags_values = [(f"{t}_{u}", 
                           [(f"{date_dict[i['fp']]}{i['fy']}", i['val']) 
                            for i in facts[c][t]['units'][u] if i['form'] 
                            in ['10-Q','10-K']
                           ]
                          ) 
                          for c in facts.keys() 
                          for t in facts[c].keys() 
                          for u in facts[c][t]['units'].keys()
                         ]

    K10_tags_values = [(f"{t}_{u}", 
                        [(f"{date_dict[i['fp']]}{i['fy']}", i['val']) 
                         for i in facts[c][t]['units'][u] if i['form'] 
                         in ['10-K']
                        ]
                       ) 
                       for c in facts.keys() 
                       for t in facts[c].keys() 
                       for u in facts[c][t]['units'].keys()
                      ]

    Q10_tags_values = [(f"{t}_{u}", 
                        [(f"{date_dict[i['fp']]}{i['fy']}", i['val']) 
                         for i in facts[c][t]['units'][u] if i['form'] 
                         in ['10-Q']
                        ]
                       ) 
                       for c in facts.keys() 
                       for t in facts[c].keys() 
                       for u in facts[c][t]['units'].keys()
                      ]

    filing_data = {'name': cFacts['entityName'], 
                   'cik': cFacts['cik'], 
                   '10Q10K': {k:v for k,v in Q10K10_tags_values},
                   '10K': {k:v for k,v in K10_tags_values}, 
                   '10Q': {k:v for k,v in Q10_tags_values}
                  }
    
    return filing_data

Viewing The Data

Now that you have a dictionary containing all the fundamental data from 10-K and 10-Q reports, it’s time to turn this data into a form we can look at without hurting ourselves. Just fill the parameters of this function with the filing data you just collected for the specific company you are interested in, tell it from which form you wish to view the data, and include your tolerance for nan values in the data. For instance, when pulling each tag (ex. EntityCommonStockSharesOutstanding_shares) from the filing, some tags are filled with nan values and only a few non-nan values. If you are ok with 50% nan values in the original tag, then put nan_tolerance=50. If you’re a little confused right now, just put nan_tolerance=20 and call it good. Only the tags with data that is less than 20% nans will be included in your final dataframe.

def convert_dict_to_df(filing_data, form, nan_tolerance=100):
    '''
    filing_data: dict
    form: lst 
          ['10K'], ['10Q'], or ['10K', '10Q']
    nan_tolerance: int 
                    a integer from 1 - 100
    '''
    form_df = pd.DataFrame()
    
    if form == ['10K']:
        
        tenk = filing_data['10K']

        for i in tqdm(tenk.keys()):
            
            if len(tenk[i]) > 0:
                
                df = pd.DataFrame(tenk[i], columns=['date', i])
                df.index = pd.to_datetime(df.date)
                df = df.drop(columns=['date'])
                df = df.resample('Y').last()
                length = len(df)
                percent_nan = df.isnull().sum() * 100 / length
                
                if percent_nan[0] < nan_tolerance:
                    
                    if form_df.empty:
                        form_df = df
                    else:
                        form_df = pd.merge(form_df, 
                                           df, 
                                           on=['date'], 
                                           how='outer')
        
    elif form == ['10Q']:
        
        tenq = filing_data['10Q']

        for i in tqdm(tenq.keys()):
            
            if len(tenq[i]) > 0:
                
                df = pd.DataFrame(tenq[i], columns=['date', i])
                df.index = pd.to_datetime(df.date)
                df = df.drop(columns=['date'])
                df = df.resample('Q').last()
                length = len(df)
                percent_nan = df.isnull().sum() * 100 / length
                
                if percent_nan[0] < nan_tolerance:
                    
                    if form_df.empty:
                        form_df = df
                    else:
                        form_df = pd.merge(form_df, 
                                           df, 
                                           on=['date'], 
                                           how='outer')
    
    elif form == ['10Q', '10K'] or form == ['10K', '10Q']:
        
        tenqk = filing_data['10Q10K']
        
        for i in tqdm(tenqk.keys()):
        
            if len(tenqk[i]) > 0:
                df = pd.DataFrame(tenqk[i], columns=['date', i])
                df.index = pd.to_datetime(df.date)
                df = df.drop(columns=['date'])
                df = df.resample('Q').last()
                length = len(df)
                percent_nan = df.isnull().sum() * 100 / length

                if length > 30 and percent_nan[0] < nan_tolerance :

                    if form_df.empty:
                        form_df = df
                    else:
                        form_df = pd.merge(form_df, 
                                           df, 
                                           on=['date'], 
                                           how='outer')
    
    else: 
        print("Invalid form.")
        print("Vaild inputs are ['10K'], ['10Q'], or ['10K', '10Q']")
        
    form_df.dropna(axis=1, how='all', inplace=True)
    form_df.dropna(axis=0, how='all', inplace=True)
    form_df = form_df.fillna(0)
    
    return form_df

Putting It All Together

Time to put the above functions in to action! First, the SEC requires you to set your email as the User Agent when making a get request to their api. Then, we’ll get all the CIKs, pick one to use, pull the filing data of that company, and view the data in a pandas data frame.

from tqdm import tqdm
import requests
import pandas as pd
# View Dataframe as an interactive table in Jupyter Notebook
from itables import init_notebook_mode
init_notebook_mode(all_interactive=True)

headers = {'User-Agent': "YOUR@EMAIL.HERE"}

cik_lst = get_all_companies(headers)

filing_data = get_filing_data_by_cik(cik_lst[1], headers)

df = convert_dict_to_df(filing_data, form=['10Q', '10K'], nan_tolerance=20)

df

That’s all folks! I’ll leave you with some wise words from my father.

“Always remember the first two rules of working with data. 1) Have fun! and 2) Have fun!”

Please feel free to comment with any questions, concerns, or topics you would like covered in future posts!

Check out our YouTube where we provide in depth breakdowns of the code in our articles!

If you found this or any of our other content valuable, please consider buying us a coffee.

Get full access to all of our articles by becoming a member of Medium!

**THIS CONTENT IS NOT FINANCIAL ADVICE. IT IS STRICTLY EDUCATIONAL**