Extracting Financial Statements from SEC Filings - XBRL-To-JSON

Jay
6 min readJul 21, 2021

--

Let’s explore how to extract and generate financial statements from 10-Q and 10-K SEC EDGAR filings using Python, pandas dataframes and SEC API.

Our Goal

  • Extract and standardize financial statements from 10-K and 10-Q filings — including income statements, balance sheets and cash flow statements
  • Generate a single pandas dataframe holding all multi-year quarterly income statements
Apple’s quarterly revenue and net income 2017–2021 extracted from 10-K and 10-Q filings

We use SEC API (https://sec-api.io) to

  • List all historical 10-Q and 10-K filings of Apple
  • Extract XBRL versions of all filings
  • Convert and standardize XBRL to JSON to create a single income statement dataframe

You can find and run the entire source code as Jupyter notebook here: https://gist.github.com/janlukasschroeder/3da274150fd00c1c1776c7e541a9b61b

Click on “Open in Colab” to run the code in Google Colab — no local execution required.

We start by downloading the JSON version of Apple’s most recent 10-Q filing using the XBRL-to-JSON converter API.

Apple’s Most Recent 10-Q Filing

The 10-Q URL is: https://www.sec.gov/Archives/edgar/data/320193/000032019321000056/aapl-20210327.htm

The API accepts any URL or accession number of a filing with an XBRL version of the filing, namely all 10-Qs and 10-Ks.

After providing the 10-Q URL, the API returns a fully standardized JSON version of the filing’s XBRL version including all financial statements, the cover page and many more. For now, we’re only interested in the financial statements.

The example output of the income statement looks like this. The statement includes US GAAP items such as:

  • Revenues
  • Cost Of Goods And ServicesSold
  • Gross Profit
  • Research And Development Expense
  • Selling General And Administrative Expense
  • Operating Expenses
  • Net Income
  • Earnings Per Share

Income Statement

Let’s convert the XBRL-JSON income statement into a pandas dataframe.

The resulting income statement dataframe looks like this:

Pandas dataframe of XBRL-to-JSON converted income statement from Apple’s 10-Q filing

Each column header represents a date range and each row represents a US GAAP item and its value for a given period. We see four date ranges:

  • 2020–12–27 — 2021–03–27 (most recent quarterly results)
  • 2019–12–29 — 2020–03–28 (quarterly results one year ago)
  • 2020–09–27 — 2021–03–27 (most recent half-yearly results)
  • 2019–09–29 — 2020–03–28 (half-yearly results one year ago)

Cleaning and merging income statements follow later. Now, let’s briefly generate Apple’s balance sheet and cash flow statement.

Balance Sheet

And the resulting balance sheet looks like this:

Pandas dataframe of XBRL-to-JSON converted balance sheet from Apple’s 10-Q filing

Cash Flow Statement

Let’s generate the cash flow statement:

And the resulting pandas dataframe looks like this:

Cash flow statement generated from Apple’s 10-Q filing

Multi-Year Financial Statements

Now we build a multi-year income statement with quarterly results for Apple.

Our steps:

  • Fetch all 10-Qs and 10-Ks published since 2016.
  • Merge all financial statements into a single dataframe
  • Clean statements and generate quarterly results for entire date range
  • Plot Revenue, Net Income and Earnings Per Share

First, we use the query API provided by SEC API to list all most recent 10-Q and 10-K filings filed by Apple.

Then, we extract all accession numbers of each filing.

Now we define a helper function to get the XBRL-JSON version of a filing by providing its accession number. We add a simple backoff strategy to our helper to avoid overloading the SEC API with too many requests per second. In case we send too many requests and the response fails, we just wait for 500 milliseconds and retry the same request. That’s helpful when we download multiple quarterly and annual statements back to back so that we can merge them into one big statement.

Next, we define a function to merge two income statement dataframes into one. We use this merge function to iterate over all 10-K and 10-Q income statements and generate one, large income statement holding all values.

The cleaning function helps us removing columns we are not interested in, for example columns holding more than 5 NaN values or columns ending with “_left”. The merge function creates a column ending with “_left” if it’s a duplicate column. We also rearrange the columns so that the most recent quarter column sits right and the oldest quarter left.

The result looks like this:

Cleaned income statement after merge

Finally it’s time to build our multi-year income statement. We iterate over all accession numbers, generate the XBRL-JSON version, create an income statement dataframe and merge the newly generated income statement with our global statement. All 10-Q and 10-K income statements are now merged into a single pandas dataframe.

The resulting income statement includes quarterly results, half-yearly, 9-month and annual results.

Final income statement after merging all 10-Q and 10-K statements into one dataframe

Looking closely we see that 4th quarter results are missing. That’s expected because a company only files three times per year a 10-Q filing and once per year a 10-K filing. It’s on us now to calculate the 4th quarter results by using the annual and 9-month results.

We start by defining a number converter function. Either we convert a given string to an integer (e.g. net income) or to a float (e.g. earnings per share).

The main function iterates over all columns. The function checks if the column represents an annual result (delta.days > 350). If it’s an annual result column, we iterate over all columns and find the column that represents 9-month results starting at the same date as the annual results column.

Once discovered, we calculate the 4th quarter results by simply subtracting the 9-month result of a US GAAP item from its annual result.

For example:
Net Income (4th Quarter) =
Net Income (Annually) — Net Income (9-Months)

This doesn’t work for Earnings Per Share and Number Of Shares Outstanding. That’s why we calculate such values using the newly calculated values.

For example:
EarningsPerShareBasic (4th Quarter) =
NetIncomeLoss (4th Quarter) / WeightedAverageNumberOfSharesOutstandingBasic (4th Quarter)

In the last step we remove all columns not representing quarterly results.

And the final income statement including all quarterly results since 2017 looks excellent.

Final income statement for Apple with all quarterly results since 2017

Plotting

Let’s have fun and plot revenues, net income and earnings per share.

Plotting revenue and net income shows a clear repeating pattern of seasonality. Apple’s revenue and net income always peaks in the 4th quarter (September to December) of each year as Christmas sales drives up demand for new iPhones and Macs. A drop always follows in the 1st quarter of each year (January to March). Revenue and net income are increasing year-over-year — nothing unexpected here.

A look at Earnings per Share shows a different picture.

Quarterly Earnings Per Share — Apple — 2017 to 2021

--

--