Adjusting for Inflation When Analysing Historical Data with Python

Mike Erb
Analytics Vidhya
Published in
6 min readMar 16, 2020
Photo by Carlos Muza on Unsplash

I recently finished a project to determine what the best performing movies at the box office are and make recommendations based on my findings. The definition of best was open to interpretation, it could have been profits, popularity, accolades, perceived social good, etc. or any combination those attributes. I chose to focus on profits because I believed that would be the real interest of the stakeholders.

Part of my analysis of movie profits involved the production budgets and profits of movies over the last 20 years.

While many datasets do not contain monetary data, if you find yourself analyzing a dataset with historical monetary data like I was, you will want to adjust those values for inflation. Before I show you how I adjusted the data for inflation, let’s look at the question of why bother adjusting for inflation.

Why Adjust for inflation?

Let’s start with defining a couple of terms you need to know when dealing with inflation.

Inflation: A measure of the rate of change for the average price level of a collection of goods and services that increases over a period of time. A dollar amount from a point in the past will not have the same buying power as the same dollar amount from today because of inflation.

Nominal/Current dollars: Unadjusted dollar amounts that were recorded as they occurred. In our situation, when the movie was in the box office.

Real/Constant dollars: Dollar amounts that have been adjusted for inflation so that all the historical dollar data we are dealing with is at a constant level in regards to purchasing power.

Your source dataset is likely to only include data in nominal dollars. To illustrate how nominal vs. real data will look, here are plots of the mean production costs for all the movies in a given year over the dataset.

Nominal Mean Production Budget for Movies for the years 2000 to 2019
Real Production Budget for Movies for years 2000 to 2019

If the mean production costs are not adjusted for inflation, it appears that they are tending to increase year to year, while after adjusting for inflation we can see that they are fairly stable.

So how do we adjust for inflation?

Now that we have looked at why adjusting for inflation is important, just how do we do it?

A common method is to adjust for inflation using the Consumer Price Index (CPI). The basic formula is:

adjusted_value = (old_value * cpi_current) / cpi_old

Taking an example from the movies dataset, Avatar had a production budget in 2009 of 425 million dollars and after adjusting for inflation to 2019 dollars the real production budget is over 506 million dollars as shown in the formula below.

budget_2009 = 425000000
cpi_2009 = 214.537
cpi_2019 = 255.657
budget_2019 = (budget_2009 * cpi_2019) / cpi_2009budget_2019 = 506459142.25

The inflation adjustment of close to 100 million dollars is very significant, reinforcing the need to calculate the real values when comparing data from movie to movie.

The CPI Library

Now that we understand the need for adjusting for inflation and how to adjust for inflation, the next thing I thought I would need to do was write some Python code to do the calculations and gather the CPI data. But before I did that, I wondered if someone had already done it which would save me time. So I did a Google search and found that someone had not just written some example code but had published a library appropriately called the CPI Library.

In order to use the CPI library, first you will need to install the it. The github repo can be found here. I installed it using pip.

$ pip install cpi

In your editor or notebook, import the library

import cpi

Then to get an inflation adjusted (real) value

adjusted_value = cpi.inflate(nominal_value, year_for_nominal_value)

Using the values for Avatar from above, it would look like this:

budget_2009 = 425000000
budget_year = 2009
budget_2019 = cpi.inflate(budget_2009, budget_year)budget_2019 = 506459142.25

That’s easy enough, time make the inflation adjustments for all my data.

Before I do that, their are a couple things to note:

The CPI dataset on your machine may become out of date. If it does, when you run import cpi you will get a warning telling you to update your data. The CPI Library has a built in update function. Just run to update your data and you’ll be all set.

cpi.update()

The CPI library uses the ‘All Urban Consumers’ price index (CPI-U) to calculate inflation by default, which is what I wanted to use for the movies dataset I was analysing. If you need to use a different price index, some of them are built into the library; check the readme here. But there are many others published by The Bureau of Labor Statistics, and you may be able to fork the CPI library to add that capability, or come up with an alternate method of adjusting for inflation.

Adjusting for inflation with Pandas

I needed to adjust all the monetary values in my dataset for inflation, and my data was stored in a Pandas DataFrame, so I wrote the following code to add columns to my data frame with inflation adjusted values.

First, I defined a function to make the inflation adjustments. While it just returns the results of an apply() function on the dataset, writing the function made the upcoming inflation adjustment code cleaner and the code more maintainable overall.

def inflate_column(data, column):
"""
Adjust for inflation the series of values in column of the
dataframe data
"""
return data.apply(lambda x: cpi.inflate(x[column],
x.release_year), axis=1)

Next, I created and populated new columns in my dataframe with inflation adjusted values for the production budget and worldwide gross profit.

# adjust the production budget 
movies_df['real_production_budget'] = inflate_column(movies_df, 'production_budget')
#adjust the worldwide gross profits
movies_df['real_worldwide_gross'] = inflate_column(movies_df, 'worldwide_gross')

Now that I had inflation adjusted values for the production budget and gross profits, I could add a column to my dataframe with the calculated real worldwide net profits for the movies in my dataset and proceed with my analysis.

# calculate the real worldwide net profit
movies_df['real_profit'] = movies_df.apply(lambda x: x.real_worldwide_gross - x.real_production_budget, axis=1)

All my data had now been adjusted for inflation and I was ready to proceed with my analysis.

An Example Visualization From My Analysis Using the Real Values

Movie Production Budget vs. Net Profit for movies using real dollar amounts

The above scatter plot shows the real production budget vs. real net profit for all the movies in my dataset. In the upper right hand corner is Avatar, the most extreme outlier in the movie dataset. The purpose of the visualization was to look at the correlation between the production budget and net profit. It turns out that both from looking at the visualization and calculating the correlations for different quantiles, that only Avatar and the other extreme outliers on the right hand side of the visualization show even a moderate correlation between the two.

My Project

Thanks for reading this post, you can check out my entire project and its findings on github here.

--

--

Mike Erb
Analytics Vidhya

Data Scientist with a background in Computer Science and as an Entrepreneur in the Bike industry — Based in Ithaca, NY