Using Waterfall Charts in Python to analyze Programmatic Advertising Revenue change.

James Hopper
3 min readSep 7, 2021

--

When presenting to an executive audience, one of the most powerful ways to provide insights into what drives change in year-over-year revenue performance is the Waterfall (aka Bridge) Chart.

When building Waterfall Charts for ad revenue generated on news media websites, there were three key considerations:

1.) What are the key drivers behind year-over-year (YoY) revenue performance? In this case, it was Page Views (PV), Impressions per PV (Imp/PV), and CPM. It was also important to break out by device and content type.

2.) Finding the right equation to compare year-over-year change.

  • Total YoY Revenue change: rev change driven by PV + rev change driven by Imp/PV + revenue change driven by CPM (which ties.)
  • rev change driven by PV: (PV ty — PV last year) * last year revenue per PV
  • rev change driven by Imp/PV: (Imp/PV this year — Imp/PV last year) * CPM last year * PV this year/1000
  • rev change driven by CPM: (CPM this year — CPM last year) * impressions this year/1000

3.) Will this process need to be easily reproducible? As is often the case, the answer was yes! This report would need to be generated on a weekly basis and would need to be run across several websites. Hence, we went to Python Pandas and Plot.ly for a solution.

Coding

Coding the formulas for consideration 2 above:

#Assign calculated columns
grpd['RPM'] = grpd.revenue/grpd.page_views*1000
grpd['CPM'] = grpd.revenue/grpd.filled_impressions*1000
grpd['imp/pv'] = grpd.filled_impressions/grpd.page_views
#Calculate waterfall chart values
pv_impact = (segment_ty.page_views.values - segment_ly.page_views.values)*segment_ly.RPM.values/1000
impperpv_impact = (segment_ty['imp/pv'].values - segment_ly['imp/pv'].values)*segment_ly.CPM.values * segment_ty.page_views/1000
cpm_impact = (segment_ty.CPM.values - segment_ly.CPM.values)*segment_ty.filled_impressions.values/1000
chart_data[group] = [float(pv_impact), float(impperpv_impact), float(cpm_impact)]

Iteratively generate Plotly charts by site:

# Push data into plotly formatfor market in data.market.unique():
#Create df for chart building
subset = data[data.market==market]
subset = subset[['year','device','is_slideshow','page_views','filled_impressions','revenue']]
grpd = subset.groupby(['year','device','is_slideshow'], as_index=False)
grpd = grpd.sum()

#Assign calculated columns, **code above**
#Create key column for iterating
grpd['group'] = grpd.device + ' - ' + grpd.is_slideshow

chart_data = {}
for group in grpd.group:
segment_changes = {}
segment = grpd[grpd.group==group]
segment_ly = segment[segment.year==ly]
segment_ty = segment[segment.year==ty]
#Calculate waterfall chart values, **code above**#turn results in DataFrame
chart_df = pd.DataFrame(chart_data, index=['Page Views','Impressions Per Page View','CPM'])
#calculate start and end total
ly_rev, ty_rev = grpd.groupby('year').revenue.sum().values
#flatten data so it can be plotted
flat_data = list(chart_df.values.flatten())
#add start and end totals to flattened data
flat_data.insert(0,ly_rev)
flat_data.append(ty_rev)
#format flattened data as currency
flat_data_txt = ["${:,.0f}".format(x) for x in flat_data]
#create waterfall chart
fig = go.Figure(go.Waterfall(
orientation = "v",
measure = ['absolute','relative','relative','relative','relative','relative','relative','relative','relative','relative','relative','relative','relative','absolute'],
x = [['Total','Page Views','Page Views','Page Views','Page Views','Impressions Per Pageview','Impressions Per Pageview','Impressions Per Pageview','Impressions Per Pageview','CPM','CPM','CPM','CPM', 'Total '],
[ly,'Desktop Non-SS','Desktop-SS','Mobile Non-SS','Mobile SS','Desktop Non-SS','Desktop-SS','Mobile Non-SS','Mobile SS','Desktop Non-SS','Desktop-SS','Mobile Non-SS','Mobile SS',ty]],
textposition = "outside",
text = flat_data_txt,
y = flat_data,
connector = {"line":{"color":"rgb(63, 63, 63)"}},
))
fig.update_layout(
title = '{} YoY Revenue Change'.format(market),
showlegend = False,
width=850, height=600,
uniformtext_minsize=9, uniformtext_mode='hide'
)

if not os.path.exists('Automated Waterfall Charts'):
os.mkdir('Automated Waterfall Charts')
fig.write_image('Automated Waterfall Charts/{}.pdf'.format(market))

Merge individual charts into a single pdf:

#Merge individual charts into single PDF
def merger(output_path, input_paths):
pdf_writer = PdfFileWriter()

for path in input_paths:
pdf_reader = PdfFileReader(path)
for page in range(pdf_reader.getNumPages()):
if page == 0:
pdf_writer.addPage(pdf_reader.getPage(page))

with open(output_path, ‘wb’) as fh:
pdf_writer.write(fh)
paths = [‘Automated Waterfall Charts\\All Web Sites — Broken Out.pdf’,
‘Automated Waterfall Charts\\Site1.pdf’,
‘Automated Waterfall Charts\\Site2.pdf’,
‘Automated Waterfall Charts\\Site3.pdf’,
‘Automated Waterfall Charts\\Site4.pdf’,
‘Automated Waterfall Charts\\Site5.pdf’]
merger(‘Automated Waterfall Charts/{} MTD Waterfall Charts — Broken Out.pdf’.format(pd.to_datetime(‘now’).normalize().strftime(‘%Y-%m-%d’)), paths)

In the end, six different reports are autogenerated in PDF, one for each site as well as a single pdf containing all sites!

The example below shows that we can estimate an exact dollar amount to the change driven by each subcategory.

Looking at the breakdown of site 1, you can see that the losses in CPM were more than offset by the gains in page views and impressions per page view, resulting in a lift in total revenue.
Based on the waterfall chart above, we can determine that growth in revenue from page views and impressions per pageview were able to more than offset the steep declines in CPM.

For the full code and source data, please visit https://github.gatech.edu/jhopper6/waterfall_demo.

--

--

James Hopper

Hands-on data analytics leader. When I encounter a problem, I first do research. If I can’t find a clear solution, I write my own solution here.