Exploring Revenue Forecasting in E-commerce: The Power of Simulations

Objective: In this blog post I will explore the use of simulations for a typical e-commerce use case.

Gaurang Mehra
Operations Research Bit
4 min read · May 10, 2024

We will use simulations to forecast the monthly revenue for an e-commerce company.

The typical business flow for an e-commerce company can be represented by the block diagram below.

Fig 1.1 Block Diagram of e-commerce flow

The widest part of the funnel is impressions. An impression is recorded when a customer sees a web ad or search result. Some of the prospective customers who see it click through to the website. Of the customers who click through, only a few sign up, and of the customers who sign up, only a few purchase.

We can model this flow in the following way:

  • We can find the historical average number of monthly impressions; in this case we assume 100K. We model impressions as a Poisson random variable with the lambda parameter set to that average (100K in this case). The Poisson distribution is a reasonable choice here because the number of impressions is a non-negative count over a fixed period and we assume impressions occur independently of each other.
  • Clicks and sign-ups both have binary outcomes and so can be modelled using the binomial distribution. For clicks, the number of trials n is the number of impressions from the previous step and the probability p is the historical click-through rate (ctr). For sign-ups, n is the number of clicks from the previous step and p is the historical sign-up rate (named str_ in the code, since str is a Python built-in).
  • A purchase is also a binary yes/no event, modelled by the binomial distribution with n as the number of sign-ups from the previous step and p as the purchase probability, which we can again get from historical data.
  • The value of each purchase (ticket size) is drawn from a normal distribution with mean = historical ticket size and std deviation = historical std deviation. We use a normal distribution here; in some cases other distributions might be a better fit.
  • We run this flow a thousand times, storing the revenue from each run, which gives us some idea of the variability of revenue.
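Before simulating anything, the flow above can be sanity-checked by hand. The sketch below is not part of the original notebook: the function name `analytic_revenue` is hypothetical, and the inputs are the example rates used in this post (ctr = 1%, sign-up rate = 10%, purchase rate = 30%, ticket size 100 with std deviation 20). It relies on a standard result: thinning a Poisson count with binomial draws gives another Poisson count, so the number of purchases is Poisson with mean lambda × ctr × str_ × pur_rate, and revenue is a compound Poisson sum.

```python
import math

def analytic_revenue(avg_impressions, ctr, str_, pur_rate, avg_tkt, std_tkt):
    """Mean and std deviation of monthly revenue implied by the funnel model."""
    # Binomial thinning of a Poisson count is still Poisson,
    # so purchases ~ Poisson(expected_purchases).
    expected_purchases = avg_impressions * ctr * str_ * pur_rate
    mean_revenue = expected_purchases * avg_tkt
    # Compound Poisson variance: rate * E[X^2] = rate * (std^2 + mean^2)
    var_revenue = expected_purchases * (std_tkt ** 2 + avg_tkt ** 2)
    return mean_revenue, math.sqrt(var_revenue)

mean_rev, std_rev = analytic_revenue(100_000, 0.01, 0.1, 0.3, 100, 20)
print(f"Expected revenue: {mean_rev:.0f}, std deviation: {std_rev:.0f}")
```

With these inputs the expected revenue is about 3,000 with a standard deviation of roughly 560, which gives a reference point for judging whether the simulated distribution looks plausible.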

Now let's model this in Python.

# Import basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import interactivity components for later
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

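def get_signups(avg_impressions,ctr,str_):
    # Impressions for the month: Poisson with mean avg_impressions
    impressions = np.random.poisson(avg_impressions)
    # Each impression becomes a click with probability ctr
    clicks = np.random.binomial(n=impressions,p=ctr)
    # Each click becomes a sign-up with probability str_
    signups = np.random.binomial(n=clicks,p=str_)
    return signups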

In this first part we write a function that models the flow up to sign-ups. The function takes three arguments: avg_impressions, ctr (click-through rate) and str_ (sign-up rate). It first draws a number of impressions from a Poisson distribution with mean avg_impressions. Clicks and sign-ups are then drawn from binomial distributions: for clicks the number of trials n is the number of impressions, and for sign-ups n is the number of clicks. The function returns the number of sign-ups.
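The same funnel can also be drawn for many simulated months in a single call, because NumPy's binomial sampler accepts an array for n. The following is a minimal sketch rather than the code used in this post: the name `get_signups_batch`, the `n_sims` and `seed` parameters, and the use of a seeded `np.random.default_rng` generator are all assumptions added for illustration.

```python
import numpy as np

def get_signups_batch(avg_impressions, ctr, str_, n_sims=1000, seed=0):
    """Draw sign-up counts for n_sims simulated months in one call."""
    rng = np.random.default_rng(seed)  # seeded so runs are repeatable
    impressions = rng.poisson(avg_impressions, size=n_sims)
    # n is an array here: one binomial draw per simulated month
    clicks = rng.binomial(n=impressions, p=ctr)
    signups = rng.binomial(n=clicks, p=str_)
    return signups

signups = get_signups_batch(100_000, 0.01, 0.1)
print("Average sign-ups per month:", signups.mean())
```

With 100K impressions, a 1% ctr and a 10% sign-up rate, the average should land near 100 sign-ups per month.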

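def get_revenue(signups,pur_rate,avg_tkt,std_tkt):
    # Each sign-up becomes a purchase with probability pur_rate
    purchases = np.random.binomial(n=signups,p=pur_rate)
    # One ticket value per purchase, drawn from a normal distribution
    purchase_vals = np.random.normal(avg_tkt,std_tkt,size=purchases)
    revenue = np.sum(purchase_vals)
    return revenue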

This second part of the code models the flow from sign-ups to revenue. The number of purchases follows a binomial distribution with n as the number of sign-ups and p as the historical purchase rate. Each purchase is assigned a value drawn from a normal distribution with mean = avg ticket size and std dev = historical std deviation. We then sum all the purchase values to get the monthly revenue.
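One caveat with normally distributed ticket values is that a draw can in principle be negative, which matters more when the std deviation is large relative to the mean. As one possible alternative (an assumption for illustration, not the approach used in this post), the sketch below draws strictly positive values from a lognormal distribution whose parameters are chosen by moment matching so that the mean and std deviation equal the historical figures. The function name `lognormal_ticket_values` is hypothetical.

```python
import numpy as np

def lognormal_ticket_values(avg_tkt, std_tkt, size, rng):
    """Draw strictly positive ticket values with the given mean and std dev."""
    # Moment matching: parameters of the underlying normal distribution
    sigma2 = np.log(1.0 + (std_tkt / avg_tkt) ** 2)
    mu = np.log(avg_tkt) - sigma2 / 2.0
    return rng.lognormal(mean=mu, sigma=np.sqrt(sigma2), size=size)

rng = np.random.default_rng(42)
values = lognormal_ticket_values(100, 20, size=10_000, rng=rng)
print("All positive:", values.min() > 0)
print("Sample mean and std:", values.mean(), values.std())
```

Whether a lognormal, a normal, or some other distribution fits best is something to check against the historical ticket data.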

@interact(avg_impressions=[100000,120000,150000],ctr=[0.01,0.025,0.05],str_=[0.1,0.15,0.2])
def run_simulation(avg_impressions,ctr,str_,pur_rate=0.3,avg_tkt_size=100,std_dev_tkt=20):
    # Run the funnel 1,000 times and collect the revenue from each run
    rev_tot = []
    for i in range(1000):
        signups = get_signups(avg_impressions,ctr,str_)
        rev = get_revenue(signups,pur_rate,avg_tkt_size,std_dev_tkt)
        rev_tot.append(rev)
    rev_tot = np.array(rev_tot)

    # Plot the distribution of simulated revenue and report its median
    plt.hist(rev_tot)
    plt.show()
    print("Median revenue is ",np.median(rev_tot))

We create a wrapper function called run_simulation, inside which we call the get_signups and get_revenue functions. The interact decorator makes this function interactive, so we can vary avg_impressions, ctr, str_ and the other parameters. Each time a value changes, the simulation runs 1,000 times and plots the resulting distribution of revenue. For example, we can change the avg impressions driving the Poisson distribution using the interactive dropdown, and do the same for ctr.
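Outside a notebook, where the interactive widgets are not available, the same experiment can be run as a plain function that returns the simulated revenues. This is a minimal, self-contained sketch under stated assumptions: `simulate_revenue`, `n_sims` and `seed` are hypothetical names, and a seeded `np.random.default_rng` generator replaces the global `np.random` calls so that results are repeatable.

```python
import numpy as np

def simulate_revenue(avg_impressions, ctr, str_, pur_rate=0.3,
                     avg_tkt=100, std_tkt=20, n_sims=1000, seed=0):
    """Return an array of n_sims simulated monthly revenues."""
    rng = np.random.default_rng(seed)
    revenues = np.empty(n_sims)
    for i in range(n_sims):
        # Same funnel as above: impressions -> clicks -> sign-ups -> purchases
        impressions = rng.poisson(avg_impressions)
        clicks = rng.binomial(impressions, ctr)
        signups = rng.binomial(clicks, str_)
        purchases = rng.binomial(signups, pur_rate)
        # Sum one normally distributed ticket value per purchase
        revenues[i] = rng.normal(avg_tkt, std_tkt, size=purchases).sum()
    return revenues

# Sweep the click-through rate while holding the other inputs fixed
for ctr in (0.01, 0.025, 0.05):
    rev = simulate_revenue(100_000, ctr, 0.1)
    print(f"ctr={ctr:.3f}  median revenue ~ {np.median(rev):,.0f}")
```

Returning the array instead of plotting inside the function makes it easy to compare scenarios in a loop or to compute other summary statistics afterwards.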

Fig 1.3 Effect of changing average click through rate (ctr) on revenue

In the figure above we can see that raising the average click-through rate from 1% to 2.5% has a large impact on median revenue, increasing it from ~$2,900 to ~$7,500. This is roughly a 2.5x increase, as expected, since expected revenue scales linearly with ctr.

Fig 1.4 Effect of increasing avg impressions

If we increase avg impressions by 50%, we see that median revenue goes up from ~$3,000 to ~$4,500, also a 50% increase. In the same way we can change one factor or a combination of factors to see how the median or a 95% interval of the simulated revenue changes.
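The median and a 95% interval can be read directly off the array of simulated revenues using percentiles. The helper below is a hedged sketch: `summarize_revenue` is a hypothetical name, the interval is a simple central percentile interval of the simulated values (not a confidence interval in the formal statistical sense), and the input array is a placeholder rather than output from the simulation.

```python
import numpy as np

def summarize_revenue(revenues, level=0.95):
    """Median and central percentile interval of simulated revenues."""
    tail = (1.0 - level) / 2.0 * 100.0  # e.g. 2.5 for a 95% interval
    low, high = np.percentile(revenues, [tail, 100.0 - tail])
    return float(np.median(revenues)), float(low), float(high)

# Placeholder input: stand-in values, not output from the simulation
example = np.arange(1, 101)
median, low, high = summarize_revenue(example)
print(f"median={median:.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

In practice the array passed in would be the revenues collected across the 1,000 simulation runs for a given set of inputs.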
