Multiple Regression Model for Movie Box Office Revenue

Verica Buchanan
Published in Human Systems Data
Mar 29, 2017

This multiple regression analysis models a movie’s first-year box office revenue in millions of dollars. The three input variables are total production costs, total promotional costs, and total book sales, each in millions of dollars. Knowing these three inputs, one can use the model to predict, for example, how much the next Star Wars movie would make. Production cost and the amount spent on promoting the movie are both strongly correlated with first-year revenue, while total book sales is a much weaker predictor. These correlation strengths can be seen in the scatter plot matrix (Figure 1). I also calculated the r-values for each predictor: R = .84 for total production cost, R = .86 for promotional cost, and R = .23 for book sales.
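To make the r-values above concrete, here is a minimal Python sketch of how a Pearson correlation coefficient is computed. It is not the R code used for this post, and the numbers in it are hypothetical toy values rather than the movie data set; the function name `pearson_r` is also just an illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values (millions of dollars), NOT the data from this post.
toy_production_cost = [4.0, 6.5, 8.0, 10.0, 12.5]
toy_revenue = [30.0, 52.0, 58.0, 85.0, 96.0]

r = pearson_r(toy_production_cost, toy_revenue)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

Squaring r gives the share of variance in revenue that a single predictor accounts for on its own. For the values reported above, that is roughly .71 for production cost, .74 for promotional cost, and .05 for book sales.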

Figure 1. Scatter Plot Matrix plotted in R

Looking at the first row, the y-axis values are the movie’s first-year revenue. Moving from left to right, the first x-axis shows total production cost, the second shows total promotional cost, and the third shows total book sales. All values are in millions of dollars.

I ran the same analysis in a different program and was able to add the regression line for each graph.

Figure 2. Individual scatter plots for first year’s movie revenue with regression line.

Lastly, I used Excel to create a third graph. As one can see, revenue increases as total production cost, promotional cost, and book sales go up.

Figure 3. First year’s box office revenue based on promotional cost, production cost, and book sales.

Regression Model Formula:

Predicted X1 = 7.6760 + (3.6616)(X2) + (7.6211)(X3) + (0.8285)(X4)

X1 = first year box office receipts/millions

X2 = total production costs/millions

X3 = total promotional costs/millions

X4 = total book sales/millions
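As a rough illustration of how the fitted formula can be used for prediction, the Python sketch below plugs inputs into it. It assumes the three coefficients apply, in order, to production cost, promotional cost, and book sales (the order in which the inputs are listed in the introduction). The function name `predict_revenue` and the example inputs are hypothetical.

```python
# Coefficients as printed in the regression formula (all money in millions of dollars).
# Assumed mapping: production cost, promotional cost, book sales, in that order.
INTERCEPT = 7.6760
COEF_PRODUCTION = 3.6616
COEF_PROMOTION = 7.6211
COEF_BOOK_SALES = 0.8285

def predict_revenue(production_cost, promotional_cost, book_sales):
    """Predicted first-year box office revenue, in millions of dollars."""
    return (INTERCEPT
            + COEF_PRODUCTION * production_cost
            + COEF_PROMOTION * promotional_cost
            + COEF_BOOK_SALES * book_sales)

# Hypothetical inputs, not taken from any real movie.
example = predict_revenue(production_cost=10.0, promotional_cost=5.0, book_sales=2.0)
print(f"Predicted first-year revenue: {example:.4f} million dollars")
```

Read this way, each additional million dollars of promotion is associated with about 7.6 million dollars of additional first-year revenue when the other two inputs are held fixed. Predictions for inputs far outside the range of the original data should be treated with caution.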

R-code for the analysis:

Figure 4. R-code used to develop the multiple regression model and plot the scatter plot matrix.
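For readers who want to see what fitting such a model involves, here is a minimal Python sketch of ordinary least squares with three predictors, solved through the normal equations. It is not the R code from Figure 4 (in R this kind of fit is commonly done with `lm`), and the data are synthetic values generated from known coefficients so the result can be checked. The helper names `solve_linear_system` and `fit_ols` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic predictors (production, promotion, book sales), NOT the post's data.
toy_rows = [(4.0, 1.0, 0.5), (6.0, 2.0, 1.5), (8.0, 2.5, 0.2),
            (10.0, 4.0, 3.0), (12.0, 3.0, 1.0), (5.0, 3.5, 2.0)]
# Responses generated exactly from known coefficients: 1.0, 2.0, 3.0, 0.5.
toy_y = [1.0 + 2.0 * a + 3.0 * b + 0.5 * c for a, b, c in toy_rows]

coefs = fit_ols(toy_rows, toy_y)
print([round(c, 4) for c in coefs])
```

Because the synthetic responses contain no noise, the fit should recover the generating coefficients up to rounding error. Real data would leave residuals, and a dedicated statistics package would also report standard errors and R-squared.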

R-Output Files:
