Tips for teaching regression analysis part 2: interaction effects

This post is a direct follow-up to “Tips for teaching regression analysis to those not well versed in math (that also help the mathematically inclined),” which you find here. I begin where my last post ended, namely at the introduction stage of multiple regression. As before, my pieces of advice are based on 30 years of teaching regression analysis and on my textbooks on the subject (see here and here). The topic of today’s post is the effective teaching of interaction effects, and, as before, my primary goal is to reach students not well versed in math. For this reason, I draw heavily on graphs. I continue using my data on large, second-hand cabin cruisers sold in Norway. That’s another tip in passing, by the way: use data sets covering many aspects of regression modeling. In this way, students can concentrate on one phenomenon at a time: recreational boats in the present case.

Preamble: multiple regression results presented in a graph

Our first dependent variable, y, is the sales price of a boat. The multiple regression model contains two x-variables: the length of the boat, x1, and whether the boat was sold privately (coded 0) or in a boat shop (coded 1), x2. (Tip in passing! There is no need to have more than two x-variables when explicating multiple regression, at least not in the beginning of a course.) As argued in my last post, graphs trump equations seven days a week when it comes to teaching how (multiple) regression works, and Figure 1 is thus the graphed results of the regression equation:

Price = -381.70 + 14.94 × Length + 52.09 × Sold in shop.

Figure 1 speaks for itself. Larger boats unsurprisingly cost more than smaller boats, and boats sold in specialized boat shops cost more than boats sold privately, which is perhaps also unsurprising. Using this as a backdrop, it is straightforward to “put the equation on the graph,” as in explicating the slope (i.e., regression coefficient from here on) of the boat length variable: 14.94. That is, a 50-foot cabin cruiser costs roughly 15,000 Euro more than a 49-foot cabin cruiser, controlling for the type of sale variable. It is likewise straightforward to explain the regression coefficient of the type of sale variable as the vertical distance between the regression lines: 52.09. That is, boats sold in specialized boat shops cost about 52,000 Euro more than boats sold privately, controlling for boat length.
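For students who like to check the arithmetic themselves, both interpretations can be verified directly from the fitted equation. Here is a minimal Python sketch (my illustration, not part of the original analysis), assuming, as the coefficients suggest, that prices are measured in 1,000 Euro:

```python
# Fitted equation from the post (prices in 1,000 Euro):
# Price = -381.70 + 14.94 * Length + 52.09 * Sold-in-shop
def predicted_price(length_ft, sold_in_shop):
    """Predicted sales price for a given boat length (feet) and sale type (0/1)."""
    return -381.70 + 14.94 * length_ft + 52.09 * sold_in_shop

# One extra foot of length, holding type of sale constant:
length_effect = predicted_price(50, 0) - predicted_price(49, 0)  # 14.94

# Shop sale vs. private sale, holding length constant (any length works):
shop_effect = predicted_price(40, 1) - predicted_price(40, 0)    # 52.09
```

The same two differences come out at any boat length and any type of sale, which is exactly the parallel-lines property of the plain multiple regression model.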

Explaining and visualizing interaction effects: a continuous x1 and a dummy x2

I always introduce the teaching of interaction effects by emphasizing that the parallel regression lines in Figure 1 reflect an inherent assumption of the plain-vanilla multiple regression model. Thereafter I ask: Are these regression lines necessarily parallel in real life too? Then, without further ado, I present the equivalent regression model relaxing the parallelism assumption, namely the so-called interaction model. Figure 2 takes care of this.

Figure 2 also speaks for itself. The regression line is steeper for boats sold in specialized boat shops. That is, although boat length is an important determinant of boat price no matter what, it matters slightly more for boats sold in shops. At this point, and not before this point, I introduce the regression equation yielding the results in Figure 2, namely:

Price = -297.00 + 12.67 × Length - 96.69 × Sold in shop + 3.85 × (Length × Sold in shop).

In this equation, I first informally focus on the regression coefficient for the interaction variable: 3.85. This coefficient, I tell my students, is always the key to unlock interaction effects. In our scenario, this coefficient is the difference between the regression coefficient of boat length for boats sold privately (coded 0) and boats sold in a boat shop (coded 1). And since the coefficient is positive, boats sold in shops have a steeper upward-sloping regression line than boats sold privately. This is the key message!

Secondly, and only secondly, I derive the regression coefficient of boat length for boats sold privately (coded 0). For these boats, the regression equation becomes:

Price = -297.00 + 12.67 × Length - (96.69 × 0) + (3.85 × (Length × 0)) →

Price = -297.00 + 12.67 × Length - (0) + (0) →

Price = -297.00 + 12.67 × Length.

In contrast, for boats sold in shops (coded 1), the regression equation becomes:

Price = -297.00 + 12.67 × Length - (96.69 × 1) + (3.85 × (Length × 1)) →

Price = -297.00 + 12.67 × Length - (96.69) + (3.85 × Length) →

Price = -393.69 + 12.67 × Length + 3.85 × Length →

Price = -393.69 + 16.52 × Length.

In plain words: The regression coefficient of boat length is 12.67 for boats sold privately and 16.52 for boats sold in shops. The difference is, yes, 3.85!
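The substitution above can also be scripted, which some students find reassuring. A small Python sketch (an illustration under the same coefficients, with prices again in 1,000 Euro):

```python
# Interaction model from the post:
# Price = -297.00 + 12.67*Length - 96.69*Shop + 3.85*(Length*Shop)
b0, b_length, b_shop, b_interaction = -297.00, 12.67, -96.69, 3.85

def group_line(shop):
    """Intercept and slope of the price-on-length line for shop = 0 or 1."""
    intercept = round(b0 + b_shop * shop, 2)
    slope = round(b_length + b_interaction * shop, 2)
    return intercept, slope

print(group_line(0))  # (-297.0, 12.67): boats sold privately
print(group_line(1))  # (-393.69, 16.52): boats sold in shops
```

Plugging in 0 or 1 for the dummy reproduces the two group-specific regression lines, and the difference between the two slopes is, again, the interaction coefficient 3.85.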

At this point I often get the question: How do we know which model/figure to trust? And if I don’t, I ask it myself. Stated differently: Are parallel or non-parallel regression lines more in sync with the data and hence with real life? To answer this, I draw attention to the magnitude of the regression coefficient for the interaction variable and to its significance level. The former should be clearly different from zero (3.85 is), and the coefficient should be statistically significant (it is, although this is not shown). In our present case, both criteria are thus met. At this point, and not before, I write up the definition of an interaction effect: the magnitude of x1’s effect (i.e., boat length’s effect) on y (price) is contingent on the value of x2 (i.e., type of sale). From this exercise, it is a small step to the interaction of two continuous x-variables, which I always teach after the continuous-dummy interaction above. Here we go …

Explaining and visualizing interaction effects 2: a continuous x1 and a continuous x2

Figure 3 presents the results of a multiple regression model with two continuous x-variables as well as their interaction effect. (We can skip the non-interaction model this time.) That is, y is boat price (as above), and the x-variables of concern are boat length (as above) and boat age. (Remember, these are second-hand boats.) The figure speaks for itself, given that we have already established that non-parallel regression lines imply the presence of an interaction effect: The older the boat, the flatter the regression line for the length variable. Again: the magnitude of x1’s effect (i.e., boat length’s effect) on y (price) is contingent on the value of x2 (i.e., boat age).

The regression equation yielding the graph in Figure 3 is:

Price = -555.71 + 21.79 × Length + 19.15 × Age - 0.74 × (Length × Age).

The way to read this equation could be something like this: The negative interaction effect (i.e., -0.74) suggests that the regression coefficient for length becomes smaller with increasing values for age — or, the other way round, that the regression coefficient for age becomes smaller with increasing values for length. Both interpretations work equally well; they are mirror images of each other.
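In code, the mirror-image point becomes concrete: each conditional slope is a simple linear function of the other variable. A Python sketch using the coefficients above (again my illustration, with prices in 1,000 Euro):

```python
# Continuous-by-continuous interaction model from the post:
# Price = -555.71 + 21.79*Length + 19.15*Age - 0.74*(Length*Age)
def length_slope(age):
    """Regression coefficient of Length, conditional on Age: 21.79 - 0.74*Age."""
    return round(21.79 - 0.74 * age, 2)

def age_slope(length):
    """Mirror image: coefficient of Age, conditional on Length: 19.15 - 0.74*Length."""
    return round(19.15 - 0.74 * length, 2)

for age in (0, 5, 10, 15):
    print(age, length_slope(age))  # the length effect shrinks as boats age
```

Printing the conditional slope at a handful of ages, as in the loop above, is often all it takes for students to see what a single interaction coefficient is doing.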

The next stop would be the dummy-by-dummy interaction, but since such an interaction offers nothing new, we drop it at this point. Coming up are analyses of more x-variables and of non-linearities (see here and here), before moving on to tests of significance (see here). But that’s also something for another day.

Takeaways

Interaction effects abound in regression-based modeling in the social and behavioral sciences. Yet such effects are hard to understand (and thus to teach) when one focuses only on the output from regression tables, especially for those not well versed in math. In this post, I have thus used graphs to illustrate how we can still teach interaction effects within a regression framework without (too) much emphasis on equations.

About me

I’m Christer Thrane, a sociologist and professor at Inland University College, Norway. I have written two textbooks on applied regression modeling and applied statistical modeling. Both are published by Routledge, and you find them here and here. I am on ResearchGate here, and you also reach me at christer.thrane@inn.no
