Combined “marginsplots” for Regression Analysis in Stata

Have one predictor variable of interest, but several different outcome variables? Are all the outcome variables scaled the same way? If so, you can combine multiple “marginsplots” together into a single graph. This tutorial shows you how to do it in Stata using the “combomarginsplot” package.

John V. Kane
The Stata Gallery
Published in
15 min readDec 31, 2023

--

A common way of visually presenting regression results in Stata is by creating a “marginsplot”.

In contrast to a “coefplot” (which I cover in detail here), a marginsplot enables the reader to see the predicted values of the outcome/dependent variable across values of a predictor variable. This is especially helpful when examining nonlinear relationships, but is regularly used for linear relationships as well.

And because these graphs are generated from an underlying regression model, they can take into account any number of control variables one wishes to adjust for.

But what if we have multiple outcomes of interest? Or, what if we have the same predictor variable and outcome variable of interest, but several different model specifications?

We could of course just make a bunch of separate marginsplots. However, this might be cumbersome for a reader to process. It also doesn’t allow for easy comparisons between the different graphs.

Fortunately a terrific alternative exists. We can use the “combomarginsplot” package, created by Nicholas J.G. Winter, to combine all of the separate marginsplots into one single graph.

But before exploring how -combomarginsplot- works, let’s first understand the conventional -marginsplot- command.

First Things First

A couple of quick notes before we get started:

  1. Graphs will mostly use schemes from “schemepack” by Asjad Naqvi. To install, simply execute:
ssc install schemepack, replace

You can then explore all your awesome new schemes by executing the following:

graph query, schemes

2. Note: All graphs use “AbelPro-Regular” font, which can be downloaded here. For details on installing/using fonts that are not native to Stata, see here.

3. The -combomarginsplot- package can be installed using the following code:

ssc install combomarginsplot, replace

Creating a Regular Marginsplot

To provide an example of the regular marginsplot, we can first use some 2020 data from the American National Election Study. I’ve taken a random sample of ~1000 respondents from this data set and placed the necessary data here so that you can replicate all of the analyses and graphs below.

In this example, our outcome variable will be a measure of how much each respondent trusts political information from Facebook (we’ll call this measure “Trust”). This variable is measured on a 5-point scale ranging from 1= “Not at all” to 5= “A great deal”.

Our main predictor variable of interest will be the respondent’s age. Our models will also control for the respondent’s gender and racial identification. (To keep things simple, we’ll use OLS regression here despite the limited number of values that this outcome measure can take on.)

Our multiple regression model is therefore as follows:

reg trustfb_clean Age_respondent_clean i.race_clean i.Female

The next step is that we need Stata to calculate predicted values of the outcome (Y) across values of Age (X). Here’s how we can do it:

margins, at(Age_respondent_clean=(18(2)80)) level(95)

What this code does is ask for predicted values of Y (based upon the model’s results) at various values of Age — specifically, starting at 18 (the lowest age in the sample) and increasing by increments of 2 all the way up to 80 (the highest value of Age in the sample).

Notice, too, that we are specifying the confidence interval level here, which is important for how the subsequent graph is drawn. Here we will keep it at the conventional 95%, but note that it can be changed to any values you like (e.g., 90% or 99%).

Now that Stata has calculated all the estimates, we can create our marginsplot. This can be done simply by executing marginsplot, but I’ve added some additional options here to make the graph a bit more informative and aesthetically appealing:

marginsplot, /// main marginsplot command
scheme(white_jet) /// change graph scheme
plotopts(lwidth(medthick) lcolor(%70) msize(medlarge) mcolor(cyan%75) mlcolor(midblue)) /// changes fit line and marker features
recastci(rarea) ciopts(fcolor(midblue%40) lcolor(cyan)) /// options for CI area opacity and line colors
title("Age and Trust in Facebook as a News Source", box color(white) span fcolor(black) bexpand) /// makes title, and spans it across graph (looks nicer)
subtitle("US Adults (2020)", box color(white) span fcolor(black) bexpand) /// makes subtitle, and spans it across graph (looks nicer)
xsize(6.5) ysize(4.5) /// makes width of graph 6.5 inches, and height of graph 4.5 inches
xlab(, glcolor(gs15) glpattern(solid)) /// adds solid light gray vertical grid lines
ylab(, glcolor(gs15) glpattern(solid)) /// adds solid light gray horizontal grid lines
ytitle(Perceived Trustworthiness of News Source) /// title of y-axis
note("Note: Model controls for race & gender; n=~1000", size(vsmall) span) // adds note; "span" places in left corner //

This produces the following graph:

A marginsplot showing the relationship between Age and perceived trustworthiness of Facebook news, controlling for race and gender identity.

Notice how we can see not only that the slope is negative (as age increases, trust in political news from Facebook decreases (holding race and gender constant)), but also the predicted levels of Y across the full range of X values. For example, we see that even among the youngest respondents, the predicted level of Trust is still quite low (about 1.7 on a 5-point scale). For respondents who are middle-aged, Trust is predicted to go below 1.6, while among the elderly, the predicted level of Trust goes below 1.5.

Creating a combomarginsplot

The marginsplot is thus an extremely informative way of presenting regression results, particularly when you wish to emphasize the role of one independent/predictor variable (in this case Age).

But now suppose you have multiple outcomes you are interested in, and that these outcomes are all scaled the same way. Continuing with the present example of age and trust in media, imagine we have trust in not just Facebook news, but also in news from Twitter, Fox News, and MSNBC. We therefore have the same main predictor variable of interest (Age), but four different outcome variables, each of which uses the same 5-point scale to measure trust.

Again, rather than creating four different marginsplot graphs, we can take advantage of the fact that all the outcome measures are scaled the same way and create a single combomarginsplot!

This can be done in the following steps:

  1. Run the model for one of your outcome variables
  2. Calculate the margins (just as we did above) and use the -saving( )- option to save the predicted values
  3. Repeat steps 1 and 2 for each outcome
  4. Use the combomarginsplot command with your saved model names

Let’s run our Facebook model again, but this time save the predicted values using -saving(lmodel_fb, replace)-. This saves the results as “lmodel_fb”. We’ll need this saved result for our eventual graph.

reg trustfb_clean Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(lmodel_fb, replace)

We’ll now do the same thing for our Twitter, Fox News, and MSNBC outcomes, saving the results of each model under a different name:

# Twitter model
reg trusttw_clean c.Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(lmodel_twitter, replace)

# Fox News model
reg trustfox_clean c.Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(lmodel_fox, replace)

# MSNBC model
reg trustmsnbc_clean Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(lmodel_msn, replace)

Now we can create our combomarginsplot! Because multiple models will be shown, Stata will generate a legend. This means we need to tell Stata what the legend labels should say. A bare-bones combomarginsplot that only features these labels (and omits the auto-generated title) can be created with the following code. Very important: notice that the labels (second line of code) are in the exact same order as the saved results (first line of code):

combomarginsplot lmodel_fb  lmodel_twitter lmodel_fox lmodel_msn, /// lists the different models
labels("Facebook" "Twitter" "Fox News" "MSNBC") /// specifies the labels we want in the legend
title("") // omits the title

This produces the following graph (note that this uses Stata 18’s new “stcolor” scheme):

A bare-bones combomarginsplot resulting from four separate multiple regression models (scheme= “stcolor” from Stata v.18).

This graph shows the results of all four models at once. Not only that, it helps to make several additional results apparent. For example, after adjusting for gender and race, social media-based political information is always less trusted than television-based news, regardless of age. Second, age is negatively related to social media-based news yet positively related to television-based news, with MSNBC having a slightly steeper slope than Fox News.

Notice how details like these become much clearer when all models appear in the same graph?

Adding More Options

Again, this was a fairly bare-bones combomarginsplot. There are, of course, many options we can add that stand to improve the informativeness and aesthetic appeal of this graph.

For example, we can adjust the line widths and color opacity, marker features, rendering of the confidence intervals, scheme, legend options, title, subtitle, note, x-axis and y-axis features, gridlines, and graph dimensions like so:

combomarginsplot lmodel_fb  lmodel_twitter lmodel_fox lmodel_msn, /// lists the different models
plotopts(lwidth(medthick) lcolor(%70) msize(medium) mlcolor(yellow) mcolor(%75)) /// changes fit line and marker features
labels("Facebook" "Twitter" "Fox News" "MSNBC") /// specifies the labels we want in the legend
legend(ring(1) size(small) pos(3) col(1) margin(small) title(News Source, box color(white) fcolor(gs6) margin(zero) size(medsmall))) /// legend options
graphregion(margin(medsmall)) /// specifies amount of space between graph and outside perimeter
plotregion(margin(medlarge)) /// adds more space between graph and inside perimeter
recastci(rarea) ciopts(fcolor(%40)) /// fills in CI area around each line at 40% opacity
scheme(rainbow) /// changes scheme
ylab(1(.5)3, glcolor(gs10) glpattern(solid)) /// specifies y-axis range and changes gridlines
xlab(, glcolor(gs10) glpattern(solid)) /// specifies x-axis range and changes gridlines
xsize(6.5) ysize(4.5) /// makes width of graph 6.5 inches, and height of graph 4.5 inches
xtitle("Age of Respondent") /// titles x-axis
ytitle(Perceived Trustworthiness of News Source) /// titles y-axis
title("Age and Trust in News Sources", box color(white) span fcolor(black) bexpand) /// makes graph title
subtitle("US Adults (2020)", box color(white) span fcolor(black) bexpand) /// makes subtitle
note("Note: Models control for race & gender; n=~1000", size(vsmall) span) // adds note; "span" places in left corner

This produces the following graph:

A combomarginsplot resulting from four separate multiple regression models (with more aesthetic options than the previous graph).

Using combomarginsplot for Non-linear Models

Stata’s marginsplots become even more useful when our main predictor has a nonlinear relationship with the outcome measure (because regression coefficients alone don’t easily enable us to visualize the nature and degree of nonlinearity).

Let’s create a combomarginsplot based upon models that allow for a nonlinear relationship between Age and Trust in various media sources. (Note: this is mainly for the purpose of illustration; the non-linear specifications aren’t really necessary in this particular case.)

We’ll first run the models, each specificying a quadratic term (i.e., both Age and Age-squared as predictors) in addition to the same controls for gender and race. We’ll then save the predicted values after each model under a unique name (e.g., “qmodel_fb”):

# Facebook model
reg trustfb_clean c.Age_respondent_clean##c.Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(qmodel_fb, replace)

# Twitter model
reg trusttw_clean c.Age_respondent_clean##c.Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(qmodel_twitter, replace)

# Fox News model
reg trustfox_clean c.Age_respondent_clean##c.Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(qmodel_fox, replace)

# MSNBC model
reg trustmsnbc_clean c.Age_respondent_clean##c.Age_respondent_clean i.race_clean i.Female
margins, at(Age_respondent_clean=(18(2)80)) saving(qmodel_msn, replace)

Now we can create our combomarginsplot. The code below features mostly the same options as the example above, just a few stylistic changes:

combomarginsplot qmodel_fb  qmodel_twitter qmodel_fox qmodel_msn, /// lists the different models
plotopts(lwidth(medthick) lcolor(%70) msize(large) mlcolor(yellow) mcolor(%75)) /// changes fit line and marker features
labels("Facebook" "Twitter" "Fox News" "MSNBC") /// specifies the labels we want in the legend
legend(ring(1) size(small) pos(3) col(1) margin(small) title(News Source, box color(white) fcolor(gs6) margin(zero) size(medsmall))) /// legend options
graphregion(margin(medsmall)) /// specifies amount of space between graph and outside perimeter
plotregion(margin(medlarge)) /// adds more space between graph and inside perimeter
recastci(rarea) ciopts(fcolor(%40)) /// fills in CI area around each line at 40% opacity
scheme(black_tableau) /// sets scheme
ylab(1(.5)3, glcolor(gs4) glpattern(solid)) /// y-axis and horizontal gridline features
xlab(, glcolor(gs4) glpattern(solid)) /// x-axis and vertical gridline features
xsize(6.5) ysize(4.5) /// makes width of graph 6.5 inches, and height of graph 4.5 inches
xtitle("Age of Respondent") /// makes x-axis title
ytitle(Perceived Trustworthiness of News Source) /// makes y-axis title
xscale(lcolor(white) lwidth(medium) line) /// makes x-axis line thicker and white
yscale(lcolor(white) lwidth(medium) line) /// makes y-axis line thicker and white
title("Age and Trust in News Sources") /// makes title
subtitle("US Adults (2020)", span) /// makes subtitle
note("Note: Models control for race & gender; n=~1000", size(vsmall) span) // adds note; "span" places in left corner

This code produces the following combomarginsplot:

A combomarginsplot that allows for a non-linear relationship between Age and Trust.

By allowing the relationship between Age and Trust to be nonlinear, we can see a few differences with the regular linear models above. For example, the Fox News model shows a fair amount of nonlinearity, with most of the increase in trust coming after 60 years old (and staying fairly flat until then). We can also see that the Facebook model is noticeably less linear than the Twitter model.

Customizing the Look of Each Fitted Line

So far we’ve let the schemes largely determine how our fitted lines look (e.g., the colors of the lines have been solely determined by the scheme).

A nice feature of combomarginsplot is that we can fully customize the look of each individual line. That is, anything having to do with each model’s markers, fit line, and/or confidence interval can be modified to our liking.

For regular marginsplots, we use the following options to accomplish this:

  • plotopts( ) : controls look of lines and markers
  • ciopts( ) : controls the look of the confidence intervals

But for combomarginsplots, the key trick is to tell Stata which model we’re talking about when we specify these options. We do this simply by specifying plot#opts() and ci#opts(), where “#” refers to the number of the model as it appears in the list of models when we run our combomarginsplot code.

For example:

  • plot2opts( ) would allow us to customize the markers and lines for the second model in our list of models
  • ci3opts( ) would allow us to customize the confidence intervals for the third model in our list of models

Knowing this syntax, we can run the following code to customize each of the four lines from our regression models:

combomarginsplot qmodel_fb  qmodel_twitter qmodel_fox qmodel_msn, /// lists the different models
plot1opts(lwidth(medthick) lcolor(lime) msize(large) mlwidth(vthin) mlcolor(lime) mcolor(gs16)) /// changes fit line and marker features
plot2opts(lwidth(medthick) lcolor(lavender) msize(large) mlwidth(vthin) mlcolor(lavender) mcolor(gs16)) /// changes fit line and marker features
plot3opts(lwidth(medthick) lcolor(red) msize(large) mlwidth(vthin) mlcolor(red) mcolor(gs16)) /// changes fit line and marker features
plot4opts(lwidth(medthick) lcolor(cyan) msize(large) mlwidth(vthin) mlcolor(cyan) mcolor(gs16)) /// changes fit line and marker features
recastci(rarea) /// fill in CI area rather than having individual CIs drawn for each marker
ci1opts(fcolor(lime%55) lcolor(lime) lwidth(medium)) /// fills in CI area around each line at 55% opacity
ci2opts(fcolor(lavender%55) lcolor(lavender) lwidth(medium)) /// fills in CI area around each line at 55% opacity
ci3opts(fcolor(red%55) lcolor(red) lwidth(medium)) /// fills in CI area around each line at 55% opacity
ci4opts(fcolor(cyan%55) lcolor(cyan) lwidth(medium)) /// fills in CI area around each line at 55% opacity
labels("Facebook" "Twitter" "Fox News" "MSNBC") /// specifies the labels we want in the legend
legend(ring(1) pos(3) col(1) title(News Source, box color(white) fcolor(gs6) size(medsmall))) /// places legend outside graph, at bottom, in three columns
graphregion(margin(medsmall)) /// specifies amount of spacee between graph and outside perimeter
plotregion(margin(medlarge)) /// adds more space between graph and inside perimeter
scheme(white_jet) /// makes scheme "black_tableau"
ylab(1(.5)3, nogrid) /// specifies y-axis range; removes horizontal gridlines
xlab(, nogrid) /// removes vertical grid lines
xsize(6.5) ysize(4.5) /// makes width of graph 6.5 inches, and height of graph 4.5 inches
xscale(lcolor(sky) lwidth(medthick) line) /// colors/enlarges x-axis line
yscale(lcolor(sky) lwidth(medthick) line) /// colors/enlarges y-axis line
xtitle("Age of Respondent") /// specifies x-axis title
ytitle(Perceived Trustworthiness of News Source) /// specifies y-axis title
title("Age and Trust in News Sources", box color(white) span fcolor(black) bexpand) /// makes title
subtitle("US Adults (2020)", box color(white) span fcolor(black) bexpand) /// makes subtitle
note("Note: Models control for race & gender; n=~1000", size(vsmall) span) // adds note; "span" places in left corner

Notice in the code that lines 2–5 customize the lines and markers for the four models, respectively, while lines 7–10 customize the confidence intervals for these four models.

We then get the following graph:

A combomarginsplot with individually customized lines.

Using combomarginsplot for Binary Outcome Models

Finally, another great feature of combomarginsplots is that they are super helpful when we have several different binary outcomes of interest. This is because, despite having different outcome measures, the y-axis will always display a predicted probability that will range between 0 and 1 (assuming we’d use a logistic or probit regression model).

In the same dataset referenced above are variables from the UN’s Multi-Indicator Cluster Survey (MICS). Specifically, it is a large sample of Pakistani women (located in Punjab Province) who were asked about their age, educational attainment, geographic location (rural or urban), and access to various forms of technology (owning a mobile phone, having ever used a computer, and having ever used internet).

Suppose we were interested in studying the degree to which these women’s educational attainment is associated with these (binary) technology-related outcomes, even after accounting for other factors such as age and geographic region.

To do so, we would do the same procedure as above, but could run logistic regression models instead of OLS models. The “margins” command after each model will ask Stata to calculate and save the predicted probability of having access to that technology at each value of education (starting at the lowest observed value (0) and increasing by increments of .1 up to the highest observed value (1)), adjusting for the respondent’s age and whether they reside in a rural or urban area.

# Note:  all continuous predictors have been rescaled to range from 0 to 1

# Model 1: Predicting whether respondent owns a mobile phone
logit OwnMobilePhone Rural Age_01 Education_01
margins, at(Education_01=(0 (.1) 1)) saving(model_phone, replace)

# Model 2: Predicting whether respondent has ever used a computer
logit EverUsedComputer Rural Age_01 Education_01
margins, at(Education_01=(0 (.1) 1)) saving(model_computer, replace)

# Model 3: Predicting whether respondent has ever used internet
logit EverUsedInternet Rural Age_01 Education_01
margins, at(Education_01=(0 (.1) 1)) saving(model_internet, replace)

Now that we have our saved results, we can create a combomarginsplot. The following code uses most of the same options as the examples above, but this time places the legend at the bottom of the graph and arrays it as one row rather than one column. (This allows more space for the plot itself, which may be useful in some cases.)

combomarginsplot model_phone  model_computer model_internet, /// lists the different models
plotopts(lwidth(medthick) lcolor(%70) msymbol(triangle) msize(large) mlcolor(yellow)) /// changes fit line and marker features
labels("Own Mobile Phone" "Ever Used Computer" "Ever Used Internet") /// specifies the labels we want in the legend
legend(ring(1) pos(6) row(1) margin(medsmall) size(small) title(Technology Outcomes, box color(white) fcolor(gs6) margin(zero) size(medsmall))) /// legend options
graphregion(margin(medsmall)) /// specifies amount of space between graph and outside perimeter
plotregion(margin(medlarge)) /// adds more space between graph and inside perimeter
recastci(rarea) ciopts(fcolor(%40)) /// fills in CI area around each line at 40% opacity
scheme(black_tableau) /// changes scheme
ylab(0(.1).7, glcolor(gs4) glpattern(solid)) /// makes solid, gray horizontal gridlines
xlab(, glcolor(gs4) glpattern(solid)) /// makes solid, gray vertical gridlines
xsize(6.5) ysize(4.5) /// makes width of graph 6.5 inches, and height of graph 4.5 inches
xtitle("Education Level") /// specifies x-axis title
ytitle(Predicted Probability) /// specifies y-axis title
title("Education & Access to Technology") /// makes title
subtitle("Pakistani Women, Punjab Province", span) /// makes subtitle
note("Note: Models control for rural/urban and age; n=~9000", size(vsmall) span) // adds note; "span" places in left corner

This produces the following graph:

Predicted probabilities of having access to three separate technologies. Fit lines include 95% confidence intervals.

Here we clearly see that the probability of having access to each of these technologies is predicted to be higher among those with more education. But we can also see several points that may have been less obvious had we produced three separate marginsplots instead of a single combomarginsplot. For example, the probability of owning a mobile phone is substantially higher than the probabilities of the other two outcomes. Also, the probability of having used a computer and internet rises slowly until about the mid-point of the Education scale, but rises sharply after the mid-point (and noticeably more quickly for computer use than internet use).

Conclusion

I hope this tutorial has been helpful, not only for seeing how to make a combomarginsplot, but also why and when we would want to do so. The combomarginsplot function helps to save space by placing multiple models within one graph and better facilates comparison between different outcomes and models.

It also just looks pretty cool, in my humble opinion.

I also hope that the code provided will go some way toward helping you produce publication- and presentation-quality visualizations for your analyses. (For a published example using combomarginsplot, see Figures 2–4 in my co-authored article here.)

Thanks for reading! Good luck and feel free to reach out anytime with comments, suggestions or questions!

Email: jvk221@nyu.edu

X/Twitter: @UptonOrwell

BlueSky Social: @UptonOrwell.bsky.social

John V. Kane is an Associate Professor at NYU’s Center for Global Affairs and holds a Ph.D. in Political Science. He researches political attitudes & experimental methods. More info at www.johnvkane.com

--

--

John V. Kane
The Stata Gallery

John V. Kane is an Associate Professor at NYU's Center for Global Affairs. He researches political attitudes & experimental methods. Twitter: @UptonOrwell