Making Regression Coefficient Plots in Stata

A concise guide for making publication-quality graphs of regression results using Stata. The guide covers the most common aesthetic changes, combining multiple models, working with binary outcome models, and more, with plenty of other little tips and tricks along the way.

John V. Kane
The Stata Gallery

--

INTRO TO COEFPLOT

Traditionally, researchers reported results of regression analyses using tables. A more visually appealing way of presenting these results is by using a coefficient plot. Unlike a typical scatterplot with a fitted line (or a “marginsplot”), a coefficient plot displays multiple coefficients from the model — or from several models — at once.

To do this, Stata users can install Ben Jann’s extremely popular “coefplot” package:

ssc install coefplot, replace

There is an impressively comprehensive online resource for coefplot, which I have used a ton. Here is a link to it.

The purpose of this guide is to provide researchers with a concise resource that contains some of the most common commands and options for producing publication-quality coefficient plots using Stata. It’s the product of many years of me banging my head against my desk with a computer screen full of error messages. The goal is simply to make nicer coefficient plots that (ideally) don’t require any additional editing via Stata’s Graph Editor (though that is perfectly fine as a last resort).

The .do file containing all the code below is located here.

Notes before we get started:

1. Graphs will use schemes from “schemepack” by Asjad Naqvi. You can explore all your awesome new schemes by executing “graph query, schemes”:
ssc install schemepack, replace

graph query, schemes

2. All graphs use the “AbelPro-Regular” font, which can be downloaded here. For details on installing/using fonts that are not native to Stata, see here.

BASIC COMMANDS & ESSENTIAL OPTIONS FOR EVERY COEFPLOT

We’ll begin by importing everybody’s favorite dataset: auto.dta

sysuse auto.dta, clear

We’ll regress price onto mpg, weight, length, and foreign. To produce a basic coefficient plot, simply execute coefplot after the model:

reg price mpg weight length i.foreign

coefplot

If using a version of Stata that is pre-v.18, your graph will use the heartbreakingly dull “s2color” scheme, which is shown below. (Version 18 uses the new “stcolor” scheme. So as not to exclude pre-18 users, I will use “schemepack” schemes in the examples below.)

This graph shows point estimates for each variable, as well as the constant, with 95% confidence intervals.

That said, it’s probably not the prettiest graph you’ve ever seen.

Some essential first options for making every coefplot nicer and more informative:

1. Remove the constant using “drop(_cons)”: the constant is rarely useful and will often (though not in the example above) extend the x-axis by a large amount to accommodate the value of the constant, making the coefficients difficult to read.

2. Add a vertical line at x=0 using “xline(0)”: This helps readers see the coefficients (and their CIs) in relation to 0, which is central to null hypothesis significance testing. The “lcolor( )” option controls the color of the line; the “lwidth( )” option controls the line’s width.

3. Change the scheme using “scheme( )”: Please don’t use s2color. Please. I’m begging you.

Implementing these options with “scheme(white_jet)” produces a graph with nicer colors, but in this case the result looks a bit weird because the coefficient for “foreign” is enormous compared to the coefficients for the other variables (usually this is what will happen when the constant is left in rather than removed; this just happens to be an unusual case, I promise!). As such, before making another graph, I am going to put all of the continuous predictor variables on a 0-to-1 scale using a simple trick:

sum mpg // necessary for next line to work
gen mpg_01=(mpg-r(min)) / (r(max)-r(min)) // subtracts min and divides by range

sum weight
gen weight_01=(weight-r(min)) / (r(max)-r(min))

sum length
gen length_01=(length-r(min)) / (r(max)-r(min))

tab1 mpg_01 weight_01 length_01
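The three sum/gen pairs above repeat the same arithmetic, so they can also be written as a loop. This is only a minimal sketch of that idea (it is not taken from the .do file), and it is meant to be used instead of the lines above rather than in addition to them, which is why it starts with -capture drop-:

```stata
* Sketch: rescale several predictors to a 0-to-1 range in one loop
sysuse auto.dta, clear
foreach v of varlist mpg weight length {
    capture drop `v'_01            // remove any earlier copy so the loop can be re-run
    quietly summarize `v'          // leaves r(min) and r(max) behind
    generate `v'_01 = (`v' - r(min)) / (r(max) - r(min))
}
summarize mpg_01 weight_01 length_01   // each should now run from 0 to 1
```

Note that -summarize- has to come immediately before -generate-, because most other commands overwrite the r( ) results.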

Now that I have all the predictors on a 0-to-1 scale, I run the following:

reg price mpg_01 weight_01 length_01 i.foreign // model

coefplot, ///
drop(_cons) /// remove the constant from the graph
xline(0, lcolor(red) lwidth(medium)) /// place a medium-thick red vertical line at x=0
scheme(white_jet) // change the scheme

Which produces:

This looks nicer than the original in many ways. But it can still be customized and improved upon quite a bit, starting with the labels for the variables/values. (Note: In the model, using “i.” before “foreign” leads to the value label being used in the graph; had only “foreign” been specified in the model, the variable label would appear in the graph instead.)
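If you get tired of typing “scheme( )” on every graph, Stata also lets you change the default scheme for the session. The examples in this guide keep spelling out the scheme in each command, so treat the following as an optional shortcut; it assumes “schemepack” is already installed:

```stata
* Sketch: make white_jet the default scheme for the rest of the session
set scheme white_jet
display "`c(scheme)'"      // shows the scheme currently in effect

* To keep the setting across sessions, add the permanently option:
* set scheme white_jet, permanently
```

Any graph drawn afterwards without its own “scheme( )” option will then use white_jet.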

CHANGING VARIABLE AND VALUE LABELS

The analyst (hopefully) knows what those variable names on the y-axis mean, but readers probably won’t. To have more informative labels, we use the “coeflabels( )” option. The syntax looks like the second line here:

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
coeflabels(mpg_01="MPG" weight_01="Weight" length_01="Length")

This produces the following graph:

One challenge for changing labels is when categorical variables are featured, especially in interactions. Specifically, we need to refer to the variable/value the same way that Stata refers to the variable. Imagine we specify an interaction between “foreign” and “mpg_01”. How can we figure out what Stata calls the interaction so that we can give it a nice label? Answer: Including the “coeflegend” option after the model (note: this is extremely useful to know for many additional options discussed below!):

reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01, coeflegend

Thus, when we want to make labels for each coefficient, we would do the following:

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
coeflabels(mpg_01="MPG" weight_01="Weight" length_01="Length" 1.foreign="Foreign" ///
1.foreign#c.mpg_01="MPG X Foreign")

This produces the following graph:
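The “coeflegend” option prints the names in the Results window. If you would rather have them in a local macro (say, to copy and paste from, or to loop over), one possible approach is to read the column names of e(b). This is a small sketch using the unscaled “mpg” variable so that it runs on its own:

```stata
* Sketch: collect the coefficient names Stata uses internally
sysuse auto.dta, clear
quietly regress price mpg i.foreign i.foreign#c.mpg
local names : colfullnames e(b)     // one name per column of the coefficient vector
display "`names'"
```

The list also includes base (omitted) levels, which you can ignore when writing -coeflabels( )-.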

CHANGING THE ORDER OF COEFFICIENTS

One way we might want to adjust a graph is in terms of the ordering of the variables. This can, of course, be done simply by changing the order of the variables in the model itself. However, sometimes we might need to change it in the graph. This can be done using the -order( )- option. Importantly, we again need to refer to the variables the same way that Stata does, so the “coeflegend” option shown in the previous section is again useful. In the following code, I am going to place the interaction immediately after the MPG and Foreign coefficients, and then have Length, then Weight. Note: the variable listing within the -order( )- option does not need to be the same as the variable listing in the -coeflabels( )- option.

Also, I am adding a few additional options to improve informativeness/aesthetics:

1. Title the x-axis using -xtitle(" ")- plus the {bf: } syntax to bold the text

2. Reduce the outside margins of the graph using “graphregion(margin( ))”

3. Using -xsize( )- and -ysize( )- make the dimensions 6.5 inches wide by 4.5 inches tall, which works well in Word documents (with 1-inch margins, the graph will span the entire width)

4. Use a nice little trick to make the value label for the interaction term appear on two lines instead of one: `""top part" "bottom part""' (see the very last line of code below).

Here’s the code:

reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01, coeflegend

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") /// bolded text
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
order(mpg_01 1.foreign 1.foreign#c.mpg_01 length_01 weight_01) ///
coeflabels(mpg_01="MPG" weight_01="Weight" length_01="Length" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""')

This produces the following graph:

DROPPING/KEEPING CERTAIN VARIABLES

Perhaps we don’t want every single variable from the model featured in the coefplot. Just as we did in the first section, we can drop any variables using the -drop( )- option. If there are many variables to drop, it might be easier to instead keep a specific list of variables using -keep( )-. Either way, we again need to know how Stata refers to these variables in order to keep/drop them (“, coeflegend” is your friend!).

Here, I demonstrate how to use the “keep” option to keep only Weight and Length:

reg price mpg_01 weight_01 length_01 i.foreign, coeflegend

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") ///
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
keep(length_01 weight_01) /// Keep option
coeflabels(mpg_01="MPG" weight_01="Weight" length_01="Length" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
note("Note: Model controls for MPG and whether vehicle is foreign or domestic", span size(vsmall))
*This graph also includes a note at the bottom

This produces the following graph:
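Both -keep( )- and -drop( )- accept the wildcards * and ?, which can save typing when the variables you want share part of a name. Here is a minimal sketch that keeps every coefficient ending in “_01”; the rescaling loop is only there so the example runs on its own:

```stata
* Sketch: keep coefficients by wildcard instead of listing each one
sysuse auto.dta, clear
foreach v of varlist mpg weight length {
    quietly summarize `v'
    generate `v'_01 = (`v' - r(min)) / (r(max) - r(min))
}
quietly regress price mpg_01 weight_01 length_01 i.foreign
coefplot, keep(*_01) xline(0)    // _cons and 1.foreign are left out
```

Because the constant does not match the pattern, there is no need for “drop(_cons)” here.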

ADDING HEADINGS TO DISTINGUISH GROUPS OF VARIABLES

If our model has many predictors, it can be useful to distinguish them in the coefficient plot. For example, we may want our key predictor variable(s) to be placed in one group, while control variables appear in a different group. We can accomplish this with the -headings( )- option. The trick here is that we specify which variable (using the name from the “coeflegend” option above) we want each heading to come before. (Note: The -gap( )- option allows us to place more or less of a gap before a heading.) See section of code below.

I’m also adding a few more options to make the graph a bit nicer:

1. Using -xlab( , glpattern( ) glcolor( ))- I am adding solid, light gray vertical lines to replace the dotted lines

2. Using -grid( glpattern( ) glcolor( ))- I am adding solid, light gray horizontal lines to replace the dotted lines

3. Using -ylab(, labsize( ))- I am enlarging the size of the y-axis labels by 10% (a multiplier of 1.1)

The code is thus:

reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01, coeflegend

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") /// bolded text
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) /// adds solid, light gray vertical lines at x-axis values
grid(glpattern(solid) glcolor(gs14)) /// adds solid, light gray horizontal lines at y-axis values
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
headings(mpg_01= "{bf: Vehicle Specs}" /// creates headings
1.foreign = "{bf: Production Info}" 1.foreign#c.mpg_01 = "{bf: Interactions}", gap(0)) //

This produces the following graph:

MAKING NICER MARKERS

One less-than-great feature of our coefplot so far has been the markers (that is, the dots representing the coefficient estimates). Just as in scatterplots, these markers can be altered in terms of:

1. The size of the marker: Use -msize( )-

2. The color and opacity: Use -mcolor( )- with “%#” to indicate opacity

3. The color of the marker’s outer line: Use -mlcolor( )-

4. The width of the marker’s outer line: Use -mlwidth( )-

5. The marker’s symbol: Use -msymbol( )-

a. TIP: It’s impossible (for me, at least) to remember how Stata refers to all different sizes, colors, symbols, etc. When you can’t remember, just use the drop-down menu to make a simple scatterplot and choose all the marker options you want. The code in the output window will tell you the proper names for the various sizes, colors, symbols, etc. that you want, which you can then copy and paste into your coefplot code.

More info about marker options can be found here.

Here is example code we can use (marker options on last line):

reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01, coeflegend

*Basic changes to markers
coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") ///
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) ///
grid(glpattern(solid) glcolor(gs14)) ///
ylab(, labsize(*1.1)) ///
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
headings(mpg_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}" 1.foreign#c.mpg_01 = "{bf: Interactions}", gap(0)) ///
msize(large) mcolor(%85) mlcolor(cyan) mlwidth(medium) msymbol(circle) // marker options

This produces the following graph:

A neat alternative option for markers is to place the point-estimate value within the marker itself. This can be done by making the marker much larger and then using the marker label option -mlabel- along with options to adjust the marker labels’ color -mlabcolor( )- size -mlabsize( )- and formatting -format( )-. The code below exemplifies how we can do this (note that the scheme and xline features have changed since the previous graph):

*Placing coefficient values inside the marker
reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01

coefplot, drop(_cons) xline(0, lcolor(black) lwidth(medium)) scheme(gg_tableau) ///
xtitle("{bf: Effect on Vehicle Price}") ///
graphregion(margin(medium)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(white)) ///
grid(glpattern(solid) glcolor(white)) ///
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
headings(mpg_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}" 1.foreign#c.mpg_01 = "{bf: Interactions}", gap(0)) ///
mlabel mlabcolor(white) mlabsize(vsmall) format(%4.0f) ///
mlabposition(center) msize(ehuge) mcolor(black) mlcolor(gs10) //

/*Important: "format(%4.0f)" tells Stata to display each label with a total width of 4 characters and 0 digits after the decimal.
But for your project, you may want it to be rounded to two decimals, for example. This would be: "format(%4.2f)"
*/

This produces the following graph:
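If you are unsure what a given display format will do to your marker labels, a quick way to check is to try it with -display- before putting it in -format( )-. A small sketch with made-up numbers:

```stata
* Sketch: preview display formats before using them in format( )
display %4.0f 1234.567      // 1235
display %4.2f 3.14159       // 3.14
display %9.2fc 12345.678    // 12,345.68 (the trailing c adds commas)
```

The first number in the format is the total width and the second is the number of decimals.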

MAKING NICER CONFIDENCE INTERVALS

Once we change markers, we often will want to also change the look of the confidence intervals. Most of the changes take place within the -ciopts( )- option. Simple changes include changing the color and width of the line, as well as the style (e.g., a “cap” style instead of the default “spike” style) -ciopts(lcolor( ) lwidth( ) recast( ))-. For example, the following code features such changes in the last line:

reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01

*Simple changes to CI line color, width, and style
coefplot, drop(_cons) xline(0, lcolor(black) lwidth(medium)) scheme(gg_tableau) ///
xtitle("{bf: Effect on Vehicle Price}") ///
graphregion(margin(medium)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(white)) ///
grid(glpattern(solid) glcolor(white)) ///
ylab(, labsize(*1.1)) ///
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
headings(mpg_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}" 1.foreign#c.mpg_01 = "{bf: Interactions}", gap(0)) ///
mlabel mlabcolor(white) mlabsize(vsmall) format(%4.0f) ///
mlabposition(center) msize(ehuge) mcolor(black) mlcolor(gs10) ///
ciopts(lcolor(black) lwidth(medthick) recast(rcap)) // ci options are here

This produces the following graph:

Another common feature is to include two levels of confidence intervals (CIs) — say, 90% and 95%. This requires using -ciopts( )- as well as the -levels( )- option. Crucially, in this example, each option will have two features specified: the first feature should be for the 95% CI, the second feature for the 90% CI. (If three levels were specified, then three features would be specified for each option, and so on.) For example, the following code features two levels of CIs in the last line:

reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") /// bolded text
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) /// adds solid, light gray vertical lines at x-axis values
grid(glpattern(solid) glcolor(gs14)) /// adds solid, light gray horizontal lines at y-axis values
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
headings(mpg_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}" 1.foreign#c.mpg_01 = "{bf: Interactions}", gap(0)) ///
msize(large) mcolor(%85) mlcolor(cyan) mlwidth(medium) msymbol(circle) ///
levels(95 90) ciopts(lcolor(magenta midblue) lwidth(medthick thick) recast(rspike rcap)) // two CI levels: 95% first, then 90%

This produces the following graph:
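To see why the 90% interval always sits inside the 95% one, it can help to compute both by hand for a single coefficient. The sketch below does this for a simple model of price on “mpg” using the t distribution, which is how -regress- itself builds its intervals; it is an illustration, not part of the graph code:

```stata
* Sketch: 95% and 90% confidence intervals computed by hand
sysuse auto.dta, clear
quietly regress price mpg
foreach lev in 95 90 {
    local t  = invttail(e(df_r), (100 - `lev') / 200)   // two-sided critical value
    local lo = _b[mpg] - `t' * _se[mpg]
    local hi = _b[mpg] + `t' * _se[mpg]
    display "`lev'% CI for mpg: " %9.2f `lo' " to " %9.2f `hi'
}
```

The smaller critical value at the 90% level is what produces the shorter, thicker inner line in the graph.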

Another nice way of rendering the CIs is to use the -cismooth( )- option. This option produces a gradient of color to represent the CI rather than a single color. Importantly, by default the gradient extends out to the 99% confidence level, regardless of what is specified by -levels( )-. More information about “cismooth” sub-options can be found here. The code below provides one example using black, smoothed CIs:

*Featuring "smoothed" CIs
reg price mpg_01 weight_01 length_01 i.foreign i.foreign#c.mpg_01

*Example 1
coefplot, drop(_cons) xline(0, lcolor(black) lwidth(medium)) scheme(white_tableau) ///
xtitle("{bf: Effect on Vehicle Price}") /// bolded text
graphregion(margin(medium)) ///
xsize(6.5) ysize(4.5) ///
grid(none) /// uses only vertical grid lines
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
headings(mpg_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}" 1.foreign#c.mpg_01 = "{bf: Interactions}", gap(0)) ///
mlabel mlabcolor(white) mlabsize(vsmall) format(%4.0f) ///
mlabposition(center) msize(ehuge) mcolor(black) mlcolor(gs10) ///
cismooth(color(black) lwidth(4 55)) // default lwidth( ) range is 2 to 15

This produces the following graph:

And the following code is an example of using -cismooth( )- in the graph with the blue markers (notice that, because the markers are smaller, a smaller -lwidth( )- range is specified):

*Example 2
coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") /// bolded text
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) /// adds solid, light gray vertical lines at x-axis values
grid(glpattern(solid) glcolor(gs14)) /// adds solid, light gray horizontal lines at y-axis values
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight" 1.foreign="Foreign" ///
1.foreign#c.mpg_01=`""MPG X" "Foreign""') ///
headings(mpg_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}" 1.foreign#c.mpg_01 = "{bf: Interactions}", gap(0)) ///
msize(large) mcolor(%85) mlcolor(cyan) mlwidth(medium) msymbol(circle) ///
cismooth(color(midblue) lwidth(2 30)) // default lwidth( ) range is 2 to 15

This produces the following graph:

MAKING A COEFPLOT FOR MULTIPLE MODELS

The coefplot package can display the results of multiple models at once. This can be useful in numerous situations: for example, when you want to run the same model on different subgroups, when you want to show the results of models with slightly different predictor variables, or when you want to show how one model compares to another model with some specific feature (e.g., survey weights, fixed effects, etc.). There are two ways of presenting these different models, depending on what you find most useful / aesthetically appealing:

Option #1: Display the results of multiple models within one panel

Option #2: Use multiple panels, each displaying one model’s result

For either option, the first step is to run each model and then store the results using -estimates store-. Let’s regress vehicles’ price onto their mpg, weight and length (again, all predictors have been rescaled to range between 0 and 1). In contrast to previous examples, we will now run this model separately for domestic vehicles and foreign vehicles.

The code to do this is as follows:

reg price mpg_01 weight_01 length_01 if foreign==0

estimates store mod_domestic // stores previous model's estimates

reg price mpg_01 weight_01 length_01 if foreign==1

estimates store mod_foreign // stores previous model's estimates
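Before graphing, it can be worth checking that the stored results are what you think they are. The following sketch fits the two subgroup models in a loop and lists them side by side with -estimates table-. The names mod_f0 and mod_f1 are made up for this illustration (the guide itself uses mod_domestic and mod_foreign), and the rescaling loop is included only so the example runs on its own:

```stata
* Sketch: store one model per subgroup, then inspect the stored results
sysuse auto.dta, clear
foreach v of varlist mpg weight length {
    quietly summarize `v'
    generate `v'_01 = (`v' - r(min)) / (r(max) - r(min))
}
forvalues f = 0/1 {
    quietly regress price mpg_01 weight_01 length_01 if foreign == `f'
    estimates store mod_f`f'     // hypothetical names: mod_f0 = domestic, mod_f1 = foreign
}
estimates dir                                      // lists everything currently stored
estimates table mod_f0 mod_f1, b(%9.1f) se(%9.1f) stats(N)
```

A loop like this is mostly useful when there are more than two subgroups; for two, typing the models out as above is just as easy.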

Now we can start making graph Option #1 from above. The first key thing to note is that the presentation of each group’s results can be customized in terms of what the group is called, how its markers look, and how its confidence intervals look. This is done by placing the stored estimate name within parentheses, adding a comma, and then specifying all the options you want that are specific to that group. In the example below, notice that the options for “mod_domestic” specify the label, marker color, marker line color, and confidence interval line color. The same occurs for “mod_foreign”. Note: we are going to again have two levels of CIs (95% and 90%), which is why there are two colors featured for the CIs (the first is for the 95% line, the second for the 90% line), though you can keep them the same color if you wish. Any marker or CI options that apply to both groups can be featured as an option for the graph as a whole (see code below).

The second key thing to note is that this graph will produce a legend. Thus, you want to specify legend options (see section at bottom; the notes at the bottom explain these options).


*Option #1: All models in one panel
coefplot (mod_domestic, label("{bf: Domestic Cars}") mcolor(midblue) mlcolor(cyan) /// options for 1st group
ciopts(lcolor(magenta midblue))) /// ci options specific to 1st group
(mod_foreign, label("{bf: Foreign Cars}") mcolor(green) mlcolor(lime) /// options for 2nd group
ciopts(lcolor(lime green))), /// ci options specific to 2nd group
drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") ///
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) ///
grid(glpattern(solid) glcolor(gs14)) ///
ylab(, labsize(*1.1)) ///
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight") ///
msize(large) mcolor(%85) mlwidth(medium) msymbol(circle) /// marker options for all groups
levels(95 90) ciopts(lwidth(medthick thick) recast(rspike rcap)) /// ci options for all groups
legend(ring(1) col(1) pos(3) size(medsmall)) //
*legend notes: ring=0=inside plot; ring=1=outside plot; col=# of columns;pos=clock position

This produces the following graph:

The second presentation option is to have separate panels: one for each model. In our case, this would produce two panels (because we have two models). Because there are separate panels, we don’t need to specify different colors for the markers, nor do we need to create a legend. Though the code is therefore a bit simpler, one important trick to be aware of is that creating titles for each panel requires use of the -bylabel( )- option after each model name and also the double-vertical-line “||” operator to join the plots together. The code below illustrates how this is being done (see section at the top).

Two other useful things to note. First, if you want to add a graph note, you must use the -byopts(note( ))- option/sub-option (see code below). Second, if you want to change how the titles look at the top of each panel, use the -subtitle( )- option. The last line of code below demonstrates how to change the title text color, the box fill color, and the color of the line surrounding the box.

*Option 2:  One model in each panel
coefplot mod_domestic, bylabel("{bf: Domestic Cars}") ///
|| /// joins other model
mod_foreign, bylabel("{bf: Foreign Cars}") /// Notice: no comma at the end to separate this plot from global options
drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") ///
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) ///
grid(glpattern(solid) glcolor(gs14)) ///
ylab(, labsize(*1.1)) ///
coeflabels(mpg_01="MPG" length_01="Length" weight_01="Weight") ///
msize(large) mcolor(%85) mlcolor(cyan) mlwidth(medium) msymbol(circle) /// applies to all markers
levels(95 90) ciopts(lcolor(magenta midblue) lwidth(medthick thick) recast(rspike rcap)) /// applies to all CIs
byopts(note("Note: Stata's Mesmerizing Auto Dataset", size(vsmall) span)) ///
subtitle(, color(black) fcolor(gs15) lcolor(gs12)) //
*subtitle options: text color, box fill color, and box perimeter line color

This produces the following graph:

USING COEFPLOT FOR MULTIPLE OUTCOME MEASURES

Sometimes we might have models that use the same set of predictors (or at least mostly the same) but have different outcome variables. We could of course create separate coefplots for each of these models. However, we can also combine the coefplots into one single graph using -graph combine-. We’ll need to take the following steps:

1. Set the working directory: We have to tell Stata where we are going to place the graphs we create (and where to pull them from for our final graph). Simply go to File > Change working directory…> navigate to wherever you want the files to go > click Choose.

2. Save the graphs: This can be done several ways, but the easiest is to include the -saving( )- option within your graph code. This appears in the final line of each coefplot command below. Having completed Step 1, the graphs will save to your working directory.

3. Combine the graphs: This requires use of the -graph combine- command, which comes with many options. In the code below I show several of the most useful options.
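If you would rather not write .gph files to disk at all, one alternative (not the approach used in the rest of this guide) is to keep each graph in memory with the -name( )- option and hand those names to -graph combine-. A minimal sketch, with made-up graph names g_price and g_mpg and unscaled predictors to keep it short:

```stata
* Sketch: combine graphs held in memory instead of saved .gph files
sysuse auto.dta, clear

quietly regress price weight length i.foreign
coefplot, drop(_cons) xline(0) name(g_price, replace)   // hypothetical graph name

quietly regress mpg weight length i.foreign
coefplot, drop(_cons) xline(0) name(g_mpg, replace)     // hypothetical graph name

graph combine g_price g_mpg, col(1)
```

With this approach the working directory does not matter, but the graphs disappear when you close Stata, so -saving( )- is still the better choice if you want to reuse them later.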

Once we’ve set our working directory, we can run the models and save the graphs. Notice in the following code, two different models are being run, each followed by a coefplot with the “saving( )” option appearing at the very end. I’ve named the saved graphs “mod1” and “mod2”, respectively. The graph options are otherwise the same as examples featured above.

*Example #1:  Style without marker labels
*First model
reg price length_01 weight_01 i.foreign

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle Price}") /// bolded text
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) ///
grid(glpattern(solid) glcolor(gs14)) ///
ylab(, labsize(*1.1)) ///
coeflabels(length_01="Length" weight_01="Weight" 1.foreign="Foreign") ///
headings(length_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}", gap(0)) ///
msize(large) mcolor(%85) mlcolor(cyan) mlwidth(medium) msymbol(circle) ///
levels(95 90) ciopts(lcolor(magenta midblue) lwidth(medthick thick) recast(rspike rcap)) ///
subtitle("{bf: Price of Vehicle}", color(white) box fcolor(black)) ///
saving(mod1.gph, replace)
*Second model
reg mpg length_01 weight_01 i.foreign

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Vehicle MPG}") /// bolded text
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) ///
grid(glpattern(solid) glcolor(gs14)) ///
ylab(, labsize(*1.1)) ///
coeflabels(length_01="Length" weight_01="Weight" 1.foreign="Foreign") ///
headings(length_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}", gap(0)) ///
msize(large) mcolor(%85) mlcolor(cyan) mlwidth(medium) msymbol(circle) ///
levels(95 90) ciopts(lcolor(magenta midblue) lwidth(medthick thick) recast(rspike rcap)) ///
subtitle("{bf: Miles Per Gallon}", color(white) box fcolor(black)) ///
saving(mod2.gph, replace)

We can now combine “mod1.gph” and “mod2.gph” using -graph combine-. I show some of the key options in the code below and provide explanations of what they do. Note: if your outcome measures are on similar scales (usually ideal, even though that’s not the case here), you could also include the “xcommon” option, which will make the x-axes of each graph have the same scale. (In my example below, this isn’t a good option to include as “price” and “mpg” are on entirely different scales.)

graph combine "mod1.gph" "mod2.gph", scheme(white_jet) graphregion(margin(tiny)) xsize(6.5) ysize(4.5) ///
ycommon /// make the y-axes of constituent graphs identical (xcommon does this for x-axes)
col(1) /// array the panels in 1 column (that is, on top of one another)
imargin(medium) /// make the margins between the panels size medium
iscale(.9) /// reduce the text size slightly (can often be too large when combining graphs)
note("Note: 90% and 95% CIs shown", size(vsmall) span) // add a note

This produces the following graph:
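Once a graph looks right, you will usually want it as an image file for your manuscript. A minimal sketch using -graph export-, which writes whatever graph is currently displayed; the file names are placeholders, and the simple coefplot is only there so the example runs on its own:

```stata
* Sketch: export the current graph for use in a document
sysuse auto.dta, clear
quietly regress price weight length i.foreign
coefplot, drop(_cons) xline(0)

graph export "coefplot_example.png", width(2000) replace   // hypothetical file name; width in pixels
graph export "coefplot_example.pdf", replace               // vector format, if your journal prefers it
```

The files land in the working directory unless you give a full path.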

And here is a second example, with code showing the models, individual coefplots, and process for combining the coefplots:

*Example #2:  Style with marker labels
*First model:
reg price length_01 weight_01 i.foreign

coefplot, drop(_cons) xline(0, lcolor(black) lwidth(medium)) scheme(gg_jet) ///
xtitle("{bf: Effect on Vehicle Price}") /// bolded text
graphregion(margin(medium)) ///
xsize(6.5) ysize(4.5) ///
grid(none) ///
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
coeflabels(length_01="Length" weight_01="Weight" 1.foreign="Foreign") ///
headings(length_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}", gap(0)) ///
mlabel mlabcolor(black) mlabsize(vsmall) format(%4.0f) ///
mlabposition(center) msize(ehuge) mcolor(white) mlcolor(gs10) ///
cismooth(color(black) lwidth(2 20)) /// default lwidth( ) range is 2 to 15
subtitle("Price of Vehicle", size(small) color(white) box fcolor(black) position(3) ///
orientation(rvertical) margin(vsmall)) ///
saving(mod1_labels.gph, replace)
*Second model
reg mpg length_01 weight_01 i.foreign

coefplot, drop(_cons) xline(0, lcolor(black) lwidth(medium)) scheme(gg_jet) ///
xtitle("{bf: Effect on Vehicle MPG}") /// bolded text
graphregion(margin(medium)) ///
xsize(6.5) ysize(4.5) ///
grid(none) /// suppresses the horizontal grid lines (as in the first model)
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
coeflabels(length_01="Length" weight_01="Weight" 1.foreign="Foreign") ///
headings(length_01= "{bf: Vehicle Specs}" ///
1.foreign = "{bf: Production Info}", gap(0)) ///
mlabel mlabcolor(black) mlabsize(vsmall) format(%4.0f) ///
mlabposition(center) msize(ehuge) mcolor(white) mlcolor(gs10) ///
cismooth(color(black) lwidth(2 20)) /// default lwidth( ) range is 2 to 15
subtitle("Miles Per Gallon", size(small) color(white) box fcolor(black) position(3) ///
orientation(rvertical) margin(vsmall)) ///
saving(mod2_labels.gph, replace)

graph combine "mod1_labels.gph" "mod2_labels.gph", scheme(gg_jet) graphregion(margin(tiny)) xsize(6.5) ysize(4.5) ///
ycommon /// make the y-axes of constituent graphs identical (xcommon does this for x-axes)
col(1) /// array the panels in 1 column (that is, on top of one another)
imargin(medium) /// make the margins between the panels size medium
iscale(1.05) /// enlarge the text size slightly (values below 1, as in Example #1, shrink it)
note("Note: Smoothed CIs shown", size(vsmall) span) // add a note

In contrast to the previous example, this example uses marker labels and smoothed confidence intervals (discussed above; note that you may need to play around with the smoothed CI range to get it looking right). This code also places each graph’s title along the right-hand y-axis using the position(3) and orientation(rvertical) sub-options within the “subtitle( )” option. Note: you may need to play around with the sizes of the markers and labels to get them looking right in the final combined graph. Here’s the graph:

DISPLAYING CHANGES IN PR(Y=1) FROM LOGIT/PROBIT MODELS

If you run a logit or probit model and then try to produce a coefplot, you will notice that the plot displays logit/probit coefficients. However, these are not easily interpretable. We might instead want to display each variable’s effect on the probability of the outcome measure (Y) being equal to 1. The key trick to doing this is to run the -margins- command right after the model and specify the “dydx(*)” and “post” options. Once we do that, coefplot will display each variable’s average marginal effect on the probability that Y is equal to 1. Other types of results (e.g., odds ratios) are also possible (see here).

The example code below shows how to do this for a logit model, but specifying a probit model works the same way (just insert “probit” in place of “logit”). The model uses the lbw.dta data set, which can be loaded by running:

webuse lbw.dta, clear

The model regresses whether or not a child was born underweight (1=yes) on whether or not the mother smoked during pregnancy (1=yes), the number of doctor visits during her first trimester, her count of premature births, her age, and a simple (trichotomous) racial identification variable. The doctor visit, premature birth, and age variables are recoded to range from 0 to 1 (as in the examples above):

*Rescale continuous predictor variables to range from 0 to 1
sum age
gen age_01=(age-r(min))/(r(max)-r(min))
tab age_01

sum ftv
gen ftv_01=(ftv-r(min))/(r(max)-r(min))

sum ptl
gen ptl_01=(ptl-r(min))/(r(max)-r(min))

*Model
logit low smoke ftv_01 ptl_01 c.age_01 i.race, coeflegend // see what Stata calls each variable

logit low smoke ftv_01 ptl_01 c.age_01 i.race
margins, dydx(*) post // determines marginal effect of each variable; note that "post" is necessary
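
To see why “post” is necessary, it can help to look at what coefplot actually reads: the stored estimates in e(b). The sketch below is only an illustration. It repeats the data loading and rescaling from above (written here as a loop, which is my shorthand rather than the original code) so that it can be run on its own and leaves Stata in the same state as the code above:

```stata
* Illustration only: same data, variables, and model as above
webuse lbw, clear

* Rescale each continuous predictor to range from 0 to 1 (loop version)
foreach v of varlist age ftv ptl {
    quietly sum `v'
    gen `v'_01 = (`v' - r(min)) / (r(max) - r(min))
}

logit low smoke ftv_01 ptl_01 c.age_01 i.race
matrix list e(b)      // logit coefficients (log-odds), including _cons

margins, dydx(*) post
matrix list e(b)      // now the average marginal effects; no _cons
display "`e(cmd)'"    // the stored results now belong to margins
```

Without “post”, e(b) would still hold the logit coefficients after -margins- runs, and coefplot would plot those instead of the effects on Pr(Y=1).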

The coefplot code is as follows (note that it also makes a number of small stylistic changes):

coefplot, drop(_cons) xline(0, lcolor(red) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Pr(Low-Birth Weight)}") ///
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) /// adds solid, light gray vertical lines at x-axis values
grid(glpattern(solid) glcolor(gs14)) /// adds solid, light gray horizontal lines at y-axis values
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
baselevels ///
coeflabels(age_01=`""Age of" "Mother""' smoke="Smoker" ftv_01=`""Doctor" "Visits""' ///
ptl_01=`""Premature" "Birth Ct.""' 1.race="White" 2.race="Black" 3.race=`""Other" "Race ID""') ///
headings(age_01="{bf: Demographics}", gap(0)) ///
msize(large) mlcolor(navy) mlwidth(thick) msymbol(square_hollow) ///
levels(95 90) ciopts(lcolor(navy midblue) lwidth(medthick thick) recast(rspike rcap)) ///
subtitle(, color(black) fcolor(gs15) lcolor(gs12)) ///
note("Note: All predictors 0 to 1. White = excluded race category", size(vsmall) span) // add a note

This produces the following graph:

More info about displaying logit/probit results with coefplot can be found here.
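
For the odds-ratio case mentioned earlier, a minimal sketch might look like the following. It relies on coefplot’s -eform- option, which plots exponentiated coefficients, and it skips -margins- entirely so that e(b) still holds the logit coefficients. The smaller model, the unscaled age variable, and the axis title are assumptions chosen just for illustration. The -preserve-/-restore- pair is there only so the sketch does not wipe out the rescaled variables created above:

```stata
* Illustration only: a smaller model than the one in this guide
preserve
webuse lbw, clear

logit low smoke age i.race

* eform exponentiates the logit coefficients, giving odds ratios;
* the reference line moves from 0 to 1 (an odds ratio of 1 means no association)
coefplot, drop(_cons) eform xline(1, lcolor(red) lwidth(medium)) ///
    xtitle("{bf: Odds Ratio}")

restore
```

Note that the stored estimates now belong to this smaller model, so re-run the full model (and -margins-) before making any further plots of it.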

BONUS: DISPLAYING COEFFICIENTS AS BARS INSTEAD OF POINTS

Finally, one way of making the coefplot a bit more distinctive is to feature bars instead of points. The length of each bar is equal to the value of the point estimate. In fairness, the bar is not communicating anything that the regular marker couldn’t, but it probably does look a little more interesting…maybe?

The way to do this is to simply use the -recast(bar)- option, followed by any options for coloring/outlining the bars. The code below provides an example using the same model as above, but with bars instead of markers:

logit low smoke ftv_01 ptl_01 c.age_01 i.race
margins, dydx(*) post

coefplot, drop(_cons) xline(0, lcolor(navy) lwidth(medium)) scheme(white_jet) ///
xtitle("{bf: Effect on Pr(Low-Birth Weight)}") ///
graphregion(margin(medsmall)) ///
xsize(6.5) ysize(4.5) ///
xlab(, glpattern(solid) glcolor(gs14)) /// adds solid, light gray vertical lines at x-axis values
grid(none) /// suppresses the horizontal grid lines at y-axis values
ylab(, labsize(*1.1)) /// enlarge size of y-axis labels
baselevels ///
coeflabels(age_01=`""Age of" "Mother""' smoke="Smoker" ftv_01=`""Doctor" "Visits""' ///
ptl_01=`""Premature" "Birth Ct.""' 1.race="White" 2.race="Black" 3.race=`""Other" "Race ID""') ///
recast(bar) barwidth(1.0) bfcolor(magenta%85) bfintensity(100) blcolor(purple) ///
headings(age_01="{bf: Demographics}", gap(0)) ///
levels(95) ciopts(lcolor(midblue%80) lwidth(medthick) recast(rcap)) citop ///
subtitle(, color(black) fcolor(gs15) lcolor(gs12)) ///
note("Note: All predictors 0 to 1. White = excluded race category", size(vsmall) span) // add a note

Last graph!!!:

I sincerely hope this guide has been helpful for you! Let me know what you think!

About the Author

John V. Kane is Clinical Associate Professor at the Center for Global Affairs and an Affiliated Faculty member of NYU’s Department of Politics. He received his Ph.D. in political science and his primary research interests include public opinion, political psychology, and experimental research methodology. His research has been published in a variety of top-ranking peer-reviewed journals, including the American Political Science Review, American Journal of Political Science, the Journal of Politics, and the Journal of Experimental Political Science. His research has been featured in numerous media outlets, including The New York Times, the Washington Post, and National Public Radio. He has taught graduate courses on political psychology, research methods, statistics and data analysis, and has also received teaching excellence awards from both New York University and Stony Brook University. His website is www.johnvkane.com. Follow him on X/Twitter, ResearchGate, LinkedIn, and/or BlueSky Social.

John V. Kane is an Associate Professor at NYU's Center for Global Affairs. He researches political attitudes & experimental methods. Twitter: @UptonOrwell