Advanced Bar Graphs in Stata (Part 1): Means with Confidence Intervals

This guide covers how to make bar graphs of means and confidence intervals using Stata. It focuses on two user-written packages, -cibar- and -coefplot-, and includes all the code needed to replicate each graph. Along the way, it covers a variety of applications and offers plenty of tips and tricks for making your Stata graphs as clear and informative as possible.

John V. Kane
The Stata Gallery

--

Bar graphs probably don’t sound like the most exciting type of data visualization, but there’s a reason they’re still around after all these years: they are easy to read (even for non-quant folks) and can quickly communicate a great deal of important information.

One common form of a bar graph is one that displays the mean of some variable over values of one (or more) other variables (e.g., demographic variables, experimental conditions, etc.). Another common form displays means of multiple variables at once.

Given that researchers often make these graphs with sample data, it is common to include confidence intervals (CIs) on each bar (e.g., a vertical line, on top of the bar, to indicate the 95% confidence interval). Stata, for some reason, does not have a built-in option to add CIs to bar graphs. What to do?

Fortunately, there are two user-written packages that can help us: -cibar- and -coefplot-. The purpose of this guide is to show you how to use each one effectively in a variety of scenarios.
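For a sense of what these packages save you, here is a rough sketch of the do-it-yourself route using only built-in commands: -collapse- the data to one row per group, compute the CI bounds, then overlay -twoway bar- and -twoway rcap-. It borrows the lbw.dta example used later in this guide, and the variables it creates (mean_age, sd_age, n, se_age, lb, ub) are illustrative names, not part of either package. It computes a t-based 95% CI separately within each group, so its bounds may differ slightly from what -cibar- or -mean- report.

```stata
* Do-it-yourself sketch: means and 95% CIs of age by race, built-in commands only
webuse lbw, clear

* One row per race category: mean, SD, and number of nonmissing observations
collapse (mean) mean_age=age (sd) sd_age=age (count) n=age, by(race)

* Standard error and t-based 95% CI bounds within each group
generate se_age = sd_age / sqrt(n)
generate lb = mean_age - invttail(n - 1, 0.025) * se_age
generate ub = mean_age + invttail(n - 1, 0.025) * se_age

* Bars for the means, capped spikes for the CIs
twoway (bar mean_age race, barwidth(0.6)) ///
       (rcap ub lb race), ///
       xlabel(1 2 3, valuelabel) ///
       ytitle("Average Age of Mother") xtitle("Race of Mother") ///
       legend(off)
```

Every extra feature (a different CI level, bar labels, several “over” variables) adds more of this bookkeeping, which is the kind of work -cibar- and -coefplot- take care of for you.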

Overview of cibar and coefplot

Why two different packages? The short answer is as follows:

cibar: simple and fast when you have one variable whose mean you want to show and at least one “over” variable ready to go (more on this below). For most applications, this package should do the trick.

coefplot: a little more complicated, but also more flexible — it can work with means from multiple variables at once, asymmetric confidence intervals (e.g., logistic CIs), and also allows for more customization in terms of options (orientation, labels, the look of the CIs, etc.).

To install both, execute the following:

* cibar
net install cibar, from("http://fmwww.bc.edu/RePEc/bocode/c") replace

* coefplot
ssc install coefplot, replace

A couple of other things before we get started:

  1. Graphs will mostly use schemes from “schemepack” by Asjad Naqvi. To install, simply execute:
ssc install schemepack, replace

You can then explore all your awesome new schemes by executing the following:

graph query, schemes

2. Note: All graphs use the “AbelPro-Regular” font, which can be downloaded here. For details on installing and using fonts that are not native to Stata, see here.
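If you'd like a font like this one to be the default for the graphs Stata displays, the built-in -graph set- command can do that. A minimal sketch, assuming the font is already installed and that your system registers it under the name “AbelPro-Regular” (check the exact name your operating system shows):

```stata
* Sketch: use a non-native font for graphs shown in the Graph window.
* Assumes the font is installed and registered as "AbelPro-Regular".
graph set window fontface "AbelPro-Regular"

* Switch back to Stata's default font later if you wish
graph set window fontface default
```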

Using the -cibar- Package

Again, for the most common applications, cibar should be able to do what you need. The two fundamental things you need are:

  1. The “mean” variable: this is the continuous variable for which you want to display the mean(s). Let’s refer to it as “Y”.
  2. The “over” variable: hopefully you do not want to make a bar graph that shows a single mean. Rather, you probably want to show the mean of Y for each category of some other variable. This variable, which we can call “X1”, is typically categorical in nature, such that each value of X1 can have its own mean of Y.

Once you have these two variables in your data set, you can start using cibar! Let’s work on an example.

We’ll begin with the “lbw.dta” dataset, which features data on mothers and their newborn children. You can load it using the following code:

    
webuse lbw, clear

Let’s say we want to examine the average age of mothers over the three racial identification categories featured in the dataset. Our Y variable is therefore “age” while our “over”, or X1, variable is “race”.

A bare-bones cibar graph can be produced with just a single line of code:

cibar age, over(race) 

What we get (especially if you are working with Stata 18’s default “stcolor” scheme) really isn’t half bad, given how simple that code was.

A bare-bones example of a cibar graph (scheme is Stata 18’s new default “stcolor”)
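Since the bars are just group means, you can check the numbers behind them with built-in commands. A sketch (it reloads the data so it runs on its own); -cibar- may compute its intervals somewhat differently, so treat small differences in the CI bounds as expected rather than as errors:

```stata
* Sketch: print the numbers behind the bare-bones cibar graph
webuse lbw, clear

* Group means of age with t-based 95% CIs
mean age, over(race)

* Alternative: a separate t-based 95% CI computed within each race category
bysort race: ci means age
```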

Naturally, there’s a lot more we can add in terms of options to improve our graph, which brings us to an important detail about -cibar-: you need to be mindful of what the options are called (some are specific to -cibar-) and, especially, that many of the most common graph options must go inside -graphopts( )-. (If you run into trouble, you can always execute help cibar.)

Let’s look at an example of many more options added to the above graph, including a change in scheme.

cibar age, over(race) /// syntax:  cibar Y, over(X1)
level(95) /// specifies the CI level to be 95% (this is the default)
ciopts(lcolor(purple) lwidth(medium)) /// CI line color and width
barcolor(%80 %80) /// makes 1st and 2nd bar 80% and 80% opacity, respectively
baropts(lcolor(magenta) lwidth(medthick)) /// make bar outline color magenta; medium thick line width
bargap(50) /// places a larger gap between the bars (otherwise there will be no gap)
graphopts(ytitle(Average Age of Mother, col(purple)) ylabel(18(1)28) /// here is where "graphopts( )" begins
xtitle("Race of Mother", col(purple)) /// title x-axis
note("Note: 95% CIs Shown", span size(vsmall)) /// add note at the bottom
scheme(swift_red) /// change scheme
legend(pos(3) ring(1) col(1) size(medium)) /// legend options
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) /// graph dimensions (inches) and margin b/w plot and edge
) // Note this last parenthesis!

A few things to note:

— options for the CIs go within “ciopts()”

— colors for the bars go within “barcolor( )”

— other options for the bars go within “baropts( )”

— everything else in terms of graph options goes within “graphopts( )”. Note that closing parenthesis at the very end of the code! This closes the “graphopts( )” option.

This gives us the following graph:

cibar with “swift_red” scheme

There is another option to add labels to the bars. (Personally, I don’t find this all that helpful, but sometimes it might be useful.) We can also change the CIs to be a “spike” rather than having a “cap” at the top and bottom.

Also, if the goal is to compare the means to see whether there is a statistically significant difference between the bars (a very common reason for bar graphs), it is worth emphasizing this paper by Bolsen (see pp. 12–16), which finds that checking for “overlapping 95% CIs” is too conservative and instead recommends using 83% CIs.
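Where does a number like 83% come from? A back-of-the-envelope check, assuming two independent means with equal standard errors (SE) and a normal approximation: the SE of their difference is sqrt(2)*SE, so a two-sided 5% test rejects when the gap between the means exceeds 1.96*sqrt(2)*SE. Two CIs built with critical value z stop overlapping when the gap exceeds 2*z*SE. Setting the two thresholds equal gives z = 1.96*sqrt(2)/2, about 1.39, which corresponds to roughly an 83% interval. The lines below just reproduce that arithmetic; they are a simplified special case, not the analysis in the paper cited above.

```stata
* Sketch: CI level at which non-overlap of two equal-SE intervals
* matches a two-sided 5% test of their difference (normal approximation)
scalar z95 = invnormal(0.975)                  // critical value for a 95% CI, about 1.96
scalar z_star = scalar(z95) * sqrt(2) / 2      // critical value that makes the two rules agree
scalar level_star = 2 * normal(scalar(z_star)) - 1

display "critical value:   " %6.4f scalar(z_star)
display "implied CI level: " %6.4f scalar(level_star)
```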

Let’s do an example featuring all of these additional options. We can use the age variable and a condensed version of a variable measuring the number of doctor visits during the mother’s first trimester. I’ll first create the latter variable like this (I previously made a guide on data cleaning in Stata, which can be found here):

recode ftv (0=0 "None") (1=1 "One") (2=2 "Two") (3 4 5 6=3 "Three or More"), gen(ftv_condensed)
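Before graphing, it's worth confirming that the recode did what you intended. A quick check, which reloads the data and repeats the recode so it runs on its own:

```stata
* Sketch: confirm that 3 or more visits were collapsed into a single category
webuse lbw, clear
recode ftv (0=0 "None") (1=1 "One") (2=2 "Two") (3 4 5 6=3 "Three or More"), gen(ftv_condensed)

* Cross-tabulate original and condensed values; every 3-6 should land in category 3
tabulate ftv ftv_condensed, missing
```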

Now here’s the -cibar- code. Note that all options for the bar labels (should you want them) are in the second line.

cibar age, over(ftv_condensed) /// syntax:  cibar Y, over(X1)
barlabel(on) blfmt(%4.1f) blsize(small) blposition(swest) blcolor(white) ///
level(83) /// specifies the CI level to be 83% (the default is 95%)
ciopts(lcolor(black) lwidth(medthick) type(rspike)) /// changes CI color, thickness, and style
barcolor(red%100 midblue%80 green%80 magenta%80) /// colors each bar in order of the bars
baropts(lcolor(cyan) lwidth(medthin) fintensity(100)) /// changes the bar outlines and color intensity
bargap(50) /// places a big gap between the bars (otherwise there will be no gap)
graphopts( /// start of graphopts option
ytitle(Average Age of Mother, col(gs8)) /// titles y-axis
xtitle(Visits to Physician in First Trimester, col(gs8)) /// titles x-axis
ylab(18(1)28, glcolor(gs15) glpattern(solid)) /// y-axis label options and horizontal gridline options
xlab(, nogrid) /// x-axis label options and vertical gridline options
yscale(lcolor(black) lwidth(medium)) /// change y-axis line look
xscale(lcolor(black) lwidth(medium)) /// change x-axis line look
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) /// graph dimensions and margin b/w plot and edge
scheme(stcolor) /// use stcolor scheme (Stata 18 only)
note("Note: 83% CIs Shown", span size(vsmall)) /// note in bottom left corner
legend(pos(3) ring(1) col(1) size(medium)) /// legend options
legend(title("Number of Visits", size(medium) margin(tiny) box fcolor(gs6) color(white) bexpand)) /// legend title
) // Note this last parenthesis! closes graphopts

Here’s the graph we get:

Bar graph with labels and 83% CIs. Shows that mothers who do not see a physician in the first trimester tend to be significantly younger than mothers who do.

Multiple “over” Variables

The -cibar- package conveniently allows for up to three “over” variables. The syntax is that the first variable listed within “over( )” is the variable that will color the bars and be displayed in the legend; the second variable is what will be arrayed along the x-axis. Let’s see some code!

(Note: the first part is a little trick to have bold labels for the x-axis variable, which I learned from the ingenious Andrew Musau on Statalist.)

* Trick to create bold labels
decode race, gen(race_string) // first make a string version
gen bfrace = "{bf:" + race_string + "}" // next do the trick
encode bfrace, gen(bfrace_num) // next make the bold version a numeric variable
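One caveat: -encode- assigns numeric codes in alphabetical order of the strings, so bfrace_num will likely list the race categories in a different order than race does, which changes their order on the x-axis. If you want to keep the original order, one option is to build a bold copy of the value label instead. A sketch; race_bf and bfrace_ord are hypothetical names, and the sketch reloads lbw.dta so it runs on its own (skip the webuse line if you are following along, since reloading drops the variables created above):

```stata
* Sketch: bold value labels that keep race's original numeric order.
* race_bf (value label) and bfrace_ord (variable) are hypothetical names.
webuse lbw, clear

* Start from a copy of race's value label, then wrap each entry in {bf:...}
local racelab : value label race
label copy `racelab' race_bf, replace
levelsof race, local(levels)
foreach l of local levels {
    local lab : label (race) `l'
    label define race_bf `l' "{bf:`lab'}", modify
}

* Same numeric codes as race, displayed with the bold labels
clonevar bfrace_ord = race
label values bfrace_ord race_bf
label list race_bf
```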

* cibar with two "over" variables
cibar age, over(ftv_condensed bfrace_num) /// syntax: cibar Y, over(legendvar xaxisvar)
level(83) /// specifies the CI level to be 83% (the default is 95%)
ciopts(lcolor(white) lwidth(medium) type(rspike)) /// makes the confidence intervals white and medium thickness
barcolor(%85 %85 %85 %85) ///
baropts(lcolor(black) lwidth(vthin) fintensity(100)) ///
bargap(10) /// places a small gap between the bars (otherwise there will be no gap)
graphopts( /// start of graphopts option
ytitle(Average Age of Mother, col(gs8)) /// titles y-axis
xtitle(Race of Mother, col(gs8)) /// titles x-axis (the second "over" variable)
ylab(18(1)30, glcolor(gs15) glpattern(solid)) /// y-axis label options and horizontal gridline options
xlab(, nogrid) /// x-axis label options and vertical gridline options
yscale(lcolor(black) lwidth(medium)) /// change y-axis line look
xscale(lcolor(black) lwidth(medium)) /// change x-axis line look
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) /// graph dimensions and margin b/w plot and edge
scheme(gg_w3d) ///
note("Note: 83% CIs Shown", span size(vsmall)) /// note in bottom left corner
legend(pos(3) ring(1) col(1) size(medium)) /// legend options
legend(title("Number of Visits", size(medium) margin(tiny) box fcolor(gs6) color(white) bexpand)) /// legend title
) // Note this last parenthesis! closes graphopts

That gives us this graph:

And if you’re feeling like you want to go totally nuts, here’s an example with three “over” variables. The syntax is that the first “over” variable colors the bars / goes in the legend; the second populates the minor values on the x-axis; and the third populates the major values on the x-axis.

Here we’re going to use the child’s birthweight as our y-axis variable, and look at this across smoking status (mother smoked: yes or no), doctor visits and racial identification. I’ll also create an even more condensed version of the doctor visits variable.

* Condensed doctor visits variable

recode ftv (0=0 "None") (1=1 "One") (2 3 4 5 6=2 "Two or More"), gen(ftv_condensed2)

cibar bwt, over(smoke ftv_condensed2 bfrace_num) /// syntax: cibar Y, over(legendvar minorxaxisvar majorxaxisvar)
level(83) ///
ciopts(lcolor(gs14) lwidth(medium) type(rspike)) ///
barcolor(midblue%90 red%90 ) ///
baropts(lcolor(gs12) lwidth(thin) fintensity(100)) ///
graphopts( ///
ytitle(Birthweight of Child (grams)) ///
xlab(, labsize(vsmall) nogrid) ///
ylab(, glpattern(solid) glcolor(gs4)) ///
scheme(black_jet) ///
note("Note: 83% CIs Shown", span size(vsmall)) /// note in bottom left corner
legend(title("Smoking Status", size(small) margin(tiny) box fcolor(gs6) color(white) bexpand)) ///
legend(ring(0) pos(11)) ///
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) ///
) // close graphopts

And here’s our graph!

cibar with three “over” variables and “black_jet” scheme. The graph suggests that Black mothers who smoke and have not had any doctor visits are most at risk of having low-birthweight children, though the pattern of smoking and low birthweight is evident for White mothers as well, particularly those who have had only one doctor visit in the first trimester.

Using the -coefplot- Package for Bar Graphs of Means & CIs

Most people know -coefplot- for making regression coefficient plots (see my guide on these here). But -coefplot- is actually so flexible that it can also be used for bar graphs of means with CIs.

Here’s how to do it. First, we need to obtain the means we want to graph and store the estimates. We’ll stick with our “lbw.dta” dataset for now and try to replicate the third -cibar- graph from above (mean age over ftv_condensed).

* Get the estimates
mean age, over(ftv_condensed)
estimates store agemeans

* Make the bare-bones graph
coefplot agemeans, recast(bar)

And here is a bare-bones coefplot bar graph:

Bare-bones coefplot bar graph (Stata’s “stcolor” scheme)

Ouch. Not great, especially compared with our first -cibar- graph above. It’s nice that the bars can be horizontal (-cibar- can’t do that), but otherwise we have some work to do to make the graph more informative and visually appealing.

Again, this is what makes -coefplot- the more complicated of the two packages, but it can also handle more involved analyses, as we’ll see in a little bit.
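The coeflabels() option in the next code block refers to each mean by its internal coefficient name (e.g., c.age@0.ftv_condensed). If you're unsure what those names are, you can list them from the stored estimates before writing the labels; the exact form of the names may vary with your Stata version. A minimal sketch, which recreates the estimates so it runs on its own:

```stata
* Sketch: list the coefficient names that coeflabels() needs to match
webuse lbw, clear
recode ftv (0=0 "None") (1=1 "One") (2=2 "Two") (3 4 5 6=3 "Three or More"), gen(ftv_condensed)
mean age, over(ftv_condensed)
estimates store agemeans

* The column names of e(b) are the coefficient names (one per group mean)
estimates restore agemeans
matrix b = e(b)
matrix list b
```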

Here are a variety of -coefplot- options to improve this graph:

coefplot agemeans, recast(bar) /// basic command
vertical /// make the bars vertical
coeflabels(c.age@0.ftv_condensed="None" c.age@1.ftv_condensed="One" /// change the value labels
c.age@2.ftv_condensed="Two" ///
c.age@3.ftv_condensed="Three or More") ///
barwidth(.5) /// change the width of the bars; this is used instead of -cibar's- "bargap" option
color(midblue%90) blcolor(navy) blwidth(medium) bfintensity(100) /// bar color and outline options
level(83) /// CI level
citop /// layer the CIs *on top* of the bar (easier to see them)
ciopts(lcolor(magenta) lwidth(medthick)) /// CI line and width options
xtitle(Number of Doctor Visits in First Trimester) /// title x-axis
ytitle(Average Age of Mother) /// title y-axis
xsize(6.5) ysize(4.5) /// graph dimensions (inches)
graphregion(margin(vsmall)) /// margin b/w plot and edge
ylab(18(2)28, glpattern(solid) glcolor(gs15)) /// ylabel options and look of horizontal gridlines
scheme(white_jet) ///
note("Note: 83% CIs Shown", span size(vsmall)) // note in bottom left corner

And here is the graph:

coefplot bar graph with 83% CIs and “white_jet” scheme. Replicates the third -cibar- graph example above.

One big advantage of -coefplot- over -cibar- is that you can plot means from multiple variables — not just one variable over values of another variable.

To see an example, we’ll use data on life expectancies of different groups of U.S. citizens. Let’s load the data, and then obtain and store means of all the variables we want:

* Load the U.S. life expectancy data
sysuse uslifeexp.dta, clear

* Get the means
mean le le_male le_female le_w le_b

* Store the means
est store allmeans

One technical note: since we’re saving this as one set of means/CIs, there will be no legend, and hence all bars will be the same color. (We’ll see a coefplot with a legend below.)

Here’s the code, which also features multiple CI levels (yet another thing -cibar- cannot do) and leaves the graph as horizontal:

coefplot allmeans, recast(bar) /// basic command
barwidth(.4) color(blue%70) bfintensity(100) /// change bar width, color, and color intensity
blcolor(cyan) blwidth(medthin) /// change bar outline color and width
coeflabels(le="All Americans" le_male="Male" le_female="Female" le_w="White" le_b="Black") /// change value labels
levels(95 83) /// specify two CI levels
citop /// layer the CIs *on top* of the bar (easier to see them)
ciopts(lcolor(gs10 magenta)) /// specify CI colors: note that first (second) color is for first (second) CI level
xtitle("{bf: Average Life Expectancy}") /// title x-axis in bolded text
xlab(, glcolor(gs15) glpattern(solid)) /// x-axis label options; vertical gridline color/style
grid(noticks glpattern(solid) glcolor(white)) /// grid options
ylab(, noticks nogrid) /// more grid options (for y-axis)
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) ///
note("Note: 83% & 95% CIs Shown", span size(vsmall)) /// note in bottom left corner
title("Average Life Expectancy in the U.S.", box color(white) span fcolor(black) lcolor(black) bexpand) ///
subtitle("1900 - 1999", box color(white) span fcolor(black) lcolor(black) bexpand) //

And here’s the graph:

A coefplot Bar Graph with a Legend, Using an “over” Variable, and Combining Graphs

Now let’s make a version with a legend, which will result in bars with different colors. The trick is that each mean/CI needs to be saved as its own stored estimate. Then, in the code, we add label and color options separately for each stored estimate.

We’ll also save this graph, so you may want to set your working directory first (File > Change working directory > [select location], or use the -cd- command):

* Get each estimate and store it separately
mean le
est store mod_le

mean le_male
est store mod_lemale

mean le_female
est store mod_lefemale

mean le_w
est store mod_lewhite

mean le_b
est store mod_leblack

* Make coefplot and save as "g1"
coefplot (mod_le, label(All Americans) color(blue%85) lcolor(blue) ciopts(lcolor(gs6 midblue))) ///
(mod_lemale, label(Male) color(red%85) lcolor(red) ciopts(lcolor(gs6 orange))) ///
(mod_lefemale, label(Female) color(stgreen%85) lcolor(green) ciopts(lcolor(gs6 green))) ///
(mod_lewhite, label(White) color(styellow%85) lcolor(styellow) ciopts(lcolor(gs6 yellow))) ///
(mod_leblack, label(Black) color(lavender%85) lcolor(purple) ciopts(lcolor(gs6 purple))), ///
recast(bar) barwidth(.4) bfintensity(100) ///
ciopts(lwidth(thin medthick)) ///
levels(95 83) citop ///
coeflabels(le="All Americans" le_male="Male" le_female="Female" le_w="White" le_b="Black") ///
xtitle("Average Life Expectancy") ///
xlab(, glcolor(gs15) glpattern(solid)) ///
legend(pos(3) col(1) size(small) colgap(vsmall) keygap(tiny)) ///
grid(glpattern(solid) glcolor(gs15)) ///
ylab(, labsize(small)) ///
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) ///
scheme(white_jet) ///
saving(g1, replace)

And here’s the graph we get:

coefplot of means of separate groups’ life expectancies (white_jet scheme)

Now we’re going to create an “over” variable to use for our coefplot. Given the data we have, let’s create a simple binary variable to distinguish between the years 1900–1949 versus 1950–1999. This will allow us to make the same graph as above, but calculate separate means for each time period.

* Create the "over" variable: post1950
gen post1950=0 if inrange(year, 1900, 1949)
replace post1950=1 if inrange(year, 1950, 1999)
label define post1950 0 "Pre-1950" 1 "Post-1950"
label values post1950 post1950

* Get the means over post1950 and store them
mean le, over(post1950)
est store mod_leover

mean le_male, over(post1950)
est store mod_lemaleover

mean le_female, over(post1950)
est store mod_lefemaleover

mean le_w, over(post1950)
est store mod_lewhiteover

mean le_b, over(post1950)
est store mod_leblackover
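The ten mean/estimates store pairs in this guide (five plain means and five over(post1950) means) follow one pattern, so they can also be written as a loop. A sketch that should produce the same ten stored-estimate names used above; it reloads the data and recreates post1950 so it runs on its own, and the local macro names vars and suffix are just illustrative:

```stata
* Sketch: store plain and over(post1950) means for each life-expectancy series
sysuse uslifeexp, clear

* Same "over" variable as above: 0 = 1900-1949, 1 = 1950-1999
generate post1950 = 0 if inrange(year, 1900, 1949)
replace post1950 = 1 if inrange(year, 1950, 1999)
label define post1950 0 "Pre-1950" 1 "Post-1950"
label values post1950 post1950

* Variables and the matching suffixes used in the stored-estimate names
local vars   le le_male le_female le_w    le_b
local suffix le lemale  lefemale  lewhite leblack

forvalues i = 1/5 {
    local v : word `i' of `vars'
    local s : word `i' of `suffix'

    quietly mean `v'                    // overall mean (used for g1)
    estimates store mod_`s'

    quietly mean `v', over(post1950)    // means by period (used for g2)
    estimates store mod_`s'over
}

estimates dir
```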

* Make the graph and save it as "g2"
coefplot (mod_leover, label(All Americans) color(blue%85) lcolor(blue) ciopts(lcolor(gs6 midblue))) ///
(mod_lemaleover, label(Male) color(red%85) lcolor(red) ciopts(lcolor(gs6 orange))) ///
(mod_lefemaleover, label(Female) color(stgreen%85) lcolor(green) ciopts(lcolor(gs6 green))) ///
(mod_lewhiteover, label(White) color(styellow%85) lcolor(styellow) ciopts(lcolor(gs6 yellow))) ///
(mod_leblackover, label(Black) color(lavender%85) lcolor(purple) ciopts(lcolor(gs6 purple))), ///
recast(bar) barwidth(.4) bfintensity(100) ///
ciopts(lwidth(thin medthick)) ///
levels(95 83) citop ///
coeflabels(c.le@0.post1950="Pre-1950" c.le@1.post1950="Post-1950" ///
c.le_male@0.post1950="Pre-1950" c.le_male@1.post1950="Post-1950" ///
c.le_female@0.post1950="Pre-1950" c.le_female@1.post1950="Post-1950" ///
c.le_w@0.post1950="Pre-1950" c.le_w@1.post1950="Post-1950" ///
c.le_b@0.post1950="Pre-1950" c.le_b@1.post1950="Post-1950") ///
xtitle("Average Life Expectancy") ///
xlab(, glcolor(gs15) glpattern(solid)) ///
legend(pos(3) col(1) size(small) colgap(vsmall) keygap(tiny)) ///
grid(glpattern(solid) glcolor(gs15)) ///
ylab(, labsize(small)) ///
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) ///
scheme(white_jet) ///
saving(g2, replace)

We get the following graph:

coefplot of groups’ life expectancies over the pre- vs. post-1950 period. All groups show sizable increases, though racial and gender differences persisted in the post-1950 period.

Lastly, what if we wanted to combine these two graphs into one? Good thinking! The main challenge is that we don’t want this combined graph to have two legends, and we don’t want just one side to have a legend — so what can we do?

Fortunately, there’s a cool user-written package called -grc1leg-, which functions exactly like Stata’s -graph combine-, but lets you take a legend from one of your graphs and use it for the combined graph. Here’s how to install it:

net install grc1leg, from("http://www.stata.com/users/vwiggins") replace

Now, the only other complication is that our graphs above had the legend in the 3 o’clock position and arrayed in one column. But for this combined graph, I’m thinking it would look better to have it at the bottom and in one row. I can’t seem to do this tweak with grc1leg, so instead let’s remake the “g2” graph with the legend at the bottom and in one row. After that, I’ll run the grc1leg command and ask to use the legend from the new “g2” graph:

* Make the second graph with the legend how I want it for the combined graph
coefplot (mod_leover, label(All Americans) color(blue%85) lcolor(blue) ciopts(lcolor(gs6 midblue))) ///
(mod_lemaleover, label(Male) color(red%85) lcolor(red) ciopts(lcolor(gs6 orange))) ///
(mod_lefemaleover, label(Female) color(stgreen%85) lcolor(green) ciopts(lcolor(gs6 green))) ///
(mod_lewhiteover, label(White) color(styellow%85) lcolor(styellow) ciopts(lcolor(gs6 yellow))) ///
(mod_leblackover, label(Black) color(lavender%85) lcolor(purple) ciopts(lcolor(gs6 purple))), ///
recast(bar) barwidth(.4) bfintensity(100) ///
ciopts(lwidth(thin medthick)) ///
levels(95 83) citop ///
coeflabels(c.le@0.post1950="Pre-1950" c.le@1.post1950="Post-1950" ///
c.le_male@0.post1950="Pre-1950" c.le_male@1.post1950="Post-1950" ///
c.le_female@0.post1950="Pre-1950" c.le_female@1.post1950="Post-1950" ///
c.le_w@0.post1950="Pre-1950" c.le_w@1.post1950="Post-1950" ///
c.le_b@0.post1950="Pre-1950" c.le_b@1.post1950="Post-1950") ///
xtitle("Average Life Expectancy") ///
xlab(, glcolor(gs15) glpattern(solid)) ///
legend(pos(6) row(1) size(vsmall) colgap(vsmall) keygap(tiny)) ///
grid(glpattern(solid) glcolor(gs15)) ///
ylab(, labsize(small)) ///
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) ///
scheme(white_jet) ///
saving(g2, replace)


* Now let's combine them!
grc1leg "g1.gph" "g2.gph" , /// combine them
legendfrom(g2.gph) /// take the legend from the second graph
ring(1) pos(6) span /// keep legend outside plot, 6 o'clock position, centered
xcommon /// use common x-axis since they are identical between g1 and g2
iscale(.8) scheme(white_jet) /// shrink text a bit; specify scheme
xsize(6.5) ysize(5.5) /// graph dimensions (inches)
graphregion(margin(small)) /// margin between plot and outer edge
imargin(small) /// reduce distance between two plots
title("Average Life Expectancy in the U.S.", box color(white) span fcolor(black) lcolor(black) bexpand) /// title
subtitle("1900 - 1999", box color(white) span fcolor(black) lcolor(black) bexpand) /// subtitle
saving(g1and2combined, replace) // save our combined graph

And here is the combined graph with one legend!

separate coefplots combined using grc1leg package

Featuring Logistic CIs

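Lastly, I noted above that another advantage of -coefplot- is that it can handle asymmetric confidence intervals (-cibar- cannot). In applied research, a common example is the logistic (logit-transformed) confidence interval for a proportion, i.e., the mean of a binary variable. Stata’s -proportion- command reports this kind of interval by default: it is built on the log-odds scale and then transformed back, so it always stays between 0 and 1 and is generally not symmetric around the estimate.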
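To see where the asymmetry comes from, here is the general idea sketched by hand: build a symmetric interval on the log-odds scale (using the delta-method standard error se/(p(1-p))), then map its endpoints back to the probability scale with invlogit(). The proportion (0.30) and standard error (0.05) are made-up illustrative values, not estimates from lbw.dta, and the sketch uses a normal critical value; -proportion- may use a t critical value, so its output will differ slightly.

```stata
* Sketch: logit-transformed 95% CI for a proportion, by hand.
* p_hat and se_p are made-up illustrative values, not estimates from lbw.dta.
scalar p_hat = 0.30
scalar se_p  = 0.05
scalar z95   = invnormal(0.975)

* Delta method: standard error on the log-odds scale
scalar se_logit = scalar(se_p) / (scalar(p_hat) * (1 - scalar(p_hat)))

* Symmetric interval on the log-odds scale, mapped back with invlogit()
scalar lb_logit = invlogit(logit(scalar(p_hat)) - scalar(z95) * scalar(se_logit))
scalar ub_logit = invlogit(logit(scalar(p_hat)) + scalar(z95) * scalar(se_logit))

* Ordinary (normal) interval for comparison
scalar lb_norm = scalar(p_hat) - scalar(z95) * scalar(se_p)
scalar ub_norm = scalar(p_hat) + scalar(z95) * scalar(se_p)

display "logit CI:  [" %5.3f scalar(lb_logit) ", " %5.3f scalar(ub_logit) "]"
display "normal CI: [" %5.3f scalar(lb_norm) ", " %5.3f scalar(ub_norm) "]"
```

With these inputs the logit interval comes out at roughly [0.212, 0.406], reaching about 0.09 below the estimate but about 0.11 above it, while the normal interval is the symmetric [0.202, 0.398].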

Let’s look at a quick example using the “lbw.dta” dataset again. Here we’ll look at the proportion of women who gave birth to a child classified as low birthweight (“low”; 0=no and 1=yes) based on their smoking status (“smoke”; 0=nonsmoker, 1=smoker).

Just as before, we first get the estimates, and then store them. Note the use of proportion here (since it is a binary variable) and “over”:

* Get estimates and store

proportion low, over(smoke)
est store smoke

The output shows the estimated proportions. This can be a bit confusing at first because it includes not only the proportion of “low”=1 at each value of smoke, but also the proportion of “low”=0 at each value of smoke.

All we actually care about are the two proportions for “low”=1. Executing the following code shows the names of all four proportions, which we’ll need to make a nice coefplot (they will be shown on the y-axis, just like in the very first coefplot we made):

* Figure out the names of the proportions we want to show
coefplot smoke

Now we can add options to have a nice coefplot with logistic, rather than normal, CIs:

* coefplot
coefplot smoke, ///
color(stgreen%70) ///
recast(bar) barwidth(.5) bfintensity(100) lcolor(stgreen) lwidth(medium) ///
levels(95) citop ///
ciopts(type(rcap) lcolor(dkgreen) lwidth(medthick)) ///
drop(0.low@0.smoke 0.low@1.smoke ) ///
coeflabels(1.low@0.smoke="Non-Smoker" 1.low@1.smoke="Smoker") ///
ytitle("Smoking Status") ///
xtitle(Pr(Low Birthweight)) ///
xlab(0(.1).5, glcolor(gs15) glpattern(solid)) ///
legend(off) ///
grid(glpattern(solid) glcolor(gs15)) ///
ylab(, labsize(small)) ///
xsize(6.5) ysize(4.5) graphregion(margin(vsmall)) ///
title("Example of Logistic Confidence Intervals", box color(white) span fcolor(black) lcolor(black) bexpand) ///
subtitle("Smoking & Low Child Birthweight", box color(white) span fcolor(black) lcolor(black) bexpand) ///
scheme(gg_jet)

Last graph!

coefplot example with logistic CIs

I sincerely hope this guide has been helpful for you! Good luck and enjoy!

About the Author

John V. Kane is Clinical Associate Professor at the Center for Global Affairs and an Affiliated Faculty member of NYU’s Department of Politics. He received his Ph.D. in political science, and his primary research interests include public opinion, political psychology, and experimental research methodology. His research has been published in a variety of top-ranking peer-reviewed journals, including the American Political Science Review, American Journal of Political Science, the Journal of Politics, and the Journal of Experimental Political Science. His research has been featured in numerous media outlets, including The New York Times, the Washington Post, and National Public Radio. He has taught graduate courses on political psychology, research methods, statistics and data analysis, and has also received teaching excellence awards from both New York University and Stony Brook University. His website is www.johnvkane.com. Follow him on X/Twitter, ResearchGate, LinkedIn, and/or BlueSky Social.

--


John V. Kane is an Associate Professor at NYU's Center for Global Affairs. He researches political attitudes & experimental methods. Twitter: @UptonOrwell