mean_ci_plot: It’s Means and Confidence Intervals all around!

Shritha Sampath
Published in The Stata Gallery, Jul 11, 2024

Mean and Confidence Intervals: A Peek into What the Data Holds

Working with a new dataset always comes with a gamut of fun challenges for us to tackle. The numbers have scores of stories to tell, but they require work to get to, which is no easy task. Perhaps the best first step, the best first question to ask of the data is — well, what do you really mean?

Okay, the point that I’m trying to make might be getting a little lost in this (possibly unnecessary) analogy, but here’s what I’m getting at — once we have established a set of our main variables of interest, it only makes sense to start by computing their means and the associated confidence intervals. Means (and confidence intervals) provide a great starting point for further analysis — the first moment for a reason!

And yet, simply obtaining a lengthy, numbery list of means is of little use to us. A visual representation of the same numbers would allow us to tease out trends, the tentative ranges of values, the relative widths of confidence intervals, and what have you.

This is where mean_ci_plot enters the picture.

mean_ci_plot: Syntax and Examples

mean_ci_plot varlist [if] [in], [by(varname)] [scale] [title()] [graphopts()]

We have here the syntax for mean_ci_plot, a Stata program created by Zaeen de Souza and Kabira Namit. Intuitive and easy to use, the program gives us a table accompanied by a visual representation of the means of all the variables in varlist, along with their confidence intervals. The syntax is straightforward while allowing flexibility, making it accessible to beginners and requiring but a few steps to obtain our desired statistics.

The command can accommodate “if” and “in” conditions, can split the results by the groups defined by varname, and allows customization of various sorts. We can adjust the scale of the plot, include a title and subtitle, and adjust the colour and width of lines as we prefer.

The program requires the coefplot package for full functionality. Both packages can be installed from SSC:

ssc install mean_ci_plot
ssc install coefplot
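
If the do-file is going to be shared or re-run often, a guarded install avoids downloading the packages every time. The following is a minimal sketch using Stata's built-in which and capture commands; it assumes each package's main ado-file carries the same name as the package itself.

```stata
* Install each package only if Stata cannot already find it on the adopath
foreach pkg in mean_ci_plot coefplot {
    capture which `pkg'
    if _rc != 0 {
        ssc install `pkg'
    }
}
```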

(1) This example gives us the simplest possible version of the plot, no frills attached. We have three continuous variables under study, and no group variable.

*Set up dataset
sysuse auto.dta, clear

*Generate plot with means and confidence intervals
mean_ci_plot mpg trunk turn
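
To sanity-check the numbers behind the plot, one option is to compare them against Stata's built-in mean command, which reports means with standard errors and 95% confidence intervals by default. This is only a cross-check sketch: whether mean_ci_plot computes its intervals in exactly the same way is an assumption on my part, not something documented above.

```stata
* Load the example data and tabulate means with 95% confidence intervals
sysuse auto.dta, clear
mean mpg trunk turn

* Number of observations used in the estimation
display e(N)
```

If the table and the plot disagree, the sample size reported by each is a good first place to look.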

(2) We can now introduce a group variable:

mean_ci_plot mpg trunk turn, by(foreign)

(3) In this third example, we work with one variable, and introduce a title as well as the scale option:

gen pct_score = runiform()*100 // creating a new variable that takes uniformly distributed values from 0 to 100
lab var pct_score "Percentage Score"
mean_ci_plot pct_score, title("My custom title") scale

(4) If/in conditions can be incorporated as follows. Using this command, we obtain the means of the three variables using only those observations for which price is greater than 5000:

mean_ci_plot mpg trunk turn if price > 5000, by(foreign)
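
Before trusting a restricted plot, it can help to check how many observations actually survive the condition in each group. Here is a small sketch using only built-in commands; the extra !missing() guard is my own habit rather than anything mean_ci_plot requires.

```stata
sysuse auto.dta, clear

* Observations per group that satisfy the condition used in the plot
tabulate foreign if price > 5000

* Stata treats missing values as larger than any number, so a guard
* against missing prices is a sensible habit (auto.dta has none)
count if price > 5000 & !missing(price)
```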

(5) We now explore three different commands for graph customization:

(i) This command alters the markers for the means from circles to squares and adds a title on the x-axis:

mean_ci_plot mpg trunk turn, by(foreign) graphopts(msymbol(S) xtitle("My X-Axis"))

(ii) In this case, we sort the three variables under study in increasing order of the means:

mean_ci_plot mpg trunk turn, graphopts(sort)

(iii) The following command presents the means as bars, and “noci” removes confidence intervals from the plot:

mean_ci_plot mpg trunk turn, graphopts(recast(bar) barwidth(0.15) noci mlabpos(3) title(, size(medsmall)))

There is one important point for users to keep in mind: mean_ci_plot generates graphs using the largest subset of observations for which complete information is available on the variables being plotted. Let us work through this with an example:

Say we have a primary dataset comprising a set of questions asked to 200 students in a school. At a later point, we add some questions to the survey and go back to the same school, but are able to get responses from only 196 of the initial 200 students. If we plot the old and new variables together, mean_ci_plot will generate the required tables and plots using only these 196 observations, i.e., the largest subset of observations for which complete information is available. Users will also note that the balanced sample size is included in the graph, which is a helpful default feature!
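
The same idea can be seen in auto.dta, where rep78 has a few missing values. The sketch below counts the complete cases by hand with egen's rowmiss() function and then compares against the built-in mean command, which also drops observations with a missing value in any listed variable. The variable name n_missing is just an illustrative choice.

```stata
sysuse auto.dta, clear

* Count, for each observation, how many of the listed variables are missing
egen n_missing = rowmiss(mpg rep78)

* Observations with complete information on both variables
count if n_missing == 0

* mean uses the same complete-case sample
mean mpg rep78
display e(N)
```

Both counts should agree, and that shared number is the balanced sample size we would expect to see reported when these two variables are plotted together.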

Thank you for reading, and I am confident that this program will mean a whole lot of simplified visualization for you!

About the Program Creators

For any further information or clarifications, please contact Zaeen de Souza at zaeen.desouza19_mec@apu.edu.in or Kabira Namit at knamit@worldbank.org.

About the Author

Shritha Sampath is a student at Bocconi University in Milan, Italy, where she is pursuing a Master's degree in Economic and Social Sciences.
