The Stata-to-LaTeX guide
Last updated: 04 June 2023. The guide is now compatible with TeX 2022 compilers and fully utilizes the native estout package options to generate the tables without additional customizations in LaTeX.
In this guide, we will learn how to export Stata tables and regressions to LaTeX. This guide is updated intermittently to keep up with changes to the syntax, TeX compilers, and user requests.
The aim of this guide is not to discuss user-written commands like esttab or tabout, but to provide a replicable set of templates for applications. I myself use this guide all the time. Furthermore, this guide is not intended to be an introduction to regression analysis or a tutorial on LaTeX. Instead, it is aimed at users who are already familiar with both and would prefer not to spend hours searching for bits and pieces of code online.
The LaTeX part is provided in a shared Overleaf document that can be viewed online. Users can either duplicate the template, or download the source code and locally compile the files. The document also contains a change log which tracks updates to this guide.
To follow the guide, the Stata part for each table is discussed here, and the LaTeX code can be viewed on Overleaf. Tables generated in Stata are referenced with the same table number on Overleaf; for example, the table exported as table1.tex will be Table X.1 on Overleaf.
Preamble
Stata
Like other guides, a basic knowledge of Stata is assumed. This guide deals with advanced usage of locals, loops, and code structures that require some experience and familiarity with Stata programming.
- Install the estout package:
ssc install estout, replace
The versions should show the following dates or more recent versions:
So please check and update if you have older versions. You can do this by typing ado update, update.
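As a quick check before updating, the installed versions can be inspected from within Stata. The lines below are a minimal sketch; they assume estout was installed from SSC, and the version text printed will depend on your installation.

```stata
* Show where each command is installed and its version line
which estout
which esttab
which estpost

* Update only the estout package (rather than every installed package)
ado update estout, update
```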
LaTeX
For LaTeX, you should either have an online account on a service like Overleaf or a local installation of a LaTeX distribution (MiKTeX, TeX Live, etc.). Personally, I have moved all my work to Overleaf since I work across multiple computers. Because compilation happens in the cloud, I don’t have to deal with updating compilers and packages on each machine. It also allows for easy collaboration, especially if some co-authors are not tech savvy. Sometimes I also get the monthly subscription if track changes are needed, or if faster compilation times are required for very large documents. But the free version is more than sufficient for basic use.
The LaTeX preamble, including the required packages, is given in this read-only Overleaf document:
The document is complete and compiles as a standard article class file.
It is very important that you check which compiler you are using. You can click on the “Menu” icon on the top left, which will pop up this window:
Here, make sure that the compiler is set to 2022. This is important because your old documents might use older compilers, depending on which versions were available when you created the document. Since this guide updates the syntax of the tables, old code might, and most likely will, break.
For Stata-to-LaTeX examples, it is worth bookmarking the page maintained by Ben Jann, who wrote this brilliant package:
The help file for estout is also fairly extensive and worth exploring carefully:
Statalist has a lot of useful posts on estout, so one can always search there as well.
Here I would also mention that some packages need to be loaded in LaTeX. For example, booktabs for neater table formatting, dcolumn for aligning decimals, and adjustbox for resizing tables. There could be better alternatives available. If you have suggestions, please open an Issue on the GitHub page with your comments.
Let’s get started
Anyone writing an article eventually has to deal with generating tables in Stata and formatting them in LaTeX.
Creating tables by hand is not advisable. Outputs change easily as data files are updated and cleaned, and copy-pasting the updated numbers is painful. Plus, this process is highly prone to errors.
In this post, we will look at various examples of generating summary statistic and regression tables. The main aim is to create a LaTeX-ready output using the estout package.
The guide is split into two parts. Part I covers summary statistic tables, while Part II covers regressions. For the sake of replicability, Stata’s default datasets are used. Just a note of caution: the example datasets are very clean, so the chances of encountering errors or issues are negligible. If you do run into issues with your own dataset, please post and discuss them in the Issues section on GitHub.
Part I: Tables
In this section we will cover basic tables. In Stata, summary statistic tables are generated from commands like summarize, tabstat, ttest, corr, etc. Let’s start with a simple table:
Table 1: Basic example
First we load a dataset and do some minor cleaning:
sysuse census, clear
foreach x of varlist pop-popurban death-divorce {
replace `x' = `x' / 1000
}
Now let’s generate a simple summary statistics table:
tabstat pop pop65p medage death marriage divorce, c(stat) stat(sum mean sd min max n)
which gives us this output:
We can store these estimates using the estpost command, which also saves a bunch of info as e-class locals:
est clear // clear the stored estimates
estpost tabstat pop pop65p medage death marriage divorce, c(stat) stat(sum mean sd min max n)
ereturn list // list the stored locals
In the output above, everything stored in the e() matrices can be pulled by the estout package. This is important to know because some user-written Stata commands return r-class results, which we sometimes need to convert to e-class results to make the stored estimates usable with estout.
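To make the r-class to e-class step concrete, here is a minimal sketch using estadd, which ships with the estout package. It assumes the census data loaded above; the regression and the scalar name ymean are purely illustrative and not part of the original guide.

```stata
sysuse census, clear
est clear

* Any estimation command leaves its results in e()
eststo: regress death pop

* summarize is r-class: the mean lives in r(mean)
quietly summarize death

* Copy the r-class scalar into the stored e-class results
estadd scalar ymean = r(mean)

* esttab can now pick it up like any other e() scalar
esttab, scalars("ymean Mean of dep. var.") sfmt(2)
```

The same idea applies to other r-class commands: run the command, then attach whatever it returned to the active estimates with estadd.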
Let’s try some tables with various estout options:
// basic
esttab, cells("sum mean sd min max count")
// some options added
esttab, cells("sum mean sd min max count") nonumber ///
nomtitle nonote noobs label
// formatted
esttab, ///
cells("sum(fmt(%6.0fc)) mean(fmt(%6.2fc)) sd(fmt(%6.2fc)) min max count") nonumber ///
nomtitle nonote noobs label collabels("Sum" "Mean" "SD" "Min" "Max" "N")
which gives us the following outputs:
The information stored by estpost is processed by esttab, which displays the table as it will look in LaTeX. Even in the simple example above, a lot of options are specified. It is highly recommended that you look them up on the help esttab page. The most important one is cells, which picks up the names from the e-class results. We also define their formatting using the fmt() option. If you are unfamiliar with formatting, have a look at help format.
The command above only displays the output on the screen. In order to export it to LaTeX, additional options need to be specified:
esttab using "./graphs/guide80/table1.tex", replace ///
cells("sum(fmt(%6.0fc)) mean(fmt(%6.2fc)) sd(fmt(%6.2fc)) min max count") nonumber ///
nomtitle nonote noobs label booktabs ///
collabels("Sum" "Mean" "SD" "Min" "Max" "N")
The key additions are the using clause with replace, and the booktabs option. The file is exported as table1.tex in the booktabs format. Booktabs is a LaTeX package that produces neat-looking tables. We manually replace the column headers with the collabels option. The label option picks up the variable labels from Stata, so please label the variables neatly before you run the esttab command.
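Since label relies on the variable labels, a short labelling step before esttab keeps the exported row names tidy. The label texts below are illustrative assumptions only; they are not taken from the original guide, and using them will change the row names relative to the tables shown here.

```stata
* Illustrative labels (the loop above divided the pop* and death-divorce
* variables by 1,000; medage was left untouched)
label variable pop      "Population ('000)"
label variable pop65p   "Population 65+ ('000)"
label variable medage   "Median age"
label variable death    "Deaths ('000)"
label variable marriage "Marriages ('000)"
label variable divorce  "Divorces ('000)"
```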
The file table1.tex can now be read in LaTeX. If you do not have a local LaTeX installation, you can also view it in a generic text editor (like Notepad, or better still Notepad++, on Windows):
We can also go beyond the basic output described above, and even add captions and labels to the table directly in Stata:
esttab using "./graphs/guide80/table1_title.tex", replace ///
cells("sum(fmt(%6.0fc)) mean(fmt(%6.2fc)) sd(fmt(%6.2fc)) min max count") nonumber ///
nomtitle nonote noobs label booktabs ///
collabels("Sum" "Mean" "SD" "Min" "Max" "N") ///
title("Table 1 with title generated in Stata \label{table1stata}")
This file is not added to the Overleaf document since, I think, table captions and labels should be controlled directly in LaTeX. But feel free to experiment with this. Such a table can be useful, especially if you are storing information about an estimation in the title, or generating headers through loops.
The files can be uploaded to Overleaf using the small up-arrow icon shown in the screenshot below.
The LaTeX code is annotated in the Overleaf file, so that part of the code is not discussed here. Below we can see what this table looks like in the LaTeX output:
Tables 2 and 3: main/aux() versus cells() options
We can also export summary statistics by some control groups. For example, we can calculate the mean and the standard deviation (SD), and other statistics by the region variable:
est clear
estpost tabstat pop pop65p medage death marriage divorce, by(region) c(stat) stat(mean sd) nototal
Also check what is stored in the e-class:
ereturn list
The stored information can be exported in two ways.
The first way is to use the simpler main() option:
*** display on screen:
esttab, main(mean %8.2fc) aux(sd %8.2fc) nostar nonumber unstack ///
nonote noobs label ///
collabels(none) ///
eqlabels("North East" "North Central" "South" "West") ///
nomtitles
which we can export to LaTeX as follows:
esttab using "./graphs/guide80/table2.tex", replace ///
main(mean %8.2fc) aux(sd %8.2fc) nostar nonumber unstack ///
compress nonote noobs gap label booktabs ///
collabels(none) ///
eqlabels("North East" "North Central" "South" "West") ///
nomtitles
and we get:
Here the main() and aux() options take the names from the e-class locals. If you do not want any customization, this is the quickest way to generate a table. Whatever is defined in main goes on top, while aux goes below it.
The second, more advanced option is cells():
esttab, cells("mean(fmt(%8.2fc))" "sd(par)") nostar unstack nonumber ///
nonote noobs label ///
collabels(none) ///
eqlabels("North East" "North Central" "South" "West") ///
nomtitles
which is exported to LaTeX as follows:
esttab using "./graphs/guide80/table3.tex", replace ///
cells("mean(fmt(%8.2fc))" "sd(par)") nostar unstack nonumber ///
compress nonote noobs gap label booktabs ///
collabels(none) ///
eqlabels("North East" "North Central" "South" "West") ///
nomtitles
Here we can see that with the main option, we pick elements from the e-class results, namely “mean” and “sd”. The same is true of the cells option. The main difference is how the formatting is defined. In the first case, the format is simply given right after the statistic name, while in the cells option it needs to be defined using the fmt option. The cells option can go even further, since we can define multiple outputs per table. For example, we can also stack observations, or minimum and maximum values, etc.
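As a sketch of stacking additional statistics with cells(), the example below also requests the minimum and maximum from tabstat and places each statistic on its own row. It assumes the census data from above; the choice of two variables and the %9.1fc format are illustrative, not part of the original guide.

```stata
sysuse census, clear
est clear

* Ask tabstat for min and max as well, so they are stored in e()
estpost tabstat pop medage, by(region) c(stat) stat(mean sd min max) nototal

* Each quoted string inside cells() becomes its own row per variable
esttab, cells("mean(fmt(%9.1fc))" "sd(par fmt(%9.1fc))" "min(fmt(%9.1fc))" "max(fmt(%9.1fc))") ///
    nostar unstack nonumber nomtitles nonote noobs label collabels(none)
```

Putting several names inside one quoted string, e.g. "min max", would instead place them side by side in the same row.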
Also note that here we are defining the column headers using eqlabels and not collabels. This is because we are taking a stacked output (all the by groups are shown in one long column on top of each other) and unstacking it using the unstack option (each by group becomes a separate column). Hence each column is treated as an “equation”. This might be a bit confusing, but if you are stuck not knowing which one to use, test both of them out!
Table 4: Formatting cells
As briefly discussed above, we can export summary statistic tables using the more generic cells option. This also allows for more flexibility in customizing the output. Let’s try some options. We again start with a simple table:
est clear
estpost tabstat pop pop65p medage death marriage divorce, by(region) c(stat) stat(mean sd) nototal
esttab, cells(mean(fmt(2)) sd(par)) nostar nonumber unstack ///
nomtitle nonote noobs label ///
collabels(none) gap ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
and its LaTeX version:
esttab using "./graphs/guide80/table4.tex", replace ///
cells(mean(fmt(2)) sd(par)) nostar unstack nonumber ///
compress nonote noobs gap label booktabs ///
collabels(none) ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
Here we use another formatting option, fmt(2), which is short for two decimal places. This formatting is also passed on to the SD. For the SD, we also specify par, which stands for parentheses. This puts the SD in round brackets in the display.
This is the simplest way of formatting the table. We just define fmt(2) once, and everything is shown with 2 decimal places.
Table 5: Custom brackets
The parentheses can also be modified. For example, the code below replaces the round brackets with square brackets:
esttab, cells(mean sd(par([ ]))) nostar nonumber unstack ///
nomtitle nonote noobs label ///
collabels(none) ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
Effectively, whatever is specified in par( ) replaces the opening and closing brackets. For example, one can also use par(< >).
For exporting to LaTeX:
esttab using "./graphs/guide80/table5.tex", replace ///
cells(mean sd(par([ ]))) nostar unstack nonumber ///
compress nonote noobs gap label booktabs ///
collabels(none) ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
Tables 6 and 7: Custom decimal places
We can also play around with decimal places and customize them differently for each element. For example:
esttab, cells(mean(fmt(2)) sd(fmt(3) par)) nostar nonumber unstack ///
nomtitle nonote noobs label ///
collabels(none) ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
will give the means in 2 d.p. while the standard deviations will be in 3 d.p.:
And exporting back to LaTeX:
esttab using "./graphs/guide80/table6.tex", replace ///
cells(mean(fmt(2)) sd(fmt(3) par)) nostar unstack nonumber ///
compress nonote noobs gap label booktabs ///
collabels(none) ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
we get:
Next we try even further customization where the mean has increasing number of decimal places, while the SD has a decreasing number of decimal places as we go down the rows:
esttab, cells(mean(fmt(1 2 3 4)) sd(fmt(3 2 1 0) par)) nostar nonumber unstack ///
nomtitle nonote noobs label ///
collabels(none) ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
esttab using "./graphs/guide80/table7.tex", replace ///
cells(mean(fmt(1 2 3 4)) sd(fmt(3 2 1 0) par)) nostar unstack nonumber ///
compress nonote noobs gap label booktabs ///
collabels(none) ///
eqlabels("N. East" "N. Central" "South" "West") ///
nomtitles
which gives us this unusual looking table:
This allows one to customize tables where the units across variables are very different. For example, some variables might be given in millions while others might be fractions. But it is always better to homogenize these and stick to one reporting format! Also, you don’t want to see rows and rows of coefficients that look like 0.000***. This would annoy any referee or reader.
Table 8: A fully formatted table with variable groups
Here we generate a table where variables in the rows are grouped. We can also incorporate LaTeX elements in the labels:
foreach x of varlist pop pop65p medage death marriage divorce {
local t : var label `x'
local t = "\hspace{0.25cm} `t'"
lab var `x' "`t'"
}
Here we add a space of 0.25 cm before the text of the variable label. Note that this is a hard change to the variable labels! So these will start showing up in Stata like this as well:
Here we are “tricking” LaTeX into reading the labels and applying the horizontal space command \hspace.
Next we generate the summary statistics:
est clear
estpost tabstat ///
pop pop65p medage death marriage divorce, ///
c(stat) stat(mean sd min max n)
esttab, cells("mean sd min max count")
estout, cells("mean sd min max count")
And we customize the output as follows:
esttab using "./graphs/guide80/table8.tex", replace ///
refcat(pop "\emph{Demographic}" death "\vspace{0.1em} \\ \emph{Status}", nolabel) ///
cells("mean(fmt(%8.0fc %8.0fc %8.0fc %8.0fc 2)) sd min max count(fmt(0))") nostar unstack nonumber ///
compress nomtitle nonote noobs label booktabs ///
collabels("Mean" "SD" "Min" "Max" "N")
We can use the refcat option, which picks variables as reference points and adds a name to each group of variables. Note that we are also customizing the format of each row. From the code above, we get this LaTeX table:
Note here how the use of \hspace in the variable labels gives the needed indentation under the group headings defined with the refcat option.
Table 9: T-tests
For t-tests, we define two groups:
tab region
cap drop dummy
gen dummy = inlist(region, 1,4)
where we want to compare North East (region = 1) and West (region = 4) with North Central (region = 2) and South (region = 3). Let’s get a basic output:
ttest pop, by(dummy)
where we compare the population means, which appear to be statistically the same:
For the two groups we can also perform t-tests across a bunch of variables:
global vars pop pop65p medage death marriage divorce
Note that ttest itself does not allow several variables to be specified, but estpost allows it. So for the above groups we use the following commands:
est clear
estpost ttest $vars, by(dummy)
ereturn list
We can display the output in Stata window using either of the two commands:
esttab
esttab, wide
where the first option stacks the mean and S.E. of the difference while the second option shows the S.E. in the same row as the mean.
We can export this table to LaTeX as follows:
esttab using "./graphs/guide80/table9.tex", replace ///
cells("mu_1(fmt(1)) mu_2 b(star) se(par) count(fmt(0))") ///
collabels("North" "South" "Diff. (North - South)" "s.e." "obs." ) ///
star(* 0.10 ** 0.05 *** 0.01) ///
label booktabs nonum gaps noobs compress
Note the use of cells and the names of the stored results that we are picking. There is nothing special about the t-test itself. It is just that we are using the cells() option in a creative way, which gives us a table that looks fairly complicated to make:
Similar tables can be constructed using any information stored in the e-class locals.
Table 10: Summary statistics by different control groups
Sometimes we want to generate summary statistic tables by different control groups. Note that these are not “by” groups, which are mutually exclusive, but other groupings whose categories can overlap. For example, within groups like the World, Europe, European Union, Euro Area, and high-income countries there are large overlaps, but each group also contains other countries.
In order to summarize these categories, we need to move away from tabstat. Instead, we need to use a combination of eststo (or estimates store) and the summarize (or summ) command for different control groups (that also overlap):
est clear
eststo grp1: estpost summ $vars
eststo grp2: estpost summ $vars if region==1
eststo grp3: estpost summ $vars if dummy==1
eststo grp4: estpost summ $vars if pop > 2000
where the summary statistics are stored in the grp* estimates. Here eststo stores the estimates. One can collect as many as one wants and call them later as well.
As the next step, we can generate a mock table on screen by calling the stored estimates:
esttab grp*, main(mean %6.2f) aux(sd) mtitle("All" "N. East" "Coasts" "High pop")
And finally, we can export to LaTeX easily as follows:
esttab grp* using "./graphs/guide80/table10.tex", replace ///
main(mean %8.2fc) aux(sd %8.2fc) nostar nonumber unstack ///
compress nonote noobs gap label booktabs ///
collabels(none) mtitle("All" "N. East" "Coasts" "High pop")
which gives us:
Looks simple but took some time to figure out! On to the regressions:
Part II: Regressions
For regression outputs, countless examples exist online for the estout package. Here I introduce some examples that explain the concept of how regression tables can be generated. These deal with non-standard aspects like lagged terms, interactions, generating tables from several combinations of regressions, stacking variables, etc.
Let’s start with a core regression block, where we generate all the numbers in one go. This is the basic regression Block 1:
This is essentially a bunch of dependent variables regressed on a bunch of independent variables. The independent variables can vary but should have some common elements for the table to have some meaning. For Block 1, we need to format the results in terms of display, and add headers and footers. So let’s jump right into it:
First let’s load and clean up the data:
webuse nlswork, clear
xtset idcode year
gen age2 = age^2
gen ttl_exp2 = ttl_exp^2
gen tenure2 = tenure^2
gen black = (race==2)
lab var age "Age"
lab var age2 "Age sq."
lab var ttl_exp "Work experience"
lab var ttl_exp2 "Work experience sq."
lab var tenure "Job tenure"
lab var tenure2 "Job tenure sq."
lab var not_smsa "SMSA (=1)"
lab var black "Black (=1)"
lab var south "South (=1)"
lab var union "Union (=1)"
Regression 1: The basic building block
Let’s start with two basic regressions. In the first one, we regress the log of wages ln_w on several variables. The second regression just adds some additional controls to the first one. These regressions are just for illustration, so don’t start interpreting them!
In the code below, we also store the estimates of the panel regression by calling the eststo command:
est clear
eststo: xtreg ln_w age* ttl_exp* tenure* not_smsa , vce(robust)
eststo: xtreg ln_w age* ttl_exp* tenure* not_smsa black south, vce(robust)
Running the above commands will show the standard Stata output:
and at the bottom it will also say that the regression results have been stored. We can display these stored estimates as follows:
esttab
esttab, label
The first option displays the outputs as they are, while the second option uses the variable labels we defined earlier:
Let’s fix the decimal places and get rid of the column headers since this is redundant information that usually goes in the table caption:
esttab, b(3) se(3) nomtitle label
which gives us a neater output:
The next step is extremely important (and was a major cause of stress until this was figured out): esttab by default shows significance levels at the 5%, 1% and 0.1% levels, while we are trained to assume that *, **, and *** represent 10%, 5%, and 1% respectively! This can be corrected as follows:
esttab, b(3) se(3) nomtitle label star(* 0.10 ** 0.05 *** 0.01)
which gives us this table:
For our example, everything is unfortunately very significant, so there is no change in the stars, but it can make a huge difference in your own examples. So be careful!
Now the next step is to export this table in LaTeX format. There are two ways of doing this. Either we export it ready for LaTeX directly with everything formatted in Stata, or we just export the table body and plug in the table preamble and formatting in LaTeX.
For the first option with a title and label:
esttab using "./graphs/guide80/regression1_1.tex", replace ///
b(3) se(3) nomtitle label star(* 0.10 ** 0.05 *** 0.01) ///
booktabs ///
title("Basic regression table \label{reg1}") ///
addnotes("Data: webuse nlswork" "Second line note")
Here we use the booktabs option, which formats the table neatly. The last two lines add the title and footnotes. Note that in the addnotes option, one can start a new line by using a new set of quotation marks: addnotes("Line 1" "Line 2").
All this requires in LaTeX is the following code:
\input{regression1_1.tex}
The second option where we just export the core table without the headers and footers:
esttab using "./graphs/guide80/regression1_2.tex", replace ///
b(3) se(3) nomtitle label star(* 0.10 ** 0.05 *** 0.01) ///
booktabs nonotes
In order to add this table to LaTeX, we need to do slightly more work:
Please see Overleaf for the implementation of the two tables. Here I would also advise looking at the table output file, which can be viewed in a local LaTeX editor if you have one installed; otherwise any text editor like Notepad or Notepad++ (highly recommended) on Windows can be used as well. Here is what the TeX output of the two files looks like:
Notice that the preamble before tabular is missing in the second file. The table generated from the first option looks like this in LaTeX:
Regression 2: Add controls at the bottom of regression tables
Most of the time when we are running regressions, we want to indicate what type of regression it is. This is shown at the bottom of the regression table with additional information. This block could contain the number of observations, some R², an F-test, or other values.
Here we will learn how to add this information in Stata for our LaTeX tables. Let’s start with a basic example:
global controls age* ttl_exp* tenure* not_smsa south union
est clear
eststo: xtreg ln_w $controls, re vce(robust)
estadd local FE "No"
estadd local TE "No"
where we add info on whether we are using fixed effects (FE) or time effects (TE). Notice in the output that the estadd command generates e-class locals:
We can check what all was stored from the regression by typing:
return list
mat li r(table)
The r(table) matrix gives us the variable names in the top row. This will come in handy later, so keep it in mind.
Let’s extend the stored estimates by adding more regression variations and tagging each of them with estadd:
est clear
eststo: xtreg ln_w $controls, re vce(robust)
estadd local FE "No"
estadd local TE "No"
eststo: xtreg ln_w $controls, fe vce(robust)
estadd local FE "Yes"
estadd local TE "No"
eststo: xtreg ln_w $controls i.year, re vce(robust)
estadd local FE "No"
estadd local TE "Yes"
eststo: xtreg ln_w $controls i.year, fe vce(robust)
estadd local FE "Yes"
estadd local TE "Yes"
Notice that in the last two sets, we also add time dummies for time fixed effects. If we have a lot of years, then we will get a very large regression table since we will have a coefficient for each year. We do not want this!
Let’s control what we want to display:
esttab, ///
b(3) se(3) ///
keep($controls) ///
star(* 0.10 ** 0.05 *** 0.01) ///
label noobs nonotes nomtitle collabels(none) compress ///
scalars("rho \$\rho\$" "TE Time Effects" "FE Fixed effects") sfmt(3 0)
where we keep the variable names stored in the $controls global macro. The scalars option tells esttab what to display at the bottom. Here we pick three elements and also fix their labels. The sfmt option formats the displayed values. Here we are saying that the first scalar should have three d.p. while the rest should have zero. The output looks like this:
We can now export the table back to LaTeX:
esttab using "./graphs/guide80/regression2.tex", replace ///
b(3) se(3) ///
keep($controls) ///
star(* 0.10 ** 0.05 *** 0.01) ///
label booktabs noobs nonotes nomtitle collabels(none) compress alignment(D{.}{.}{-1}) ///
scalars("rho \$\rho\$" "TE Time fixed effects" "FE Panel fixed effects") sfmt(3 0)
and we get our table with the additional block at the bottom:
All the scalars that exist in the regression output can be called here directly, for example rho. Additional entries can be added using the estadd command. These can also include calculations from regression estimates, e.g. some linear combinations, turning points, etc.
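As a sketch of adding a calculated quantity, the example below stores the turning point of a quadratic age profile as an extra scalar. It is an illustration under assumptions: the reduced set of regressors and the scalar name tp_age are not part of the original guide.

```stata
webuse nlswork, clear
xtset idcode year
gen age2 = age^2

est clear
eststo: xtreg ln_wage age age2 ttl_exp tenure, re vce(robust)

* Turning point of b1*age + b2*age^2 is at -b1 / (2*b2)
estadd scalar tp_age = -_b[age] / (2 * _b[age2])

* Display it in the bottom block next to rho
esttab, b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
    scalars("rho \$\rho\$" "tp_age Age turning point") sfmt(3 1)
```

This only stores the point estimate; if a standard error for the turning point is needed, a command such as nlcom would have to be used instead.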
Regression 3: Unusual coefficients
If we are using lags or interactions in the regressions, then the coefficients names are not so trivial to push to LaTeX outputs. Let’s start with a simple example:
est clear
eststo: xtreg ln_w i.black#i.collgrad, vce(robust)
and view the stored table by typing mat li r(table):
This table is useful for picking variable names in case you are using lags or interaction terms and it is not clear how the names are stored.
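A compact way to see the stored names is to read the column names of the coefficient vector directly. The sketch below is an illustration rather than the author's own code; it rebuilds the black dummy so that it runs on its own, and it already includes a lagged term to show how that is named.

```stata
webuse nlswork, clear
xtset idcode year
gen black = (race == 2)

quietly xtreg ln_wage i.black#i.collgrad L1.hours, vce(robust)

* Coefficient names exactly as Stata stores them
local names : colnames e(b)
display "`names'"

* The same names appear as column headers of the coefficient vector
matrix list e(b)
```

These are the names to pass to keep() and coeflabel().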
Let’s say we are only interested in the interaction coefficient which is Black x CollegeGrad, or where both the dummies equal one, and just export this coefficient:
esttab, ///
keep(1.black#1.collgrad) coeflabel(1.black#1.collgrad "Black X College") ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
label nonotes nomtitle collabels(none)
where we get this minimalistic-looking output:
Now, let’s say we want to add another regression which has a lag term, and we also want to report that lag term:
eststo: xtreg ln_w i.black#i.collgrad age L1.hours, vce(robust)
Our lag term is showing up as hours L1. in the table. And this is definitely not the name it is stored as. Let’s check r(table):
And here we get the correct name L.hours. Let’s add this to the esttab command as well, including a name fix:
esttab, ///
keep(1.black#1.collgrad L.hours) coeflabel(1.black#1.collgrad "Black X College" L.hours "Hours(t-1)") ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
label nonotes nomtitle collabels(none) compress
Looks good! And we can export it back to LaTeX:
esttab using "./graphs/guide80/regression3.tex", replace ///
keep(1.black#1.collgrad L.hours) coeflabel(1.black#1.collgrad "Black X College" L.hours "Hours(t-1)") ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
label booktabs noobs nonotes nomtitle collabels(none) compress alignment(D{.}{.}{-1})
Regression 4: Regressions by groups
In this example, each dependent variable is used for two different regression specifications, and we use a bunch of dependent variables. Here we introduce Regression Block 2:
Each column of the results part in Block 2, represents a set of regressions of the type done in Block 1. These can still be looped in one go but need additional syntax to differentiate between them.
Let’s start this example, by using a global to define all the controls:
global controls age* ttl_exp* tenure* not_smsa south union
est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' $controls, re vce(robust)
estadd local FE "No"
eststo: xtreg `x' $controls, fe vce(robust)
estadd local FE "Yes"
}
Let’s check the esttab output:
A large output! We want to export this table with the regression pairs identified and labeled properly:
esttab using "./graphs/guide80/regression4.tex", replace ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
label booktabs nonotes noobs nomtitle collabels(none) ///
scalars("FE Fixed effects") sfmt(0) ///
mgroups("Ln(Wages)" "Hours worked" "Weeks unemployed(t-1)", pattern(1 0 1 0 1 0) ///
prefix(\multicolumn{@span}{c}{) suffix(}) span erepeat(\cmidrule(lr){@span})) alignment(D{.}{.}{-1})
The mgroups option clusters the regressions in groups. Since each dependent variable has two regressions, it follows a 1 0 pattern. If it were three regressions per group, then it would have been 1 0 0 1 0 0 and so on. The prefix() and suffix() options wrap each heading in additional LaTeX syntax, while erepeat() is necessary to add lines underneath the headings defined in mgroups.
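To illustrate the three-per-group case, here is a minimal on-screen sketch. It assumes a reduced set of regressors and adds a between-effects model as the third specification purely for illustration; neither is taken from the original guide.

```stata
webuse nlswork, clear
xtset idcode year

est clear
foreach x of varlist ln_wage hours {
    eststo: xtreg `x' age tenure, re vce(robust)
    eststo: xtreg `x' age tenure, fe vce(robust)
    eststo: xtreg `x' age tenure, be
}

* Six models, two groups of three: a 1 marks the start of each group
esttab, b(3) se(3) nomtitle ///
    mgroups("Ln(Wages)" "Hours worked", pattern(1 0 0 1 0 0))
```

For the LaTeX export, the same pattern() would be combined with the span, prefix(), suffix(), and erepeat() suboptions shown above.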
This LaTeX file is compiled as follows:
The column numbers and the table footer can also be customized to add more information. Here we are differentiating the columns by the “Fixed effects” row at the bottom. We can also do other groupings where the different regressions have different controls etc.
Regression 5: Collecting a coefficient of interest from multiple regressions
Here we will generate a very specific table where one coefficient of interest is collected from a bunch of regressions. This can be useful for just displaying the coefficients of interest of one or more interventions for a bunch of tests and specifications. One comes across such tables in papers frequently which show the key coefficients without showing all possible regression outputs.
This is the most complicated form that we call Regression Block 3:
This table format condenses a large chunk of regressions, into a table that shows the coefficients of interest. This means that we not only need to loop horizontally, but also vertically. This implies considerable code manipulation to give us the table format we want.
Going back to our example, we regress three dependent variables ln_wage, hours, wks_ue on a bunch of independent variables. We want each row to show the coefficient of our variable of interest for four different specifications. So we need to generate a 4 rows by 3 columns (4x3) table.
In order to operationalize this in Stata, we need to split the table generation into three kinds of panels. The top panel has the header information plus the first set of coefficients, but nothing below. The middle panels have just the coefficients and nothing else above or below. Lastly, the bottom panel has the last set of coefficients and the table footer. If you understand this concept, the rest is just making sure we have the syntax right.
Top panel
Let’s start with the top panel:
**** top panel
est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' i.black#i.collgrad, re vce(robust)
}
esttab using "./graphs/guide80/regression5.tex", replace f ///
prehead(\begin{tabular}{l*{@M}{r}} \toprule) ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
keep(1.black#1.collgrad) varlabels(1.black#1.collgrad "Random effects") ///
label booktabs noobs nonotes collabels(none) ///
alignment(D{.}{.}{-1}) ///
mtitles("Ln(Wages)" "Hours worked" "Weeks unemployed(t-1)")
Here we keep the interaction term only. Note that one can see its name using the mat li r(table) command, and we manually label it using the varlabels option. The rest of the code is fairly straightforward.
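If the exact name of the interaction coefficient is unclear, it can be read off the stored results before writing the keep() and varlabels() options. Below is a minimal sketch, assuming the same panel data used throughout the guide is already loaded and xtset. The coeflegend display option is a standard Stata feature, not something this guide relies on elsewhere:

```stata
* Assumption: the data are already loaded and xtset, as earlier in the guide
xtreg ln_wage i.black#i.collgrad, re vce(robust)

* The column names of r(table) are the coefficient names,
* e.g. 1.black#1.collgrad for the interaction kept in the table
matrix list r(table)

* Alternative: replay the last estimates with a coefficient legend
xtreg, coeflegend
```

Whichever name appears there is the one that goes, character for character, inside keep() and varlabels().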
Middle panels
The middle panels have all the options of labels and lines turned off:
*** center panel 1 ***
est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' i.black#i.collgrad i.year, re
}
esttab using "./graphs/guide80/regression5.tex", append f ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
keep(1.black#1.collgrad) varlabels(1.black#1.collgrad "Year effects") ///
label booktabs nodep nonum nomtitles nolines noobs nonotes collabels(none) alignment(D{.}{.}{-1})

*** center panel 2 ***
est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' i.black#i.collgrad, be
}
esttab using "./graphs/guide80/regression5.tex", append f ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
keep(1.black#1.collgrad) varlabels(1.black#1.collgrad "Between effects") ///
label booktabs nodep nonum nomtitles nolines noobs nonotes collabels(none) alignment(D{.}{.}{-1})
Each panel is generated separately and is appended to the original file using the append option. The f option stands for fragment: it suppresses the table opening and closing code so that only the table rows are written. We also turn off all the lines and column headers and footers.
Bottom panel
The bottom panel is the most complicated one since here we extend varlabels with the elist() suboption. varlabels is a programming option that can overwrite a lot of the internal estout code (so use it carefully).
*** bottom panel ***
est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' i.black#i.collgrad, pa
}
esttab using "./graphs/guide80/regression5.tex", append f ///
postfoot(\bottomrule \end{tabular}) ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
keep(1.black#1.collgrad) varlabels(1.black#1.collgrad "Pop.-avg. estimator", elist(1.black#1.collgrad \bottomrule)) ///
label booktabs collabels(none) nomtitles nolines nonum alignment(D{.}{.}{-1}) sfmt(%6.0fc)
Here we turn off the lines using the nolines option and manually specify a line in varlabels using the elist() suboption and the \bottomrule command. Note that \bottomrule is a booktabs command in LaTeX. For tables without booktabs, \bottomrule can be replaced with \hline. Again, we append this part to our original file.
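Because the file is assembled from four separate esttab calls, it can help to look at the result before compiling it in LaTeX. A small sketch using Stata's built-in type command, with the same path as above:

```stata
* Print the assembled table fragment to the Results window
type "./graphs/guide80/regression5.tex"
```

The output should open with the \begin{tabular} line from the top panel's prehead() and close with the \end{tabular} line from the bottom panel's postfoot(), with one coefficient row and one standard error row per panel in between. If the closing line is missing, the bottom panel most likely did not run or was not appended.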
And from the code above, we get the following 4x3 table:
This table can also be extended to have grouped columns, and scalars and locals in the footer.
Regression 6: Stacking standard errors
In this table we essentially stack standard errors from different regressions on top of each other where the point estimate is the same. Examples include random effects or clustered models, or models with spatial adjustments to the S.E.s. This is also parsimonious in terms of space and is neater in terms of presentation.
For this we just run a regression, but in the esttab output we modify the code to skip the bottom part of the table (as shown in the previous example) and, more importantly, move the stars from the point estimate to the standard error:
est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' i.black#i.collgrad, re
}
esttab using "./graphs/guide80/regression6.tex", replace f ///
prehead(\begin{tabular*}{\textwidth}{@{\hskip\tabcolsep\extracolsep\fill}l*{@M}{r}} \toprule) ///
cells(b(fmt(3)) se(fmt(3) star par)) keep(1.black#1.collgrad) varlabels(1.black#1.collgrad "Black X College") ///
label booktabs noobs nonotes collabels(none) ///
star(* 0.10 ** 0.05 *** 0.01) alignment(D{.}{.}{-1}) ///
mtitles("Ln(Wages)" "Hours worked" "Weeks unemp.(t-1)")
We can now generate the second regression. Note that here we keep only the S.E., add the stars to it, put the value in square brackets, and append it to the original file:
est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' i.black#i.collgrad, be
}
esttab using "./graphs/guide80/regression6.tex", append f ///
postfoot(\bottomrule \end{tabular*}) ///
cells(se(fmt(3) star par([ ]))) noobs keep(1.black#1.collgrad) varlabels(1.black#1.collgrad " ", elist(1.black#1.collgrad \bottomrule)) ///
star(* 0.10 ** 0.05 *** 0.01) alignment(D{.}{.}{-1}) ///
label booktabs nomtitle noline nonumber collabels(none) scalars("N Obs.") sfmt(%6.0fc)
And we get this table:
Note that here it is immediately easy to see which model gives what results. This type of table output was partially inspired by Melissa Dell’s Mining Mita paper. I have also used it for showing robust and spatially-robust standard errors etc.
Regression 7: Rotating tables
Here is a common table request where you want to throw everything in the regression across multiple specifications and show the result in a table which can only fit in landscape or wide format.
Let’s loop over a bunch of regressions with three specifications each: random effects, fixed effects, and between effects, all with the same controls:
global controls age* ttl_exp* tenure* not_smsa south union

est clear
foreach x of varlist ln_wage hours wks_ue {
eststo: xtreg `x' $controls i.year, re
estadd local eff "RE"

eststo: xtreg `x' $controls i.year, fe
estadd local eff "FE"

eststo: xtreg `x' $controls i.year, be
estadd local eff "BE"
}
We can view the output in the Stata window as follows:
esttab, keep($controls _cons) label
Note here the use of variable names and wildcards in the keep option: $controls picks up the names of the variables stored in the global, including wildcard entries such as age*. We are basically throwing out the year dummies from the output.
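The same result can be approached from the other direction by dropping the year dummies (the i.year terms) rather than listing what to keep. The sketch below assumes the estimates stored by the loop above are still in memory, and that the *.year wildcard matches every year term in the stored models. Check this against the output, since it is not part of the original code:

```stata
* Show what the global expands to before it is passed to keep()
display "$controls"

* Alternative sketch: drop the year dummies instead of listing the controls
esttab, drop(*.year) label
```

Using keep() makes the displayed rows explicit, while drop() is shorter when the list of controls changes often.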
Like the previous tables, we can export this table as follows:
esttab using "./graphs/guide80/regression7.tex", replace ///
b(3) se(3) star(* 0.10 ** 0.05 *** 0.01) ///
keep($controls _cons) ///
label booktabs nonotes noobs nomtitle collabels(none) ///
scalars("N Obs." "eff Controls") sfmt(0) ///
mgroups("Ln(Wages)" "Hours worked" "Weeks unemp.(t-1)", pattern(1 0 0 1 0 0 1 0 0) ///
prefix(\multicolumn{@span}{c}{) suffix(}) span erepeat(\cmidrule(lr){@span})) alignment(D{.}{.}{-1})
In Overleaf (see document) we load the rotating package, which provides the sidewaystable environment, and we get this table:
Forthcoming
And that is it for this guide! Hope you found it useful. Moving forward, I will add other templates for different regression types here as well. For example, probits and logits, IV regressions, spatial regressions etc. If you would like to suggest one, or are struggling with producing an output, just post it in the Issues section. Requested stuff definitely gets bumped up. Additionally, requests also give me an incentive to update the guide more frequently.
Miscellaneous
Please feel free to comment, report errors, or request additional tables. I will add more tables over time so this document will get populated with new templates. If you have a neat table code and would like it added here, then feel free to email me the code and I can add it and attribute it to you.
There are also a lot of other great resources online that cover multiple Stata-to-LaTeX packages. Here are the ones I am aware of that you can explore for more information:
Please message if you know of more resources on this topic!
About the author
I am an economist by profession and I have been using Stata since 2003. I am currently based in Vienna, Austria. You can see my profile, research, and projects on GitHub or my website. You can connect with me via Medium, Twitter, LinkedIn, or simply via email: asjadnaqvi@gmail.com. If you have questions regarding the Guide or Stata in general, post them on The Code Block Discord server.