The awesome Stata Tips collection!

Asjad Naqvi
Published in The Stata Guide
20 min read · Jan 11, 2022

Stata is a fairly extensive piece of software that has been out there for a while now. And there are plenty of features, shortcuts, and commands hiding in plain sight that can significantly boost user experience and provide much-needed quality-of-life improvements.

Photo by Sam Dan Truong on Unsplash

In 2020 and 2021, I was regularly posting Stata tips on Twitter, and sometimes others joined in as well. As a result, a lot of great tips have been tweeted over the past couple of years. In case you missed all the action, this article provides a collection of these tips, together with some additional explanations. It also helps you find code snippets!

This post does not cover everything, but it does contain some really interesting stuff, for example, how to tab graphs, auto-scaling of the graph axes, adding text boxes to figures, dealing with conditionals, using locals to your advantage, shortcuts to data management, and a lot more!

This is a working document and will be updated once in a while with new awesome tips. Some of the tips (maybe one or two currently) might only work on newer versions (Stata 16 or higher) so do check for compatibility issues if you are using older versions.

To group the tips, they have been split into five broad (sometimes overlapping) sections:

  1. Stata functionality: This is all about customizing the Stata interface.
  2. Locals and in-built commands: Unleash the power of locals. And start using already available commands to your advantage.
  3. Dofiles: The dofile editor can do a lot more than you know!
  4. Data management: Some tips to make your data handling part less painful.
  5. Graphs: Small tweaks equal better graphs.

So let’s get started!

Section 1: Stata functionality

Stata itself can be customized quite a bit beyond the default settings. The tips in this section deal with the user interface and software customization.

Tip: Dark mode

Since I post a lot of screenshots on Twitter, I get asked about the Stata dark mode a lot. Here is my setup:

You can right click on the center screen, click properties and select “Dark” as the overall color scheme. I have also moved the variables window to the left pane to maximize the view of variables. The History pane is on the top right and I don’t really need it (why? next tip!). Stata states your working directory in the bottom left, and also displays whether Capslock, Numlock, or Insert are activated on the bottom right.

Also note that my main Results viewer is NOT the Dark mode. It is actually the “Classic” color scheme for higher contrast. You can just click on the “Results” menu, which is right below the “General” menu and change it to Classic.

So now you know how to customize your Stata interface.

Dark mode might take some time to get used to but it is easier on the eye and also reduces the blue light you are exposed to.

Tip: History

If you are typing a lot of code and don’t really work with the mouse so much, then you can scroll through previous commands (that are also there in the history pane), using the PageUp and PageDown keys. I use these a lot on Windows. Sometimes these are not so obvious to find on laptops!

Tip: Niceness (memory priority)

You can tell Stata to be nice or not in terms of memory usage:

set niceness 5 // the default 

where 0 is being a jerk and 10 is very nice (according to the documentation).

Niceness deals with how much you let Stata hog the memory. This is useful for very computationally intensive operations like bootstrapping or multiple imputations. But the default option is also great since it allows me to open a second Stata instance and work on something else if some code is running for a while. How do you do this? Next tip!

Tip: Running multiple Stata sessions

You can run multiple sessions of most software, including Stata, on Windows (Mac users please drop a comment if you are aware of something similar).

Just right click on the Stata icon (for example, in the taskbar), and click on the Stata entry in the menu that appears.

And a second Stata will pop up. You can open a third or fourth instance as well. Now you can practice dealing with multiple dofile editors and data windows. This works great on dual screens!

Tip: Graph tabs

If you are making multiple graphs and want to keep them on the screen, name them. This will ensure that they all stay open. Without names, the newer replaces the older. If you don’t like multiple windows, tab them using

set autotabgraphs on
set autotabgraphs on, perm  // for permanent changes

You can try some code as well:

sysuse auto, clear

set autotabgraphs on
scatter price mpg, name(graph1, replace)
scatter price length, name(graph2, replace)

set autotabgraphs off
scatter price mpg, name(graph1, replace)
scatter price length, name(graph2, replace)

If autotabgraphs is enabled, then all graphs are tabbed in a single window. Otherwise, every graph that you give a name will open in a different window.

Tip: Undocumented and redundant commands

Stata has a lot of undocumented commands that contain some hidden gems. You can view these by typing

help undocumented

The biggest ones, that never made it as official commands, are the margins options.

There is also an archive of previously documented commands that have been removed or archived from Stata (for various reasons):

help prdocumented

Some interesting stuff in there!

Tip: data signature

If you are working on a data file on some shared drive with multiple people, like co-authors, RAs, etc., you can use Stata’s built-in data signature option:

help datasig

to see if something has changed. It helps safeguard against tampering. Any change in the variables will change the data signature.
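Here is a minimal sketch of the workflow, assuming the auto dataset as a stand-in for a shared file. The change to price is only there to trigger a mismatch:

```stata
sysuse auto, clear
datasignature set, reset                // store a signature with the data (reset overwrites an existing one)
datasignature confirm                   // passes: nothing has changed since the signature was set

replace price = price + 1 in 1          // simulate someone changing a value
capture noisily datasignature confirm   // now reports that the data have changed
```

The capture noisily prefix simply keeps a dofile from halting on the failed check while still showing the message.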

Tip: language swap

The ability to swap languages is a little known but a very powerful feature especially if you have to present the same thing in different languages (for example, English, German, Spanish, etc.). You can easily swap variable and value labels without having to do manual editing of datasets:

help label language

It might be some work to set it all up, but depending on the case scenario, the returns on investment might be worth it, especially for big projects. See also the next tip!
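A rough sketch of how a second label set can be built, assuming the auto dataset. The language names en and de and the German labels are just illustrative choices:

```stata
sysuse auto, clear
label language en, rename            // give the existing (default) label set the name "en"
label language de, new               // create a new, empty label set "de" and switch to it
label variable price "Preis"         // illustrative German labels
label variable mpg   "Verbrauch (mpg)"

label language en                    // switch to English
describe price mpg
label language de                    // and back to German
describe price mpg
```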

Tip: swap decimals and commas

In some parts of the world, the decimals and commas are swapped. For example, in the German-speaking regions, a number 123,456.78 would be written as 123.456,78. This can cause some headaches, but if you are presenting to different audiences then see:

help set dp

and easily swap decimals and commas.
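A quick way to see the effect, using the number from the example above:

```stata
display %12.2fc 123456.78    // default: period as the decimal separator
set dp comma
display %12.2fc 123456.78    // now shown with a decimal comma
set dp period                // switch back to the default
```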

Tip: add notes to your datasets

Stata allows us to store notes on each data set and even variables:

help notes

Notes can, for example, include meta information on the data itself, who created it, plus other additional details. Try: sysuse auto or sysuse nlsw88 and type notes.

When I was starting to use Stata on large projects (around 2005–06), adding notes to datasets was a requirement. I suspect that it has now fallen out of fashion. But if you are finalizing files for uploading on a server or with a paper for replication, do add meta information about the data using notes.
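Adding notes takes one line each. A small sketch, where the note texts are made up for illustration:

```stata
sysuse auto, clear
notes: TS example note on the dataset                            // TS is replaced by a time stamp
notes price: illustrative note attached to the price variable
notes                                                            // list all notes
notes price                                                      // list the notes of one variable
```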

Tip: ado files

It is common to install and collect custom ado programs from SSC or otherwise. Don’t forget to check for updates:

ado update

Some developers are very active and new versions come out fairly frequently. They also correct bugs or add functionality. Plus you might have multiple copies of the same program installed. Do run this command every few weeks or so!
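As far as I am aware, ado update on its own only reports which packages have newer versions available. To actually install them, add the update option:

```stata
ado update            // list installed packages that have updates available
ado update, update    // download and install those updates
```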

Tip: describe data without loading

You can run the command describe and read all the variable names and labels without having to load the dataset into memory! For example:

describe using https://stata-press.com/data/r17/even, varlist

I also tested it on an 8GB Stata file, which has 668 variables and almost 13 million rows.

Loading this file would have been a pain. If needed I can load a subset of the file or split it up across Stata frames.
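Loading a subset could look something like this. The file name bigfile.dta and the variables id, year, and income are hypothetical placeholders:

```stata
describe using "bigfile.dta", short                        // size of the file without loading it
use id year income in 1/1000 using "bigfile.dta", clear    // selected variables, first 1,000 rows
use if year >= 2015 using "bigfile.dta", clear             // or only the rows matching a condition
```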

Tip: Stata datasets

Want a quick way of finding Stata datasets? Just type:

sysuse dir

and see the list of all the datasets that come with your Stata copy.

This also ensures that you don’t rely on just auto.dta like me ;)

Tip: sample helpfile

If you are programming a new command and don’t know how to set up the helpfile then see:

help examplehelpfile

It will also point you to a bunch of related documentation.

Tip: Finding your system information

Where are all the files stored? This command can help you find the paths:

sysdir

Mostly used by programmers or people searching for specific programs. For example, if you have downloaded a user-written program, it will be in the “PLUS” folder, and you can find out where that is from the above command.

Tip: Timers

Sometimes, we want to see how long a command, or a bunch of commands, takes to run. Here one can use the in-built timer option:

webuse highschool, clear

timer clear
timer on 1
svy: regress weight height
timer off 1


timer on 2
svy: regress weight height sex
timer off 2

timer list

One can start and stop different timers. One can also run multiple commands in one timer. See help timer for more details. There is a related rmsg option (see help rmsg) where one can permanently turn timer options on and off!

Section 2: Locals and in-built commands

Understanding and mastering locals is essential to improving your programming skills. It also makes your code compact and neat. Here are some tips for day-to-day issues:

Tip: Variable labels as locals

A lot of times we want to generate some graphs and label them. Rather than having a complicated variable name, or a manually defined title, we can simply read variable labels, store them in locals, and pass them on to graphs for labeling or other operations:

foreach x of varlist xx-yy {    // xx-yy and xvar are placeholders
    local v : var label `x'
    twoway line `x' xvar, title("`v'")
}

If you are using a neatly formatted dataset, and want to loop over a bunch of variables, this tip can save you tons of time.

Tip: Value labels as locals

Just like the above tip, we can also read value labels and store them as locals:

lab de varlab 1 "x" 2 "y"...
lab val varname varlab

levelsof varname, local(lclname)
foreach x of local lclname {
    local t : label varlab `x'
    twoway line yvar xvar if varname==`x', title("`t'")
}

Here the local actually points to the label name, which is varlab in our case, and the label number. Remember, here the foreach x is looped over values of the variable varname.

Tip: Items/Categories/Groups in a variable

Continuing from above, levelsof is an extremely powerful local option. Not only does it allow us to store the levels or categories of each variable, it also provides information on how many there are:

levelsof country, local(lvls)
local items = `r(r)'

I have used the item count in many of my guides since the count of unique items might vary across different groups or time periods. I would also suggest having a look at return list after running a levelsof command to see all the options available.
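A small sketch of how the levels and their count can be combined in a loop, here assuming the auto dataset and its rep78 variable:

```stata
sysuse auto, clear
levelsof rep78, local(lvls)
local items = r(r)                    // number of distinct levels
display "rep78 has `items' categories: `lvls'"

local i = 0
foreach l of local lvls {
    local ++i
    quietly count if rep78 == `l'
    display "Category `i' of `items': rep78 = `l' (`r(N)' cars)"
}
```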

Tip: Formatting locals

Once a local is defined, it can be formatted by changing its display. In the example below, we summarize the price variable, multiply its mean by ten, change its display, and then store it in a local again:

sysuse auto, clear
summ price
local x = `r(mean)' * 10
local x : di %10.2fc `x'
display "`x'"

This is an extremely powerful option for auto-formatting labels in graphs and tables without modifying the format of the original variable.

Tip: Formatting dates for displays

Continuing from above, dates are a pain to deal with in Stata since they are stored as numbers in Stata’s own internal date format. The commands below show three different date formatting options:

// the date now
local date: display %tdd_m_yy date(c(current_date), "DMY")
display "`date'"
// from a data variable
summ date
local date: display %tdd_m_yy `r(max)'
display "`date'"
// the same date local formatted
local date2 = subinstr(trim("`date'"), " ", "_", .)
display "`date2'"

The first option shows today’s date in the %tdd_m_yy format. The second takes a value from a date variable (here the maximum, but it could be based on any condition) and formats it the same way. This is something I also use extensively in graphs for auto-labeling dates, especially if different groups have different ending dates in the datasets, for example, daily COVID-19 cases.

The last option shows that we can actually format the date local using string operations. In this case, the spaces in the date local are replaced with underscores. Also try these options:

local date = string(date(c(current_date), "DMY"), "%tdd!_m!_Y")
display "`date'"
local date = string(date(c(current_date), "DMY"), "%tdCCYYNNDD")
display "`date'"
local date = string(date(c(current_date), "DMY"), "%tdCYND")
display "`date'"

We can use these locals in exporting graphs, filenames etc., where spaces in names are frowned upon.

Tip: Resetting locals

Mostly a programming option. But let’s assume you are running a bunch of commands where a local is defined again:

// some running counter
foreach x of varlist aa-gg {
    summ `x'
    local mylocal = `mylocal' + `r(mean)'
}

// second block
foreach x of varlist hh-zz {
    summ `x'
    local mylocal = `mylocal' + `r(mean)'
}

In the above example, we actually wanted to start from a clean local for the second code block. But since we already defined it above, the value will be passed onto the next code block.

Here we can reset the local by simply inserting local <localname>:

local mylocal

in between the first and the second code block and this should do the trick. Otherwise, you can also use unique names for different locals.

Tip: r(table)

Newer versions of Stata store results in an r(table) matrix that can be recovered after running regression commands:

sysuse auto, clear
reg price mpg weight length

ret li
mat li r(table)


di r(table)[1,1]

We can see that the r(table) exists in the return list as a matrix. Therefore, we can also access its elements using Stata’s matrix operations.

Tip: rename groups

The rename command has improved considerably over the Stata versions. A powerful option is to rename groups of variables all at once without having to loop over lists:


// add an underscore to all variables
sysuse auto, clear
ren * *_

// add an underscore to variables ending with e
sysuse auto, clear
ren *e *e_

See help rename for more customization options!

Tip: variable lists

Extremely few know about this: In v17, there is a built-in command vl to build variable lists quickly. Try this out:

sysuse auto, clear

vl set

It will auto generate lists of categorical and continuous variables. Want more info? Try vl list. See help vl.

Section 3: Dofiles

Most of our lives are spent in front of the dofile editor, and yet this window still contains so many secrets. Some of the tips below might only apply to newer Stata versions (16 or 17+)!

Tip: sectioning code

You can collapse code blocks in your dofiles using curly brackets:

{
code block
}

This allows you to hide large code chunks. Very useful for routines that are finalized, or just need to be run once.

Tip: quietly

Following from the above tip, you can also suppress outputs from code blocks using quietly with curly brackets:

qui { 
<code block here>
}

It is really useful for programming or if you are running a large dofile and want to hide some sections.

Tip: Side-by-side stacking of dofiles

Here is a little-known trick. You can stack dofiles horizontally or vertically in the editor. Just drag a dofile in the pane to the right on the horizontal line and the options will pop up.

This is really useful for comparing file versions. Or just being able to open multiple dofiles without having to jump through tabs. A note: Each “block” allows you to open multiple dofiles. So you can actually have different sets open.

I personally use this for double checking Stata code on GitHub with code on my computer especially after pull requests.

Tip: Indentation in dofiles

If you are using a lot of for, while, if, else, conditions and loops, or using a lot of Mata, enable indentation to help you format your code neatly. Each code block should be clearly indented so you know which layer you are working on.

Regardless, indentation is good practice and adding indentation guides also helps you clean up your dofiles. So please start indenting! Everything left aligned is actually very difficult to read.

Section 4: Data management

Data management is probably 80% of time spent in research. And there are tons of tricks. Below are some of my favorite ones:

Tip: finding variables

If you have hundreds of variables you can:

(a) search for keywords in names & labels using lookfor:

lookfor gender 

(b) search by variable types using ds:

ds, has(type numeric) 

will give you all the numeric variables. You can also specify the “not” option:

ds, not(type string)

which gives you the same result as the previous command. Check out ds (help ds). It is an extremely powerful option for searching for variables using patterns.

Tip: Variable names in first row

Sometimes you import Excel/CSV files where the header info ends up appearing in the first row. This code can help you:

foreach x of varlist _all {
    local header = `x'[1]
    ren `x' `header'
}

drop in 1
destring _all, replace

Just a side note, if you use the newer import instead of insheet (no longer maintained by Stata), then you also have the option to specify which row contains the variable names. But the above case might still come in handy for those awkward files.
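For reference, the import options mentioned above look roughly like this. The file and sheet names are hypothetical:

```stata
import excel using "myfile.xlsx", sheet("Sheet1") firstrow clear    // first row holds the variable names
import delimited using "myfile.csv", varnames(1) clear              // row 1 holds the variable names
```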

Tip: Reading all files in directory

This tip is extremely handy especially if you are parsing tons of files in some directory, and especially if these files have irregular names (not following some pattern). You can just find everything in a folder as follows:

local x: dir . files "*"

or you can make your search more specific:

local x: dir . files "*.csv"

You can also see what you got by simply typing:

display "`x'"

This is extremely useful for batch processing files. For example, I have used it a lot on emissions data that usually comes in grid cells. Think hundreds of grids multiplied by emissions types.
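A minimal sketch of such a batch routine, assuming all CSV files in the working directory share the same layout. The source variable is just an illustrative addition to track where each row came from:

```stata
local files : dir "." files "*.csv"     // all CSV files in the working directory
tempfile combined
local first = 1

foreach f of local files {
    import delimited using "`f'", clear
    gen source = "`f'"                  // remember which file each row came from
    if `first' == 0 append using `combined'
    save `combined', replace
    local first = 0
}
```

After the loop, the stacked data from all files is in memory.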

Tip: Post-estimation results

Probably not very relevant now but a couple of years ago Stata quietly added r(table) after estimation commands. This essentially stores coefficients, standard errors, and confidence intervals after regressions. Just try some regression on some data, e.g.:

sysuse auto, clear
regress price mpg

and then type:

return list

and you will see r(table) there. Since it is a matrix, you can display it as follows:

mat list r(table)

You can also recover matrix elements based on your requirements. For example, standard errors are much easier to recover from r(table) as opposed to the variance-covariance matrix.
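A short sketch of pulling out single elements by name rather than by position, assuming a regression on the auto dataset. The matrix name T is arbitrary:

```stata
sysuse auto, clear
regress price mpg weight
matrix T = r(table)              // copy it before another command overwrites r()
matrix list T

// rows are named (b, se, t, pvalue, ll, ul, ...) and columns carry the variable names
display "b[mpg]  = " T[rownumb(T, "b"),  colnumb(T, "mpg")]
display "se[mpg] = " T[rownumb(T, "se"), colnumb(T, "mpg")]
display "check   = " _se[mpg]    // should match the se row
```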

Tip: conditions

If you are generating variables based on a lot of conditions, then rather than specifying a huge list (x==1 | x==2 | x==5 etc.) you can use:

gen y = 1 if inlist(x,1,2,5) // for specific values

For continuous variables, the following all split the data at the same cutoff:

sysuse auto, clear
gen v1 = mpg > 20
gen v2 = !inrange(mpg, 0, 20)
gen v3 = cond(mpg > 20, 1, 0)
recode mpg (0/20 = 0) (21/. = 1), gen(v4)
gen v5 = irecode(mpg, 0, 20, .)   // same split, but coded 1 and 2 rather than 0 and 1

They also do a lot more! Check their help files to help you improve your conditionals. For example it is easier to write:

gen v6 = inrange(mpg,1,10)

than:

gen v7 = mpg >=1 & mpg <= 10

Tip: reshape

Reshape is the bane of many researchers. BUT, once you manage to run the reshape command, you can simply swap back and forth between long and wide formats by simply typing reshape long and reshape wide.

Also, if you get stuck, use reshape error to sort it out. Otherwise keep practicing!
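To see the back-and-forth in action, here is a small sketch using the reshape1 example dataset that ships with the Stata documentation:

```stata
webuse reshape1, clear
reshape long inc ue, i(id) j(year)    // the full syntax is only needed once
reshape wide                          // back to wide
reshape long                          // and to long again
```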

Tip: inspect for a quick summary

If you want to check a variable fast then use inspect (help inspect) which is similar to summarize but gives a neat histogram in the results window. Try this:

sysuse auto 
summ weight
inspect weight

It also helps give a quick overview of the completeness of the variable.

Tip: value labels

Probably this is nothing amazing but you can use the modify or replace options to prevent a code chunk from halting:

lab de mylab 1 "Option 1" 2 "Option 2" 3 "New option", replace

You can list all labels using label list. Drop them using label drop.

AND you can use the undocumented _strip_labels to strip a variable of its value labels.

Tip: A sample panel dataset

A Stata tip for generating panels (since I see a lot of people do it crudely and inefficiently). Here is an efficient method which uses the seq() function of egen, a powerful tool for generating all sorts of sequences:

clear

// define the panel variables
local units = 40   // panel variable
local start = 2000 // time start
local end = 2022 // time end

local time = `end' - `start' + 1
local obsv = `units' * `time'
set obs `obsv'
egen id = seq(), b(`time')
egen t = seq(), f(`start') t(`end')

and this will give you a neat balanced panel without having to do complicated jugglery.

Tip: Run dofiles from dofiles

Probably also not the most exciting tip but it’s all about workflow management! You can run dofiles from dofiles. This allows you to organize and split dofiles by their purpose: setup, clean, merge, make graphs etc.

So do split your code across different dofiles. Check the Stata workflow guide for more details.
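A master dofile could look something like this. The folder and file names are hypothetical:

```stata
// master.do
clear all
cd "C:/myproject"       // hypothetical project folder

do "01_setup.do"
do "02_clean.do"
do "03_merge.do"
do "04_graphs.do"
```

Use run instead of do if you want a dofile to execute without showing its output.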

Tip: visualizing Mata matrices

You can visualize Mata matrices using Ben Jann’s heatplot (ssc install heatplot). Try this:

mata A = runiform(10,10)
heatplot mata(A)

It can be very handy especially if you want to see what the variance-covariance matrix or spatial error terms look like.

Tip: clear vs clear all

The command clear is different from clear all or clear *. While most people use clear to clear the dataset and labels, clear all wipes the slate clean. Use the latter if you are working with matrices, Mata, frames, programs, etc.

Tip: gtools

The gtools suite is incredibly fast and you can use it to reshape or collapse very large datasets very quickly.

Install it:

ssc install gtools, replace

and check it out.
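The commands mirror the official ones with a g prefix. For example, a collapse on the auto dataset (far too small to show any speed gain, but it illustrates the syntax):

```stata
sysuse auto, clear
gcollapse (mean) price mpg, by(foreign)
list
```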

Tip: spatial data

Stata has a basic import and export feature for shapefiles: see help import shp. One can calculate a bunch of new variables and export them to shapefiles for use with GIS software like QGIS/ArcGIS. This came out in the early development days of map features in Stata.

Tip: Adding dots to show progress

This is a tip purely for programmers, but Stata has an option for showing dots to show progress during some estimation command:

clear
set obs 100
gen obs = _n
gen x = runiform(0,1)

levelsof obs, local(lvls)
foreach i of local lvls {

    qui summ x if obs <= `i'

    // second argument of _dots: the return code to display (0 or 1 here)
    if `r(mean)' < 0.5 {
        local dot 0
    }
    else {
        local dot 1
    }
    _dots `i' `dot'
}

This at least allows some progress to be shown for those times when you are just staring at the screen, not knowing when the program will end.

Tip: white spaces (regex)

A regular expressions tip: This code gets rid of all the extra white space/tabs in string variables to neatly single space your sentences:

gen x2 = trim(ustrregexra(x, "(\r\n\t)+|\r+|\n+|\t+", ""))

And here is another version that also gets rid of the tab delimiters in your badly formatted string variables:

gen x3 = ustrregexra(x2,"[ \t]+|[ \t]+", " ")

Section 5: Graphs

Graphs are usually finalized in the last step for papers and reports. Again, a lot of tips can be provided here but some of my favorite ones are the following:

Tip: Auto scale axes

You can auto-scale axis ranges using the minimum and maximum values. If you are updating graphs periodically, for example with COVID-19 time series, then axis ranges can be automated as follows:

summ date
local x1 = `r(min)'
local x2 = `r(max)'
xlabel(`x1'(10)`x2')

You can also add buffers to end points, for example:

summ date
local x1 = `r(min)'
local x2 = `r(max)' + 30
xlabel(`x1'(10)`x2')

This is handy in case you also want to throw in some labels at the end of the series, or to make the graph feel less cramped.
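Put together in a full graph command, this could look as follows. The sketch assumes the sp500 dataset that ships with Stata, and the 60-day step and date format are arbitrary choices:

```stata
sysuse sp500, clear
summ date, meanonly
local x1 = r(min)
local x2 = r(max) + 30          // 30-day buffer at the end

twoway (line close date), ///
    xlabel(`x1'(60)`x2', format(%tdMon_YY) angle(45))
```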

Tip: Graph ticks

Sometimes you need to increase the number of ticks on the axes. Rather than defining some custom ranges, you can just say how many ticks you want, for example:

sysuse auto, clear
twoway (scatter price mpg), xlabel(#20) ylabel(#20)

and that’s it!

Tip: Legend order

You can control what is shown in the legend including the order. For example:

legend(order(5 "var5" 3 "var3" 1 "var1")) 

will show 5th, 3rd, & 1st elements in the legend in that order. Everything else will be taken out from the legend. Really powerful option if you want to show some values first.
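Here is a small runnable sketch using the auto dataset, where the third plot (the fitted line) is dropped from the legend and the order of the first two is flipped:

```stata
sysuse auto, clear
twoway ///
    (scatter price mpg if foreign == 0) ///
    (scatter price mpg if foreign == 1) ///
    (lfit price mpg), ///
    legend(order(2 "Foreign" 1 "Domestic"))
```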

Tip: text boxes in graphs

Very few people know about this, but you can add text boxes to graphs and format them to show whatever:

twoway (scatter mpg weight), text(17 4200 "Did you know that you can add text boxes" "inside graphs for some additional info?", size(small) box just(left) margin(l+2 t+2 b+2) fcolor(gs14%80) lw(none))

This can also be used to automate a lot of reporting directly on the figure.

Tip: Bold markers

Here is how we can make marker labels bold:

clear

sysuse auto


cap drop rand
gen rand = runiform() > 0.8

cap drop make2
gen make2 = "{bf:" + make + "}" if rand==1

twoway ///
(scatter price mpg, mlab(make2) ) ///
, title("my {bf:bold title} with some {it:italic text}") legend(off)

This option is really helpful for highlighting certain categories.

Tip: Split labels across lines

This tip I accidentally discovered while looking for line break options for regular expressions. By default, Stata does not allow marker labels to be split across lines. One can get a bit creative and label the markers twice with different clock positions, but this is also not optimal and is memory intensive. So here is an even more creative option: we insert an ASCII line break character char(10) where we want the labels to split. In the example below, we just do it where a blank space exists:

sysuse auto, clear
set seed 400
gen sample = runiform() < 0.2
scatter price weight if sample, mlab(make)


gen make2 = make

replace make2 = subinstr(make2," ", "`=char(10) " , .)
replace make2 = subinstr(make2," ", "'" , .)

scatter price weight if sample, mlab(make2)

and we have line breaks in marker labels!!

And that’s it for this guide! Know of more handy tips? Post them on Twitter! Or let me know here in the comments. As mentioned earlier, I will update this guide periodically to keep adding more cool tips.

About the author

I am an economist by profession and I have been using Stata since 2003. I am currently based in Vienna, Austria where I work at the Vienna University of Economics and Business (WU) and the International Institute for Applied Systems Analysis (IIASA). You can see my profile, research, and projects on GitHub or my website. You can connect with me via Medium, Twitter, LinkedIn, or simply via email: asjadnaqvi@gmail.com. If you have questions regarding the Guide or Stata in general post them on The Code Block Discord server. If you don’t have time to make all these visuals in Stata, contact me on UpWork and we can figure (no pun intended!) something out.

The Stata Guide releases awesome new content regularly. Subscribe, Clap, and/or Follow the guide if you like the content!
