The Stata Frames Guide

Asjad Naqvi
Published in The Stata Guide
17 min read · Jan 4, 2022

(last updated: 16 Sept 2023)

The ability to use frames is arguably one of the most important features added to Stata. Introduced in version 16, frames allow us to hold multiple datasets in memory and work across them. This has considerable advantages over using a single, very large dataset, or moving in and out of datasets with the preserve and restore commands. For example, assume we have data on students (100k observations), teachers (10k obs), and schools (1k obs). If we merge all of this into one student-level dataset, each teacher row is repeated about 10 times and each school row about 100 times, so every variable in the teachers and schools datasets is duplicated across the corresponding student rows. The result is an unnecessarily large dataset that burdens memory and can considerably slow down calculations. Frames allow us to keep the datasets in their original form and create “links” between them, either as a 1:1 or an m:1 match (1:m is not currently supported). For very large datasets, this can mean substantial memory savings and efficiency gains.
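To make the linking idea concrete, here is a minimal sketch of an m:1 link between a students frame and a schools frame. It assumes Stata 16 or later. The frame names, variable names, and toy values (school_id, school_size, and so on) are made up for illustration and are not taken from any real dataset.

```stata
* Minimal sketch: frame names, variable names, and values are hypothetical
frames reset                     // start from a clean slate (drops all frames)

* Schools frame: one row per school
frame create schools
frame schools {
    set obs 2
    generate school_id   = _n
    generate school_size = cond(school_id == 1, 500, 800)
}

* Students frame: six students, three per school
frame create students
frame students {
    set obs 6
    generate student_id = _n
    generate school_id  = ceil(_n / 3)
}

* Work from the students frame and link many students to one school
frame change students
frlink m:1 school_id, frame(schools)   // creates a link variable named "schools"

* Copy over only the variable we need instead of merging everything
frget school_size, from(schools)

list student_id school_id school_size
```

The schools frame keeps its two rows throughout. Only school_size is copied into the students frame, and only because we asked for it, which is where the memory savings come from.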

The frames feature is not unique to Stata. It has long existed in other languages including R and SQL. But being a relatively new feature in Stata, few use it or understand its importance. In short, frames are really worth the…
