Regular expressions (regex) in Stata

Asjad Naqvi
Published in The Stata Guide
23 min read, Mar 17, 2021

(Last updated: Feb 2024)

This guide covers one of the most under-documented features of Stata: regular expressions, or regex for short. We will learn how to implement the regex features shown in the Stata cheat sheet below. This includes learning about quantifiers, building specific and generic expressions from the bottom up, using word boundaries, and choosing between greedy and possessive matching.

Printable version: https://www.etsy.com/shop/ReviseResubmit

Regex is the core pattern-matching technique behind text searches. Implementations of regex are ubiquitous on the internet, where they are invoked for tasks such as autofill and password checks. For example, regex can check whether your password has enough characters or is strong enough, without anyone storing or seeing the password itself. Regex is extremely powerful and can be incorporated into applications such as text mining, natural language processing (NLP), sentiment analysis, machine learning (ML), automated journalism, text autocompletion, and web crawlers. Companies like Google probably already use some version of regex to sift through your emails to find keywords for targeted advertising.
