Regex, State Machines, and Automata — Basics and Theory
What an expert programmer should know about regex internals like DFA & NFA state machines
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. — Jamie Zawinski
Like most programmers, I’ve been using simple regular expressions/pattern matching like “/^Foo(bar).*/” for years, but I admit when I need a complex one like a basic email checker:
/^([a-zA-Z0–9._%-]+@[a-zA-Z0–9.-]+\.[a-zA-Z]{2,6})*$/
- I often hacked my way to finding the right regex with Google’s help
- I had a very shallow understanding of how it really works
This article explores the fundamentals of regex that solid engineers must know, but also dives into the less known details of how they are implemented using state machines (something only top engineers care to know!)
Primer on State Machines
Without getting deep into computing theory, we need to understand what a state machine is or formally a Deterministic and Non-Deterministic Finite State Automata (DFA and NDFA/NFA). When matching…