Regex, State Machines, and Automata — Basics and Theory

What an expert programmer should know about regex internals like DFA & NFA state machines

Doug Foo
CodeX

--

Kleene Star Closure courtesy of Wiki Commons

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. — Jamie Zawinski

Like most programmers, I’ve been using simple regular expressions/pattern matching like “/^Foo(bar).*/” for years, but I admit when I need a complex one like a basic email checker:

/^([a-zA-Z0–9._%-]+@[a-zA-Z0–9.-]+\.[a-zA-Z]{2,6})*$/

  1. I often hacked my way to finding the right regex with Google’s help
  2. I had a very shallow understanding of how it really works

This article explores the fundamentals of regex that solid engineers must know, but also dives into the less known details of how they are implemented using state machines (something only top engineers care to know!)

Primer on State Machines

Without getting deep into computing theory, we need to understand what a state machine is or formally a Deterministic and Non-Deterministic Finite State Automata (DFA and NDFA/NFA). When matching…

--

--

Doug Foo
CodeX
Writer for

Tech Manager by Day, ML Hacker by Night — founder: foostack.ai