Visualizing Thompson’s Construction Algorithm for NFAs, step-by-step
In a previous article, I explained how to convert a regular expression to postfix notation by using the Shunting-Yard algorithm. This is the first step you must do before converting a given regular expression to an NFA. The purpose for using postfix notation is to rearrange the symbols and operations in such a way that the expression can be read from left to right, maintain the order of operations for regular expressions, and not have to handle parsing parentheses.
In the last article, we used the regular expression
a(a+b)*b , which returns all strings that begin with the letter a and end in the letter b. After we run this expression through the Shunting-Yard algorithm, the postfix result is
aab+*?b? (remember the ? symbol represents concatenation). I will now illustrate how to apply Thompson’s Construction Algorithm to convert this expression into its respective NFA.
What is Thompson’s Construction Algorithm?
Thompson’s Construction Algorithm is a method for converting regular expressions to their respective NFA diagrams. There are loads of documentation all over the internet that go more in-depth about the inner workings of why this algorithm works and its history, but you can read that on your own time. I want this article to give you a straightforward, understandable approach to this algorithm. All you need to know about Thompson’s Construction Algorithm is that there are 5 simple rules.
Rule #1: An Empty Expression
An empty expression, for example
ε, will be converted to:
Rule #2: A symbol
A symbol, such as
b, we will converted to:
Rule #3: Union expression
A union expression
a+b will be converted to:
Rule #4: Concatenation expression
A concatenation expression
a?b will be converted to:
Rule #5: A closure/kleene star expression
a* will be converted to:
Using Thompson’s Rules for our Regular Expression
Now that we know the rules of Thompson’s Construction Algorithm, let’s convert our regular expression
aab+*?b? into its respective NFA diagram.
The first character we encounter is an
a is a symbol. So, we follow Rule #2 to produce the following NFA:
We will call this diagram NFA 1.
In order to store our NFAs to reference later on in the algorithm, we will be using a stack. Let’s push NFA 1 onto this stack now.
The next character is also an
Follow Rule #2…
We will call this diagram NFA 2 and push it onto the stack.
The next character is a
Follow Rule #2…
We will call this diagram NFA 3 and push it onto the stack.
Heads up! The next symbol in the expression is a
This is the symbol for a union expression. That means we have to follow Rule #3. You may be asking yourself, what two NFA graphs would we use for the red (NFA1) and blue (NFA2) diagrams?
Well, this is where the stack machine comes into play. Rule #3 is ultimately asking for some NFA X to union with some NFA Y. So, we are going to pop the two NFAs that are on top of the stack and union them together like so:
Note that because NFA 2 was pushed on to the stack BEFORE NFA 3, we must follow the format NFA 2 + NFA 3 and not NFA 3 + NFA 2.
Now we can union the two NFAs together. Remember that we have to declassify NFA 1 and NFA 2’s start state and accept state’s because we now have a new start state and accept state for this union-ed NFA.
We will call this diagram NFA 4 and push it onto the stack.
The next symbol in the expression is a Kleene star closure
Rule #5 shows us how to handle closures. A closure only refers to a single NFA diagram. So we will only pop the top-most NFA off of the stack, which is NFA 4.
Following Rule #5, we can create a closure for NFA 4. Remember that we have to declassify NFA 4’s start state and accept state because we have a new start state and accept state from Rule #5. Our new NFA will be…
We will call this diagram NFA 5 and push it onto the stack.
The next symbol in the expression is a concatenation
This will require Rule #4 to compute. Similar to the union expression, a concatenation operation will need two NFAs to combine and will follow the same pop order (top-most NFA on the stack goes on the right, and the following NFA goes on the left).
Now we can follow Rule #4 to concat NFA 1 and NFA 5.
We will call this diagram NFA 6 and push it onto the stack.
The next symbol is another
Just follow Rule #2…
We will call this diagram NFA 7 and push it onto the stack.
The last symbol is another concatenation ?.
Once again, we have to follow Rule #4. First, we must pop the NFAs off of the stack…
Now we can concat these NFAs together.
We will call this diagram NFA 8 and push it onto the stack.
At this point, our program will realize that we have no more characters to evaluate in our expression. So, the last thing we have to do is pop the only NFA that is in the stack and deem that as our final NFA.
Our final NFA for the expression
There you have it. We have successfully converted a regular expression to postfix notation with the Shunting-Yard Algorithm, and then build its respective NFA diagram with Thompson’s Construction Algorithm.
I hope this step-by-step illustration gave you a better understanding of how we can turn a simple regular expression into a nondeterministic finite automata.
Thanks for reading!