Vyaakaran — Visualize the world of automata and parsers!
--
Vyaakaran is a browser-based tool to visualize grammars in the form of automata, parse tables, and other concepts found in formal language theory. I started working on Vyaakaran as a way to better learn compilers and language writing and also build a useful project along the way for others to learn the same.
In this article, I’ll be going over how the core of Vyaakaran works. I find this as one of the best fields in computer science and by the end of this article, I hope you think the same 🙂
Understanding the layers in Vyaakaran
The core of Vyaakaran does a lot of things to convert your grammar to respective automata and parse tables. The entire process is broken down into a bunch of layers that are responsible for performing a specific operation.
- Lexer — Breaks down the input grammar to a bunch of individual tokens.
- Parser — Uses the tokens from Lexer to validate the syntax of the input grammar and generates a parse tree out of the input grammar.
- Analyzer — Analyzes the parse tree to find and perform various checks on the grammar. This may include finding unreachable or undefined symbols, etc.
- Converters — Specific algorithms that convert the parse tree to required outputs (eg. Finite automata, parse tables, regular expressions, etc.)
This looks similar to the components taught in the Compiler Design CS courses. And yes, it is! Remember I had mentioned that I was learning about compilers when I started working on Vyaakaran. So most of the higher-level architecture is similar to that taught in my course. Let’s understand each of the above layers in detail in the further sections.
1. Lexer
Lexer’s job is to simply break down the input program into a bunch of small indivisible sets of characters. I think it’s best explained by the following diagram.
The input in this image was a single line of grammar production. The lexer takes the input and breaks it into an array of individual tokens. These tokens are not always just a single character. A token can also be a group of characters. ->
has two characters but is a single token.