The Sound of Grammars

Singing along is highly recommended

Katie Hughes ✨
5 min readApr 6, 2018

I am currently working my way through Parsing Techniques: A Practical Guide by Ceriel J.H. Jacobs and Dick Grune which is available as a PDF on Grune’s site. As I continued in the book I found myself having a hard time following classifications and grammar levels when they were used interchangeably. I am using this blog post to hash it all out for myself so I can practice keeping the order and meaning straight by explaining these concepts to my good friend, the internet.

Let’s start at the very beginning (a very good place to start): what is the Chomsky Hierarchy? Well it’s a hierarchy of formal grammars described by Noam Chomsky. Hm, that wasn’t exactly the very beginning, was it?

What most people think of as a “normal language” (something you use to communicate with other humans, like English) is called a natural language, whereas a programming language (something you use to communicate with computers) is a formal language.

A formal grammar describes a formal language. Just like in natural language, a grammar isn’t going to tell you what a sentence means but it will tell you how to form that sentence. In our formal world, a sentence is more like a formula or a statement. In a language where we can have any amount of a and any amount of b , but every a has to always be before every b, aaaabbb would be a valid sentence.

Now that we’re back at the very beginning: what is the Chomsky Hierarchy? This is a classification of how strict a grammar is, with 0 being the least strict (but hardest to parse) and 3 being the most strict (but easiest to parse!). Before diving in I want to get some definitions out of the way for the terms that we’ll be using.

  • Rule: A rule takes the form of A -> a where A and a are both something (each step of the hierarchy has its own definition). But this is basically a production step — given this input, provide this output.
  • Symbol: This can be anything! They are values used in some sort of sequence of values. We’ll get into two types of symbols soon but for now think of them as a placeholder value and a content value. In our rule, both A and a are symbols.
  • Sentence: A sentence contains one or more symbol. A “valid” sentence is a sentence that can be produced by the given grammar and a fully produced/final sentence has used the rules provided to replace all placeholder values with real content.
  • Left-hand side: The left-hand side (you’ll see LHS sometimes but I want to try to avoid acronyms) is the A in my above example — it is the left-hand side of the rule.
  • Right-hand side: Bet you can guess what this one is. Yep, the right-hand side of the rule! You did great. That would be a in our example.
  • Non-Terminal Symbol: A non-terminal, or non-terminal symbol, is a symbol that will not be allowed in a final sentence —this is our placeholder value from before! You can also think of it as a variable. Hopefully you will be able to turn it into a terminal eventually using the rules in the grammar, otherwise it is a sentence that doesn’t exist in the language. A is our non-terminal in our rule.
  • Terminal Symbol: A terminal, or terminal symbol, is a legal symbol in the language — this is our content symbol. Like the non-terminal is our variable, the terminal symbol is the value. This symbol will no longer be processed further since it is in its final form. A terminal is usually lower case, so in our example it is a.

Type 0: Unrestricted Grammars

A grammar rule consists of a left hand side and a right hand side. Something like A -> a . An unrestricted grammar means anything can be on the left hand side and anything can be on the right hand side. Go nuts (well, as nuts as a Turing machine will let you go).

Type 1

Type 1 grammars are where things get more restrictive (and more fun!). These grammars may have rules that transform a non-zero number of symbols to possibly zero number of symbols. These come in two flavors: original and flavortown.

Monotonic

Monotonic grammars contain no rules where the left-hand side has more symbols than the right-hand side. For an example, I just try to think of nice things: FavoriteThing and FavoriteThing -> raindrops on roses and whiskers on kittens. Our left-hand side has 3 symbols and our right-hand side has 7 symbols. This grammar is Monotonic! A rule that has 16 symbols on the left hand side going on 17 symbols on the right hand side would also be monotonic.

Context-Sensitive

Context-sensitive seems like the prominent definition of Type 1 grammars. A context-sensitive grammar only consists of rules where only one symbol on the left-hand side is replaced by another symbol on the right-hand side. The note here is that the left-hand side can still be any number of symbols but the grammar must remain monotonic (i.e. number of symbols on the left-hand side is less than the number of symbols on the right-hand side.

Below we’ve added two possible outcomes for the rule FavoriteThing as well as how to parse after you have selected a FavoriteThing structure.

You’ll see | used in the upcoming grammar examples, it just means that for the given left hand side, you have options!

FavoriteThing -> Adjective Adjective Noun | Noun on NounNoun on -> raindrops on | whiskers onAdjective -> bright | copper | warm | woolenbright Adjective -> bright copperwarm Adjective -> warm woolenraindrops on Noun -> raindrops on roseswhiskers on Noun -> whiskers on kittensbright copper Noun -> bright copper kettleswarm woolen Noun -> warm woolen mittens

In this grammar we have defined, only one non-terminal (e.g. Noun) is replaced by a terminal (e.g. kittens) at a time and the left hand side is always equal to or less than the number of symbols on the right. Our example for monotonic grammars above isn’t context-sensitive because it is replacing multiple words at once. I like this example because the term “context sensitive” makes a lot of sense — in the context of raindrops on you can only finish that with roses and not kettles.

Intermission

Keep your eye out for Sound of Grammars (Reprise) where we’ll explore Type 2 and Type 3 grammars!

--

--

Katie Hughes ✨

pun-loving code witch // casual comic book reader // full time nerd 👩🏼‍💻 she/her