Context Free Grammar and its Applications

Vidhisha
7 min readJun 8, 2022

--

What is Grammar?

Grammar − It is a set of rules which checks whether a string belongs to a particular language or not.

A program consists of various strings of characters. But, every string is not a proper or meaningful string. So, to identify valid strings in a language, some rules should be specified to check whether the string is valid or not. These rules are nothing but make Grammar.

What is Context Free Grammar (CFG)?

It is a notation used to specify the syntax of the language. Context-free Grammar is used to design parsers.

Lexical Analyzer generates a string of tokens which are given to the parser to construct a parse tree. But, before constructing the parse tree, these tokens will be grouped so that the results of grouping will be a valid construct of a language. So, to specify constructs of language, a suitable notation is used, which will be precise & easy to understand. This notation is Context-Free Grammar.

Basic terms used in CFG:

Terminals: These are the characters that make up the actual content of the final sentence. These can include words or letters depending on which of these is used as the basic building block of a sentence.

Non Terminals: These are also called variables. These act as a sub language within the language defined by the grammar. Non terminals are placeholders for the terminals. We can use non terminals to generate different patterns of terminal symbols.

Start Symbol: a start symbol is a special non terminal that represents the initial string that will be generated by the grammar.

Context Free Grammar is defined as a 4 tuple G = (V, Σ, R, S) where:

  1. V is a finite set of elements known as variables.
  2. Σ is a finite set of elements known as terminals
  3. V ∩ Σ = Null (empty set)
  4. S is an element of V and is known as the start variable.
  5. R is a fine set of elements known as rule. Each rule is like A -> w where A belongs to V and w belongs to union of (V and Σ)*. The * denotes repetition of elements.

An example of Context Free Grammar is:

G = (V, Σ, R, S) where:

  • V = {S}
  • Σ = {(, )}
  • S = S
  • R = { S -> e, S -> (S), S -> SS }

The above Context Free Grammar is for Balanced Parentheses expressions consisting of round bracket only ().

Definitions of Context Free Grammars:

  • If A is an element in V and u, y and w are strings in (V union Σ)* and there is a rule A -> w in R, then the string uwv can be derived in one step from string uAv and can be written as: uAv => uwv.
  • If G is a Context Free Grammar and u and v are strings in (V union Σ), then v can be derived from u and write this as u => v if one of the following two conditions hold true: 1. u = v. 2. For K >= 2 and a sequence u1, u2, …, uK of strings in (V union Σ)* such that:
  • u = u1
  • v = uK
  • u1 => u2 => … => uK
  • G is a Context Free Grammar. The language of G is defined to be the set of all strings in Σ* that can be derived for start variable S in V:

L(G) = { w belongs to Σ* : S => w}

  • A language L is called Context Free Language if there is a Context Free Grammar G such that L(G) = L.

Example of Context Free Grammar

This is the example of a Context free grammar (CFG) for Balanced Parentheses:

  • V = {S}
  • Σ = {(, )}
  • S = S
  • R = { S -> e, S -> (S), S -> SS }

Let us assume we want to arrive at a balanced expression (())()() using our context free grammar G.

The steps are:

S => SS

=> (S)S

=> (S)SS

=> (SS)SS

=> ((S)S)SS

=> (()S)SS

=> (())SS

=> (())(S)S

=> (())()S

=> (())()(S)

=> (())()()

There are numerous aspects of typical programming languages that behave like balanced parentheses. Beginning and ending of code blocks, such as begin and end in Pascal, or the curly braces { . . . } of C, are examples. There is a related pattern that appears occasionally, where ”parentheses” can be balanced with the exception that there can be unbalanced left parentheses. An example is the treatment of if and else in C. An if-clause can appear unbalanced by any else-clause, or it may be balanced by a matching else-clause. A grammar that generates the possible sequence of if and else (represented by i and e, respectively) is:

S → SS | iS | iSe | λ

For instance, ieie, iie, and iei are possible sequences of if and else’s and each of these strings is generated by the above grammar. Some examples of illegal sequences not generated by the grammar are, ei, ieeii, iee.

Properties of Context Free Grammar

  • All Regular Languages along with some Non-regular Languages are Context Free Languages. Context Free Language is a language based on Context Free Grammar. Therefore, all Regular Languages = Context Free Languages. Some / not all Non-regular Languages = Context Free Languages.
  • If a Language L is not a Context Free Language, then the language L is a Non-regular Language.
  • Σ is an alphabet and L is a proper subset of Σ*. If L is a Regular Language, then L is a Context Free Language. A Context Free Grammar G can be converted to another Context Free Grammar G’ such that L(G) = L(G’) and the rules of G’ are in restricted form that is Chomsky Normal Formal.
  • Σ is an alphabet and L is a proper subset of Σ* and L is a Context Free Language. Then, there exists a Context Free Grammar G in Chomsky Normal Formal whose language is L = L(G).

Applications of Context Free Grammar (CFG)

  • Context Free Grammars are used in Compilers (like GCC) for parsing. In this step, it takes a program (a set of strings).
  • Context Free Grammars are used to define the High Level Structure of a Programming Language.
  • Every Context Free Grammar can be converted to a Parser which is a component of a Compiler that identifies the structure of a Program and converts the Program into a Tree.
  • Document Type Definition in XML is a Context Free Grammar which describes the HTML tags and the rules to use the tags in a nested fashion.

Following is the Context Free Grammar for HTML (with limited tags):

  • Char -? a | A | . . .
  • Text -> λ | Char Text
  • Doc -> λ | Element Doc
  • Element -> Text | < EM > Doc < /EM >|< P > Doc |< OL > List < /OL >
  • List -> λ | ListItem List
  • ListItem -> < li > Doc
  • All finite sets of strings are Regular Languages. All Regular Language is a Context Free Language. Hence, all Programming Languages are Context Free Languages / can be represented by a Context Free Grammar.
  • Algebraic Expressions can be represented using Context Free Grammar. For example, these are the rules of a Context Free Grammar for syntactically correct Infix expression using 3 variables (x, y, z):
  • S -> empty
  • S -> (S)
  • S -> x
  • S -> y
  • S -> z
  • S -> S + S
  • S -> S — S
  • S -> S * S
  • S -> S / S

There are syntax features that cannot be represented with a Context Free Grammar such as:

  • Indentation
  • Whitespace
  • Typedef in C and C++ Programming Languages
  • Macro and Templates in Programming Languages like Lisp, C++, Haskell, Nim and others.

Therefore, Programming Languages / Real life programs are not represented purely by Context Free Grammar.

  • Complete Sentences in English Language can be generated using Context Free Grammar. For example:

G is a Context Free Grammar

V = {S, < Noun phrase >, < Verb phrase >, < Adjective phrase >,

< Noun >, < Verb >, < Adjective >}

Start variable = S

Σ = {big, stout, Kiao, bought, white, car, Henry, cheese, ate, green}

Rules:

  • S -> < Noun phrase > < Verb phrase >
  • < Noun phrase > -> < Noun >|< Adjective phrase >< Noun > | λ
  • < Verb phrase > -> < Verb >< Noun phrase >
  • < Adjective phrase > -> < Adjective phrase >< Adjective > | λ
  • < Noun > -> Kiao | car | Henry | cheese
  • < Verb > -> bought | ate
  • < Adjective > -> big | stout | white | green

Example of sentences generated using it are:

  • Kiao bought car
  • Henry ate cheese
  • Kiao bought big car

Unfortunately, it also generates some invalid sentences:

  • big cheese ate Henry
  • First Order Logic: Terms and formulas of Formal Logic are Context Free Grammar with the exception of the set of variables which can be infinite along with multiple start variables.

--

--