What are Context Free Languages?
Grammars, Derivation, Expressiveness, Chomsky Hierarchy
Previously, we talked about how languages are studied using the notion of a formal language. A formal language is a mathematical construction that uses sets to describe a language and understand its properties.
We introduced the notion of a string, which is a word or sequence of characters, symbols, or letters. Then we formally defined the alphabet, which is a set of symbols. The alphabet goes hand in hand with the language, because we define a formal language as a set of strings over a given alphabet.
Then we explored some operations on strings, and some operations on languages. Exploring these operations naturally led us to the properties of languages, and of the languages that result from these operations.
The two main topics of that article were regular expressions and finite-state automata, which are computational paradigms that help us express or generate regular languages.
Then we explored the properties of the finite-state automaton itself: we said that the FSA is an information-generating machine, and that this process of information generation can be deterministic or nondeterministic.
The other main topic of the article was to explore operations on FSAs, and whether they are closed under these operations.
Upon proving that operations such as concatenation are closed, we applied them to morphology, in the sense that we can define an FSA that modularly builds words out of affixes.
Beyond Regular
Eventually we would like to study and understand natural languages, such as English. So far we have learned about regular languages, which are neat, but we know English can’t be a regular language.
How do we know English is not a Regular Language?
If we consider English as just a set of words (which, as a vocabulary, it is),
and we construct a regular expression for each word (which we technically could),
then English would be a regular language, since a finite union of regular languages is a regular language.
Higher Linguistic Abstractions
But the problem arises when we realize that English is more than just a set of words. For example, take the sentence “I ate an apple”, which is a really simple sentence.
If English were merely a set of words, then the order of a group of words shouldn’t matter, and every permutation below would be an equally valid sentence:
[
[ 'i', 'ate', 'an', 'apple' ],
[ 'ate', 'i', 'an', 'apple' ],
[ 'an', 'i', 'ate', 'apple' ],
[ 'i', 'an', 'ate', 'apple' ],
[ 'ate', 'an', 'i', 'apple' ],
[ 'an', 'ate', 'i', 'apple' ],
[ 'apple', 'ate', 'i', 'an' ],
[ 'ate', 'apple', 'i', 'an' ],
[ 'i', 'apple', 'ate', 'an' ],
[ 'apple', 'i', 'ate', 'an' ],
[ 'ate', 'i', 'apple', 'an' ],
[ 'i', 'ate', 'apple', 'an' ],
[ 'i', 'an', 'apple', 'ate' ],
[ 'an', 'i', 'apple', 'ate' ],
[ 'apple', 'i', 'an', 'ate' ],
[ 'i', 'apple', 'an', 'ate' ],
[ 'an', 'apple', 'i', 'ate' ],
[ 'apple', 'an', 'i', 'ate' ],
[ 'apple', 'an', 'ate', 'i' ],
[ 'an', 'apple', 'ate', 'i' ],
[ 'ate', 'apple', 'an', 'i' ],
[ 'apple', 'ate', 'an', 'i' ],
[ 'an', 'ate', 'apple', 'i' ],
[ 'ate', 'an', 'apple', 'i' ]
]
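As an aside, the list above is easy to generate with a few lines of Python using only the standard library:

from itertools import permutations

words = ['i', 'ate', 'an', 'apple']

# Enumerate every ordering of the four words.
orderings = [list(p) for p in permutations(words)]

print(len(orderings))  # 24, i.e. 4!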
So all these permutations of the words should have the same meaning. But they don’t. Consider these two permutations:
- I ate an apple.
- Apple ate an I.
If we disregard the “I/me” rule, it’s pretty clear that the meaning changes. Furthermore, it’s also clear that many of these permutations are outright meaningless.
So order matters, which makes a sentence a sequence, not a set.
But what is a sentence? A sentence is a group of words in which repetition is allowed and order matters.
Need for Rules
Notice that when we have a sentence of 4 words, we have 4! ways to order them. So out of the 24 different orderings, only one really makes sense.
For an average sentence length of 20 words in modern English, we find there are 20! = 2,432,902,008,176,640,000 different permutations to check. This is not computationally tractable for computers or for brains. Hence the need for ways to speed up sentence creation.
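We can check that count directly:

import math

# Number of orderings of a 20-word sentence.
print(math.factorial(20))  # 2432902008176640000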
The solution for creating well formed sentences is rules, or grammars. Think of these as computational shortcuts.
Grammars
A grammar is a set of rules that manipulate symbols.
Terminal
Elements of the target language
Elements of some finite set:
- Words
- Letters
Non-terminal
Auxiliary symbols that facilitate the specification of the language
Syntactic categories:
- Sentence
- Noun phrase
- Verb phrase
Rules
A rule is a non-empty sequence of symbols, a mixture of terminals and non-terminals, with the only requirement being that the first element in the sequence is a non-terminal (alternatively, one can define a rule as an ordered pair whose first element is a non-terminal symbol and whose second element is a sequence of symbols).
Given this, rules are really recipes for creating sentences and phrases.
We write such rules with a special symbol, ‘→,’ separating the distinguished leftmost non-terminal from the rest of the sequence. The leftmost non-terminal is sometimes referred to as the head of the rule, while the rest of the symbols are called the body of the rule.
- Non-terminal → sequence of terminals and/or non-terminals
- Head of the rule → Body of the rule
For natural languages it is typically the case that the body is either all terminal or all non-terminal symbols; the two kinds don’t mix. But there is nothing in the formalism that requires this of grammars.
Given the terminals {the, cat, in, hat} and non-terminals {D, N, P, NP, PP}, where:
- D is determiner
- N is noun
- P is preposition
- NP is noun phrase
- PP is preposition phrase
Class I (all terminal) examples of rules:
- D → the
- N → cat
- N → hat
- P → in
Class II (all non-terminal) examples of rules:
- NP → D N
- PP → P NP
- NP → NP PP
Context Free Grammars
DEFINITION 1. Context Free Grammar, CFG is a four tuple G = ⟨ V, Σ, P, S⟩ , where:
- V is a finite set of non-terminal symbols
- Σ is an alphabet of terminal symbols
- P ⊆ V × (V ∪ Σ)* is a finite set of rules
- S ∈ V is the start symbol
For our running example, the grammar is G = ⟨ V, Σ, P, S ⟩, where V = {D, N, P, NP, PP}, Σ = {the, cat, in, hat}, P is the set of rules listed above, and the start symbol S is NP.
Generally we introduce grammars by listing only their rules, which is the set denoted by P.
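As a concrete sketch, here is one way to write this grammar down with NLTK, the toolkit from earlier in this series (assuming it is installed); the rule syntax mirrors the arrow notation above, and the head of the first rule is taken as the start symbol:

import nltk

# The cat-in-the-hat grammar, with NP as the start symbol.
grammar = nltk.CFG.fromstring("""
    NP -> D N
    NP -> NP PP
    PP -> P NP
    D -> 'the'
    N -> 'cat' | 'hat'
    P -> 'in'
""")

print(grammar.start())             # NP
print(len(grammar.productions()))  # 7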
Derivations
DEFINITION 2. Let G = ⟨ V, Σ, P, S ⟩ be a grammar. The set of forms induced by G is (V ∪ Σ)*:
- A form α immediately derives a form β, denoted by α ⇒ β, if and only if there exist γₗ, γᵣ ∈ (V ∪ Σ)* such that α = γₗAγᵣ and β = γₗγγᵣ, and A → γ is a rule in P
- A is called the selected symbol
So what do forms and derivations have to do with context free languages?
We formally define these concepts so that we can define the language generated by a context free grammar. Writing ⇒* for zero or more immediate derivation steps, the language generated by G is L(G) = {w ∈ Σ* : S ⇒* w}.
This basically says that the language generated by a grammar is the set of all terminal forms derivable from its start symbol.
EXAMPLE 1. A set of possible rules over these two sets:
- V = {D, N, P, NP, PP},
- Σ = {the, cat, in, hat },
is exactly the set of Class I and Class II rules listed earlier.
EXAMPLE 2. Let the grammar be G = ⟨ V, Σ, P, S ⟩ where:
- V = {D, N, P, NP, PP},
- Σ = {the, cat, in, hat },
- P is the set of rules, and
- the start symbol S is NP.
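For instance, under this grammar the string “the cat in the hat” can be derived as follows, where at each step the leftmost non-terminal is the selected symbol:

NP ⇒ NP PP
   ⇒ D N PP
   ⇒ the N PP
   ⇒ the cat PP
   ⇒ the cat P NP
   ⇒ the cat in NP
   ⇒ the cat in D N
   ⇒ the cat in the N
   ⇒ the cat in the hat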
Derivation Trees
These are sometimes called parse trees; they help us visualize what derivations are doing. You can think of them as imposing a structure or form on words.
Some tree terminology:
- A vertex is any node of the tree
- The root is the topmost vertex, and it is labeled with the start symbol S
- If a vertex has an outgoing branch, then it must be labeled with a non-terminal symbol
- This implies that such a parent node is the head of a rule and its children are the body of that rule
- A leaf is a node with no outgoing branches
- Derivation trees have a natural left to right order
- Reading the leaves in this order gives the yield of the tree
Ambiguity
We can express ambiguity using derivation trees. Consider these sentences:
- (the cat in the hat) in the hat
- the cat in (the hat in the hat)
These have different meanings, even though without the brackets they’re exactly the same string of words.
We call this structural or syntactic ambiguity.
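We can surface both structures mechanically by parsing the sentence with a chart parser over the NLTK grammar sketched earlier:

import nltk

grammar = nltk.CFG.fromstring("""
    NP -> D N
    NP -> NP PP
    PP -> P NP
    D -> 'the'
    N -> 'cat' | 'hat'
    P -> 'in'
""")

parser = nltk.ChartParser(grammar)
tokens = 'the cat in the hat in the hat'.split()

# Prints two distinct derivation trees with the same yield.
for tree in parser.parse(tokens):
    print(tree)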
Expressiveness
We will show here that we can express all regular languages using grammars, and that grammars can generate languages that are not regular, implying that the class of regular languages is properly contained within the class of context free languages.
We say that context free languages are more expressive than regular languages.
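A classic witness is the language {aⁿbⁿ : n ≥ 0}, all strings of a’s followed by an equal number of b’s, which is not regular (a fact usually shown with the pumping lemma), yet is generated by just two context free rules:
- S → a S b
- S → ε
where ε denotes the empty string.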
Right Linear Grammar
In this type of grammar, the body of each rule consists of a sequence of terminal symbols (i.e. symbols drawn from Σ, such as {the, cat, in, hat}), optionally followed by a single non-terminal. The restriction is that a body never contains more than one non-terminal, and when one appears it must be the last symbol. If the body is all terminal, the rule ends that branch of the derivation.
This sub-type of grammar generates all and only the regular languages.
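For example, a right linear grammar for the regular language a*b (any number of a’s followed by a single b) needs just two rules:
- S → a S
- S → b
The first body ends with a single non-terminal; the second is all terminal and ends the derivation.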
Chomsky Hierarchy
We have learned so far about context free grammars and right linear grammars and the languages they generate.
If we relax our definition of context free grammars, we can introduce context sensitive grammars, which have even more expressiveness than CFGs.
Context Sensitive Languages
In the head of the rule, we no longer have just a single non-terminal; we also have its context. Rules take the form αAβ → αγβ, meaning that A may be rewritten as γ only when it appears between α and β. The formalism that corresponds to this class of languages is called the Context Sensitive Grammar.
Note that context free languages are a special case of context sensitive languages, where the context is empty. A classic example of a language that is context sensitive but not context free is {aⁿbⁿcⁿ : n ≥ 1}.
Recursively Enumerable Languages
Above the context sensitive grammars are the general phrase-structure grammars, also called unrestricted grammars, which generate the recursively enumerable languages.
Natural Languages
Natural languages are mostly context free, and only some specific constructions in natural languages, such as cross-serial dependencies in Swiss German, force us to use context sensitive grammars. This motivated the development of the mildly context sensitive category.
Weak Generative Capacity
Weak generative capacity is a property of computational formalisms. For example, when we say “the weak generative capacity of regular expressions is insufficient for generating English”, we are saying “regular expressions can’t generate all the words or sentences of English”.
Strong Generative Capacity
So far we have considered only the generation of words or sentences by grammars. But the structure is as important, if not more so. A formalism with strong generative capacity not only weakly generates all the strings in the language, it also assigns the correct structures to them.
Mildly Context Sensitive Languages
These are almost an extension of the context free languages. This class of languages has the following properties:
- it properly contains all the context free languages
- it can be parsed in polynomial time
- it can properly account for constructions in natural languages that CFGs can’t, such as cross-serial dependencies
- it has the linear growth property
Tree Adjoining Grammars
TAG constructs larger trees from smaller ones, so that the basic operations that take place during a derivation are not limited to string concatenation.
These operations facilitate the nesting of one tree within another, resulting in extended expressiveness.
Natural Language Grammars
Several other grammars are proposed as adequate for expressing natural languages:
- Head Grammars
- Linear Indexed Grammars
- Combinatory Categorial Grammars
All three are weakly equivalent to TAG, which could imply that this class of formalisms is the one in which all natural languages reside.
Trade Offs
We learned so far that a language is just a set of strings; however, we can still classify languages into nested layers.
- Regular languages
- Context free languages
- Mildly context sensitive languages
- Context sensitive languages
- Recursively enumerable languages
Each of these classes of languages has a corresponding computational formalism called a grammar, and each outer grammar gives us more expressiveness.
However, this expressiveness comes at a computational cost, because each grammar comes with a computational model as well.
- Regular languages → Finite state automata
- Context free languages → Pushdown automata
- Mildly context sensitive languages → Embedded pushdown automata
- Context sensitive languages → Linear bounded automata
- Recursively enumerable languages → Turing machines
And linguistic tasks such as:
- recognizing sentences, phrases, bigrams, hapaxes, words, morphemes
- determining their structure at each level
- extracting their meaning
- manipulating the information they contain
are all computational tasks that come with algorithms, and those algorithms have space and time demands.
For example, a string w’s membership in a language generated by a tree adjoining grammar can only be checked in time proportional to |w|⁶.
Checking membership of w ∈ L(G), where G is context free, takes time proportional to |w|³, and requires the computational model of the pushdown automaton, which is an FSA augmented with unbounded memory in the form of a stack.
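To make that bound concrete, here is a minimal sketch of CYK, the classic O(|w|³) membership algorithm for context free grammars. It assumes the grammar has been put into Chomsky normal form (every body is either one terminal or two non-terminals), and the encoding of the grammar as two dictionaries is just a convenience of this sketch:

def cyk(word, unary, binary, start='S'):
    """Return True if `word` (a list of terminals) is in the language.

    unary  maps a terminal t to the set of heads A with A -> t.
    binary maps a pair (B, C) to the set of heads A with A -> B C.
    """
    n = len(word)
    if n == 0:
        return False  # the empty string is ignored in this sketch
    # table[(i, l)] = non-terminals that derive word[i : i + l]
    table = {}
    for i, t in enumerate(word):
        table[(i, 1)] = set(unary.get(t, set()))
    for l in range(2, n + 1):          # substring length
        for i in range(n - l + 1):     # substring start
            cell = set()
            for k in range(1, l):      # split into word[i:i+k] and word[i+k:i+l]
                for B in table[(i, k)]:
                    for C in table[(i + k, l - k)]:
                        cell |= binary.get((B, C), set())
            table[(i, l)] = cell
    return start in table[(0, n)]

# A Chomsky normal form grammar for {aⁿbⁿ : n ≥ 1}:
# S -> A T | A B,  T -> S B,  A -> 'a',  B -> 'b'
unary = {'a': {'A'}, 'b': {'B'}}
binary = {('A', 'T'): {'S'}, ('A', 'B'): {'S'}, ('S', 'B'): {'T'}}

print(cyk(list('aabb'), unary, binary))  # True
print(cyk(list('abab'), unary, binary))  # False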
The complexity of these models and the algorithms is what we will learn about next.
Other Articles
This post is part of a series of stories that explores the fundamentals of natural language processing:
1. Context of Natural Language Processing: Motivations, Disciplines, Approaches, Outlook
2. Notes on Formal Language Theory: Objects, Operations, Regular Expressions and Finite State Automata
3. Natural Language Tool Kit 3.5: Search Functions, Statistics, Pronoun Resolution
4. What Are Regular Languages? Minimization, Finite State Transducers, Regular Relations
5. What Are Context Free Languages? Grammars, Derivations, Expressiveness, Hierarchies
Up Next…
In the next article, we will explore Computational Complexity of Natural Languages.