Natural Language Processing (NLP): Chomsky’s Theories of Syntax

6 min readDec 27, 2017

By Eray Ozkural, Director of Artificial Intelligence, Machine Learning and High-performance computing, eHealth First Project.

Chomsky’s major contribution to Linguistics is widely regarded as his mathematical theory of syntax, which was later extended to account for semantic structures. Chomsky proposed an abstract, mathematical theory of language that introduces a generative model which enumerates the (infinitely many) sentences in a language.

The model is composed of Postlike production rules, which formally transforms sequences of symbols, and its unrestricted form is equivalent to Turing computation, although restrictions of it are equivalent to simpler automata. The formalism contains non-terminal symbols which correspond to syntactic phrases, such as a verb phrase, and terminals which are usually words, letters, or any other building blocks.

The grammar may be used to transform sentences such that only grammatically correct sentences would be generated starting from a special symbol; that is, the phrase structure of a sentence is easily notated with the aid of such a grammar, a process which is known as derivation.

The same phrase structure syntax may be used to recognize a sentence in the language, a procedure which is referred to as parsing. The process of syntax analysis and generation is essential to language processing, as it allows us to proceed to the second stage of interpreting the syntactic form and constructing a semantic representation, which is the task of understanding the text, the dual task of which is producing an expression from a semantic representation. The full gamut of such processing is known as Natural Language Understanding, a classic treatment of which may be found in (Allen 1995).

Fig 1.1 Grammar notation, this is a context-free grammar.

A typical grammar is notated as in Fig 1.1, which shows an incomplete example. The nonterminal symbols are abbreviations. Fig 1.2 depicts the derivation of a full sentence starting from the starting non-terminal symbol Sentence, arrows show the application of a production rule, full names are used for non-terminal symbols, terminal symbols are in red.

Fig 1.2 An example derivation of the sentence “the cat sat on the mat”

The point of the example is to show how the simple formalism of grammar productions can capture the recursive syntax of English. All natural languages follow a similar recursive pattern, and usually a context-free grammar (one where left-hand side has only a single nonterminal symbol) is sufficient to denote the essential syntax of the language, although they do not capture many subtleties of language. These grammars correspond to a class of automata simpler than a Turing Machine, which is known as Push-Down Automata, a kind of stack machine. Yet even simpler languages exist, which are known as regular languages, and they are used for simpler sequencing rules in morphology. These language restrictions form a hierarchy of formal languages which is known as the Chomsky Hierarchy. Figures 1.3, 1.4 depict the Chomsky hierarchy.

Table 1.3 Chomsky hierarchy (simplified), top-down from narrow to general. First column is the class of language, the second is the class of automata that accepts or generates such language, and third is the grammar type in Chomsky hierarchy.

Regular languages can capture only simple sequencing and repetition regularities (such as in abcbcbcd), hence the name, while context free is the basic recursive syntax we saw earlier. Context-sensitivity allows multiple terminals on the left-hand side of productions allowing a syntactic dependency to be modeled, and unrestricted grammars allow arbitrary symbols on the left-hand side.

Fig 1.4 Chomsky hierarchy (simplified), Venn Diagram. The most general class of languages., Type-0, subsumes every other class.

The most significant equivalence between formal languages and automata is the equivalence of the unrestricted language/grammar (type 3) with Turing Machines. This equivalence informs us of the deep computational nature of linguistic competence. Surely, the humans are capable of learning any arbitrary grammar, and that turns out to require the capability to simulate any Turing Machine, which is Turing’s Universal Turing Machine. Therefore, acquiring a language with arbitrary syntax requires a universal computer.

The versatility and ubiquity of the applicability of context-free grammar led Chomsky to side with a thesis in philosophy of language known as autonomy of syntax which holds that syntax may be considered independent from other elements of human language. Chomsky postulated an innate Universal Grammar in human brain and proposed that merely syntax and pragmatics may explain the entirety of human linguistic cognition, and considered semantics as merely another syntactic layer. Chomsky’s school in Linguistics eventually formalized this approach, as they postulated more transformations from the surface structure to a supposed deep structure that covers all aspects of the meaning of a sentence.

Head-Driven Phrase Structure Grammar

One of the advanced grammar representations, abbr. HPGS, is a feature-annotated sort of formal grammar which introduces richer analysis to syntaxoriented Linguistics, and it was productive to some extent; although it has a shortcoming, which is that their proponents decried statistical, empirical approaches to constructing such analyses. HPSG is a typical example of the symbolic approach to AI, and it looks more like symbolic programming than a theory of meaning. It seems hardly possible to capture the amazing irregularity of even English with such an approach.

Moreover, the approach did not deal very well with ambiguities and missing/incorrect linguistic data. One would presumably have to annotate issues arising from context manually, it is not quite clear how such annotation would be formalized. Still, one may consider that a good rate of success could be possible in grammatically correct, formal English statements without much complication, such as those that might be found in technical texts. The grammar spanned features in phonology, syntax and semantics layers, and enabled formalizing semantic agreements and attributes required to construct a basic symbolic representation of meaning.

It allowed for recursive, typed attribute-value representations with a matrix notation. Fig 1.5a shows a parse tree example from an introductory lecture to HPSG.

The figure shows that the approach to generate a parsetree recursively from heads works; the example depicts how references between phrases are resolved, i.e., the verb drinking attaches to the subject Toby, and the component scotch. Fig 1.5b shows a referential semantics example which is even more impressive, these analyses may be obtained by various unification strategies.

Fig 1.5b Contents parse of “Every teacher thinks some student read a book”.

These examples encourage us to imagine how larger-scale knowledge representation may be managed in a symbolic AI system. Certainly, the interpretations would have to be automated, either via a statistical method / machine learning, or an algorithm that can capture all types of knowledge we require. A reliable mechanization of the parsing and generation of an advanced semantics-capable grammar would be highly valuable, as we would have progressed from Google-like simplistic syntactic processing of text to proper semantic processing, involving logical inferences, on the web. Please see the book on the subject for further information (Pollard and Sag 1994).

See more about NLP, ML and AI technologies applying for the eHealth First Project in the White Paper (will be available on the website www.ehfirst.io from the middle of January 2018).

The new publications on Medium coming soon.

Natural Language Processing (NLP): Chomsky’s Theories of Syntax

Head-Driven Phrase Structure Grammar

Written by eHealth First