Constituency Grammar: NLP

Saksham Vikram
8 min read · Apr 13, 2020


The fundamental notion underlying the idea of constituency is that of abstraction: groups of words behaving as single units, or constituents. The most widely used formal system for modelling constituent structure in English and other natural languages is the Context-Free Grammar, or CFG. Context-free grammars are also called Phrase-Structure Grammars.

A context-free grammar consists of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols.

For example, the following productions express that an NP (or noun phrase) can be composed of either a proper noun or a determiner (Det) followed by a Nominal; a Nominal, in turn, can consist of one or more Nouns.

NP → Det Nominal
NP → ProperNoun
Nominal → Noun | Nominal Noun
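To see what the recursive Nominal rule buys us, here is a minimal Python sketch that stores the three productions above in a dictionary and lists every part-of-speech sequence an NP can expand to, up to a length limit. The names RULES and expansions are my own illustrative choices, not part of any library.

```python
from collections import deque

# The three productions above; each non-terminal maps to its right-hand sides.
RULES = {
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
}

def expansions(start, max_len):
    """List every sequence derivable from `start` with at most max_len symbols."""
    finished, seen = [], set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost symbol that can still be rewritten, if any.
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:
            finished.append(list(form))
            continue
        for rhs in RULES[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            # No rule shrinks a form, so the length cap guarantees termination.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return finished

for sequence in expansions("NP", 4):
    print(" ".join(sequence))
# ProperNoun
# Det Noun
# Det Noun Noun
# Det Noun Noun Noun
```

Raising the length limit simply adds longer noun sequences, which is exactly the "one or more Nouns" behaviour the Nominal rule describes.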

The symbols that are used in a CFG are divided into two classes:

  1. The symbols that correspond to words in the language (“the”, “nightclub”) are called terminal symbols. The lexicon is the set of rules that introduce these terminal symbols.
  2. The symbols that express abstractions over these terminals are called non-terminals.
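The split between the two classes can be read straight off a rule set. The sketch below uses a tiny made-up rule list (two grammar rules plus two lexicon rules) and assumes that anything appearing on a left-hand side is a non-terminal, while anything that is never rewritten is a terminal.

```python
# Toy rule set for illustration: (left-hand side, right-hand side) pairs.
RULES = [
    ("NP", ["Det", "Nominal"]),
    ("Nominal", ["Noun"]),
    ("Det", ["the"]),          # lexicon rule
    ("Noun", ["nightclub"]),   # lexicon rule
]

# Every symbol that appears on a left-hand side is a non-terminal.
non_terminals = {lhs for lhs, _ in RULES}
# Any right-hand-side symbol that is never rewritten is a terminal (a word).
terminals = {sym for _, rhs in RULES for sym in rhs} - non_terminals

print(sorted(non_terminals))  # ['Det', 'NP', 'Nominal', 'Noun']
print(sorted(terminals))      # ['nightclub', 'the']
```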

In each context-free rule, the item to the right of the arrow (→) is an ordered list of one or more terminals and non-terminals; to the left of the arrow is a single non-terminal symbol expressing some cluster or generalization.

In the lexicon, the non-terminal associated with each word is its part of speech.

Like any formal grammar, a CFG also has a designated start symbol. It is usually written S and is interpreted as the sentence node.

A parse tree for “a flight”.

The following rule expresses the fact that a sentence can consist of a noun phrase followed by a verb phrase:

S → NP VP   (I prefer a morning flight)

Let’s take a brief look at the verb phrase before we dive deeper into it later. A verb phrase in English consists of a verb followed by assorted other things. For example, one kind of verb phrase consists of a verb followed by a noun phrase:
VP → Verb NP   (prefer a morning flight)

Or the verb may be followed by a noun phrase and a prepositional phrase:
VP → Verb NP PP   (leave Boston in the morning)

Or the verb phrase may have a verb followed by a prepositional phrase alone:
VP → Verb PP   (leaving on Thursday)

A prepositional phrase generally has a preposition followed by a noun phrase:
PP → Preposition NP   (from Los Angeles)

Now we will walk through the end-to-end process of defining a grammar and then generating a sentence with it. Since English grammar is vast, we take a small subset for illustration purposes. We will call this CFG L₀.

Lexicon for L₀

Grammar Rules for L₀

The grammar for L₀, with example phrases for each rule.
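As a rough illustration of how such a grammar generates sentences, here is a small Python sketch. The phrase-structure rules are the ones quoted in this article; the word lists are my own assumption, a handful of words borrowed from the examples above rather than the full L₀ lexicon. The helper name generate is likewise hypothetical.

```python
import random

# Phrase-structure rules quoted in this article.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["ProperNoun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "NP", "PP"], ["Verb", "PP"]],
    "PP": [["Preposition", "NP"]],
}
# Assumed mini-lexicon (not the full L0 lexicon): part of speech -> words.
LEXICON = {
    "Pronoun": ["I"],
    "ProperNoun": ["Boston", "Los Angeles"],
    "Det": ["a", "the"],
    "Noun": ["flight", "morning"],
    "Verb": ["prefer", "leave"],
    "Preposition": ["from", "in"],
}

def generate(symbol, rng):
    """Expand `symbol` top-down, picking one applicable rule at random each time."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    words = []
    for child in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(child, rng))
    return words

rng = random.Random(0)  # fixed seed so repeated runs give the same sentences
for _ in range(5):
    print(" ".join(generate("S", rng)))
```

Because the rules say nothing about meaning or agreement, some of the generated strings will be grammatical according to the rules yet sound odd. That is expected from a toy CFG.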

A sentence constructed with the grammar L₀ defined above

The parse tree for “I prefer a morning flight” according to grammar L₀
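To make the idea of a parse tree concrete, here is a minimal sketch of a top-down parser in Python. It tries every way of splitting the sentence among the symbols on a rule's right-hand side and memoizes the results. It assumes the rules quoted in this article plus a tiny lexicon that I made up for this sentence. It is not an efficient or general-purpose parser: it only works for grammars without empty productions and without cycles of unary rules.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["ProperNoun"], ["Det", "Nominal"]],
    "Nominal": [["Nominal", "Noun"], ["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "NP", "PP"], ["Verb", "PP"]],
    "PP": [["Preposition", "NP"]],
}
# Assumed mini-lexicon: part of speech -> words.
LEXICON = {
    "Pronoun": {"I"}, "ProperNoun": {"Boston"}, "Det": {"a", "the"},
    "Noun": {"morning", "flight"}, "Verb": {"prefer", "leave"},
    "Preposition": {"in", "from"},
}

def parse(words, start="S"):
    """Return every parse tree for `words` as nested tuples (label, child, ...)."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # All trees rooted in `symbol` that cover exactly words[i:j].
        found = []
        if j - i == 1 and words[i] in LEXICON.get(symbol, ()):
            found.append((symbol, words[i]))
        for rhs in GRAMMAR.get(symbol, ()):
            for children in expand(tuple(rhs), i, j):
                found.append((symbol,) + children)
        return tuple(found)

    def expand(rhs, i, j):
        # Yield tuples of child trees such that rhs covers words[i:j] exactly.
        if len(rhs) == 1:
            for tree in trees(rhs[0], i, j):
                yield (tree,)
            return
        # The first symbol takes words[i:k]; each later symbol needs >= 1 word.
        for k in range(i + 1, j - len(rhs) + 2):
            for first in trees(rhs[0], i, k):
                for rest in expand(rhs[1:], k, j):
                    yield (first,) + rest

    return list(trees(start, 0, len(words)))

result = parse("I prefer a morning flight".split())
print(len(result))   # 1
print(result[0])
```

With this grammar the sentence has exactly one parse, in which "morning flight" is built by the recursive Nominal → Nominal Noun rule.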

It is sometimes convenient to represent a parse tree in a more compact format called bracketed notation.
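As a sketch of what bracketed notation looks like, the snippet below writes the parse tree for “I prefer a morning flight” as nested Python tuples and flattens it into a bracketed string. The category labels follow the rule names used in this article; other sources may abbreviate them differently (for example Pro, V, Nom).

```python
# Parse tree as nested tuples: (label, child, ...); a leaf is (part of speech, word).
TREE = ("S",
        ("NP", ("Pronoun", "I")),
        ("VP", ("Verb", "prefer"),
               ("NP", ("Det", "a"),
                      ("Nominal", ("Nominal", ("Noun", "morning")),
                                  ("Noun", "flight")))))

def bracketed(tree):
    """Render a tuple tree as [Label child child ...]."""
    label, *children = tree
    parts = [c if isinstance(c, str) else bracketed(c) for c in children]
    return "[" + " ".join([label] + parts) + "]"

print(bracketed(TREE))
# [S [NP [Pronoun I]] [VP [Verb prefer] [NP [Det a] [Nominal [Nominal [Noun morning]] [Noun flight]]]]]
```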

Now that we have a good understanding of the end-to-end process, we can begin exploring each of these topics in depth.

Noun Phrase

Our L₀ grammar introduced three of the most frequent types of noun phrases that occur in English: pronouns, proper nouns and the NP → Det Nominal construction. The main focus is on the last kind since this is where syntactic complexity resides. These noun phrases consist of a head, the central noun in the noun phrase, along with various modifiers that can occur before or after the head noun.

The Determiner

A noun phrase can begin with a simple lexical determiner, for example:

a stop, the flights, this flight

But the role of the determiner can also be filled by a possessive expression consisting of a noun phrase followed by ’s as a possessive marker. For example:

United’s flight
United’s pilot’s union
Denver’s mayor’s mother’s cancelled flight

Det → NP ’s

Since the rule is recursive (an NP can start with a Det, and a Det can itself contain an NP), we can easily model the last two examples, which have a sequence of possessive expressions.
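The recursion is easier to see when the nesting is spelled out. The hypothetical helper below builds the bracketed structure of a possessive chain by applying Det → NP ’s once per possessor. It is a sketch of the nesting only, not a parser, and it writes the marker with a plain ASCII apostrophe.

```python
def possessive_np(possessors, head):
    """Bracket an NP whose determiner is a chain of possessives (Det -> NP 's)."""
    if not possessors:
        return f"[NP {head}]"
    *earlier, owner = possessors
    # The possessor is itself an NP, which may have its own possessive determiner.
    inner = possessive_np(earlier, owner)
    return f"[NP [Det {inner} 's] {head}]"

print(possessive_np(["United"], "flight"))
# [NP [Det [NP United] 's] flight]
print(possessive_np(["United", "pilot"], "union"))
# [NP [Det [NP [Det [NP United] 's] pilot] 's] union]
```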

In some circumstances determiners are optional, for instance when the noun they would modify is plural. For example:

Show me flights from San Francisco to Denver on weekdays.

The Nominal

The nominal follows the determiner and contains any pre-head and post-head noun modifiers. In its simplest form, it consists of a single noun.

Nominal → Noun

Before the Head Noun

A number of different word classes can appear before the head noun in a nominal; these are called “postdeterminers”. They include cardinal numbers, ordinal numbers, quantifiers, and adjectives.

Examples of cardinal numbers:
two friends, one stop

Ordinal numbers include first, second, third, and so on, but also words like next, last, past, other, and another:
the first one, the next day
the second leg, the last flight

Some quantifiers (many, (a) few, several) occur only with plural count nouns:
many fares

Adjectives occur after quantifiers but before nouns:
a first-class fare, a non-stop flight
the longest layover, the earliest lunch flight

Adjectives can also be grouped into a phrase called an adjective phrase or AP.

After the Head Noun

A head noun can be followed by postmodifiers. Three kinds of postmodifiers are common:

prepositional phrases: all flights from Cleveland
non-finite clauses: any flights arriving after eleven a.m.
relative clauses: a flight that serves breakfast

Here are some examples of prepositional phrase postmodifiers, with brackets inserted to show the boundaries of each PP; note that two or more PPs can be strung together within a single NP:

all flights [from Cleveland] [to Newark]
arrival [in San Jose] [before seven p.m.]
a reservation [on flight six oh six] [from Tampa] [to Montreal]

Here is a rule to account for these postnominal PPs:
Nominal → Nominal PP
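Because the rule is left-recursive, each additional PP wraps the Nominal built so far. The small sketch below (the helper name attach_pps is my own) applies the rule once per PP and prints the resulting left-branching structure; the PPs are kept as unanalysed strings for brevity.

```python
from functools import reduce

def attach_pps(head_noun, pps):
    """Apply Nominal -> Nominal PP once per PP, from left to right."""
    base = f"[Nominal [Noun {head_noun}]]"  # Nominal -> Noun
    return reduce(lambda nominal, pp: f"[Nominal {nominal} [PP {pp}]]", pps, base)

print(attach_pps("flights", ["from Cleveland", "to Newark"]))
# [Nominal [Nominal [Nominal [Noun flights]] [PP from Cleveland]] [PP to Newark]]
```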

The three most common kinds of non-finite postmodifiers are the gerundive (-ing), -ed, and infinitive forms.

Gerundive postmodifiers are so-called because they consist of a verb phrase that begins with the gerundive (-ing) form of the verb. Here are some examples:

any of those [leaving on Thursday]
any flights [arriving after eleven a.m.]
flights [arriving within thirty minutes of each other]

We can define a rule for these Nominals with gerundive modifiers as follows, making use of a new non-terminal GerundVP:
Nominal → Nominal GerundVP

Here GerundV is simply the -ing form of a verb, so we can take every VP rule we saw earlier, replace Verb with GerundV, and obtain the rules for GerundVP:

GerundVP → GerundV NP | GerundV PP | GerundV | GerundV NP PP

GerundV → being | arriving | leaving
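The substitution described above is mechanical, so it can be sketched in a few lines of Python. The list VP_RULES holds the right-hand sides of the VP rules from this article; the function name to_gerund is an illustrative choice.

```python
# Right-hand sides of the VP rules seen so far.
VP_RULES = [["Verb"], ["Verb", "NP"], ["Verb", "NP", "PP"], ["Verb", "PP"]]

def to_gerund(rules):
    """Copy each rule, replacing the symbol Verb with GerundV."""
    return [["GerundV" if sym == "Verb" else sym for sym in rhs] for rhs in rules]

GERUND_VP_RULES = to_gerund(VP_RULES)
for rhs in GERUND_VP_RULES:
    print("GerundVP →", " ".join(rhs))
# GerundVP → GerundV
# GerundVP → GerundV NP
# GerundVP → GerundV NP PP
# GerundVP → GerundV PP
```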

An example of the infinitive form:
the last flight to arrive in Boston

An example of the -ed form:
I need to have dinner served

A postnominal relative clause (more correctly, a restrictive relative clause) is a clause that often begins with a relative pronoun (that and who are the most common). In the following examples, the relative pronoun functions as the subject of the embedded verb:

a flight that serves breakfast
flights that leave in the morning

Rules that summarise the above:
Nominal → Nominal RelClause
RelClause → (who | that) VP

Before the Noun Phrase

Word classes that modify and appear before NPs are called predeterminers. Many of these have to do with number or amount; a common predeterminer is all:
all the flights, all flights, all non-stop flights

Now that we have explored noun phrases, let’s parse one with the help of these rules.

A parse tree for “all the morning flights from Denver to Tampa leaving before 10”.

The Verb Phrase

The verb phrase consists of the verb and a number of other constituents. In the simple rules we have built so far, these other constituents include NPs and PPs and combinations of the two:

VP → Verb   (disappear)
VP → Verb NP   (prefer a morning flight)
VP → Verb NP PP   (leave Boston in the morning)
VP → Verb PP   (leaving on Thursday)

But verb phrases can be much more complicated than this. Many other kinds of constituents can follow the verb, such as an entire embedded sentence. These are called sentential complements:

You [VP [V said] [S you had a two hundred sixty-six dollar fare]]

Here’s a rule for this:
VP → Verb S

Similarly, another potential constituent of the VP is another VP. This is often the case for verbs like want, would like, try, intend, need:
I want [VP to fly from Milwaukee to Orlando]

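While a verb phrase can have many possible kinds of constituents, not every verb is compatible with every verb phrase. Traditional grammar distinguishes transitive verbs like find, which take a direct object NP, from intransitive verbs like disappear, which do not. Where traditional grammar subcategorizes verbs into these two classes, modern grammars distinguish as many as 100 subcategories. We say that a verb like find subcategorizes for an NP, and a verb like want subcategorizes for either an NP or a non-finite VP. We also call these constituents the complements of the verb (hence the use of the term sentential complement above).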

So we say that want can take a VP complement. These possible sets of complements are called the subcategorization frame for the verb.

Subcategorization frames for a set of example verbs.
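One simple way to think about subcategorization frames is as a lookup table from verbs to the complement sequences they allow. The sketch below covers only the verbs discussed in this article; the table layout, the label VPto for an infinitival VP, and the function name licenses are my own assumptions for illustration.

```python
# Illustrative frames only: each verb maps to the complement sequences it accepts.
# "VPto" stands for an infinitival VP; the empty tuple means "no complement".
SUBCAT = {
    "disappear": {()},
    "find": {("NP",)},
    "want": {("NP",), ("VPto",)},
    "say": {("S",)},
}

def licenses(verb, complements):
    """True if `verb` is listed as accepting exactly this complement sequence."""
    return tuple(complements) in SUBCAT.get(verb, set())

print(licenses("want", ["VPto"]))     # True:  I want [to fly from Milwaukee to Orlando]
print(licenses("find", ["NP"]))       # True
print(licenses("disappear", ["NP"]))  # False: disappear takes no object
```

A grammar that consults such a table before applying a VP rule avoids generating strings like "disappear a flight".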

Coordination

The major phrase types discussed here can be conjoined with conjunctions like and, or, and but to form larger constructions of the same type.

For example, a coordinate noun phrase can consist of two other noun phrases separated by a conjunction:

Please repeat [NP [NP the flights] and [NP the costs]]
I need to know [NP [NP the aircraft] and [NP the flight number]]

Here’s a rule summarising the above:
NP → NP and NP

The same rule can be applied to VP, S, and all other major phrase types. It is also possible to state this fact about conjunction more generally:

X → X and X
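The general rule is really a template: plugging in each phrase type gives one concrete rule. A short sketch (the function name and the default list of conjunctions are my own choices):

```python
def coordination_rules(categories, conjunctions=("and", "or", "but")):
    """Instantiate the schema X -> X Conj X for every phrase type X."""
    return [(x, [x, conj, x]) for x in categories for conj in conjunctions]

for lhs, rhs in coordination_rules(["NP", "VP", "S"], conjunctions=("and",)):
    print(lhs, "→", " ".join(rhs))
# NP → NP and NP
# VP → VP and VP
# S → S and S
```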

This concludes our overview of constituency grammar. It does not cover the full depth of English grammar rules, but I have tried to cover the most important ones.

Please feel free to comment if you find any loose ends; I will be happy to clarify 😋
