Syntactic / Constituency Parsing using the CYK algorithm in NLP

Mehul Gupta
Data Science in your pocket

Moving on from POS Tagging in my last post, I will be exploring Constituency parsing this time.

What the heck is parsing?

Parsing means resolving a sentence into its component parts. Each component can be a single word or a group of words.

Why do we need these components?

POS Tagging already gives us a lot of information, as we saw in the last post. How will dividing a sentence into bigger blocks help us?

You will need some English tuition first!!

Have you heard of noun & verb phrases? Check a few examples below:

  1. A handsome guy (Noun phrase)
  2. The blue umbrella (Noun phrase)
  3. ….. is writing (Verb phrase)
  4. ….. can’t eat (Verb phrase)

Note → Noun phrase: a group of words acting as a noun in a sentence. Similarly, a Verb phrase is a group of words acting as a verb in a sentence.

When we were going through POS Tagging, we came to understand one thing!! Using it, we can generate a POS tag for each word. But does this contribute anything when we look at the bigger picture? On its own, it doesn’t give an ML model enough to derive the meaning of a sentence. The meaning of a sentence can only be derived once we know how its words combine together.

For example, using POS Tagging we might get the below result for

‘the blue umbrella’:

‘the’: Determiner, ‘blue’: Adjective, ‘umbrella’: Noun

But nowhere is it mentioned that ‘blue’ describes ‘umbrella’. Hence we need some sort of grouping to retrieve the relationships between the words of a sentence.

Hence parsing is important.

Now the question is,

How can this be done?

Parsing can be done using 3 methods:

  1. Syntactic Parsing: using rules to break the sentence into sub-phrases. For the sentence ‘John sees Bill’, this would look something like this:

The tree-like structure shown above is known as a parse tree.

A parse tree converts the sentence into a tree whose leaves hold the words of the sentence (with their POS tags just above them), while the rest of the tree tells you how exactly these words join together to make the overall sentence.

For example, an adjective and a noun might combine into a ‘Noun Phrase’ (the blue umbrella), which might combine with another adjective to form a larger Noun Phrase (e.g. the torn blue umbrella).
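To make this concrete, here is a tiny sketch with NLTK, where the bracketed string is my assumed parse of ‘John sees Bill’:

```python
import nltk

# Assumed constituency structure for 'John sees Bill'
tree = nltk.Tree.fromstring("(S (NP John) (VP (V sees) (NP Bill)))")
tree.pretty_print()   # draws the parse tree in ASCII
```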

2. Dependency Parsing: it aims at breaking a sentence based on the relationships between the words themselves rather than any predefined ruleset. The same sentence ‘John sees Bill’ has the below dependency parse:
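For a hands-on look at these word-to-word relations, here is a small sketch using spaCy (assuming the en_core_web_sm model has been installed via python -m spacy download en_core_web_sm):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("John sees Bill")
for token in doc:
    # each word points at its head word with a typed relation
    print(token.text, "->", token.dep_, "->", token.head.text)
```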

3. Semantic Parsing: the toughest of the lot!! It aims at transforming a sentence into a logical, formal representation. You can take it as: if I get a sentence

‘How many runs did Dhoni score in the match?’, it can be transformed into a SQL query (or any other formal representation) like SELECT runs FROM match WHERE player=’DHONI’; (this is just one example of a formal representation; there can be other forms as well)

We shall be going with Syntactic Parsing for now!!

Before moving on, we must understand what a Context-Free Grammar is:

It consists of a set of rules (called productions), each of which expresses the ways that the symbols of the language can be grouped and ordered together, and a lexicon (dictionary) of words and symbols.

Confused as hell?

Examples always help!! Though do explore the grammatical terms mentioned below.

In a CFG (context-free grammar), suppose we are given the below rules/productions:

  • NP (Noun Phrase) → Det (Determiner) Nominal
  • NP → Proper-Noun
  • Nominal → Noun | Nominal Noun
  • Det → ‘a’
  • Noun → ‘flight’

……And many more

Here, the term to the left of → can be rewritten as the terms on its right, i.e. if we encounter a Noun Phrase (NP), it can expand into a ‘Det’ followed by a ‘Nominal’ (rule 1) or into a ‘Proper-Noun’ (rule 2).

Explore here for a better understanding.

Similarly, Det → ‘a’ states that Det can expand into the word/symbol ‘a’. Such words/symbols can be taken as the smallest units, which can’t be replaced by any other term, i.e. they can never appear on the left side of →. In fact, we can have two types of terms in a CFG:

  1. Terminal: terms that can’t be replaced (‘a’, ‘flight’ in the above example). They always take the leaf positions in a parse tree; once encountered, they can’t expand into anything else. They can be taken as constants.
  2. Non-Terminal: terms that can be replaced by other terms (NP, Nominal in the above example). They can never be at a leaf of a parse tree; they are replaced by either Non-Terminals or Terminals.

Below are the components of a CFG:

  • a set of Non-Terminal symbols (NP, Nominal, …)
  • a set of Terminal symbols (‘a’, ‘flight’, …)
  • a set of rules/productions of the form A → β
  • a start symbol S

Here, β simply represents any string of terms (drawn from the union of terminals & non-terminals) in the CFG. S is the starting symbol of the parse tree.
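To make this concrete, here is the toy fragment above written as an NLTK grammar; the Proper-Noun rule for ‘Houston’ is my addition for illustration, and note that NLTK takes the left-hand side of the first rule (NP here) as the start symbol:

```python
import nltk

grammar = nltk.CFG.fromstring("""
    NP -> Det Nominal | PropN
    Nominal -> Noun | Nominal Noun
    Det -> 'a'
    Noun -> 'flight'
    PropN -> 'Houston'
""")

# Parse a tiny phrase with the rules above
parser = nltk.ChartParser(grammar)
for tree in parser.parse(['a', 'flight']):
    tree.pretty_print()
```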

Before moving on to how to create a parse tree using a CFG, we must know how these rules are generated. Though no single algorithm is followed for generating CFGs, they are mostly extracted from a TreeBank, with some modifications if required.

A TreeBank is a corpus in which every sentence is syntactically annotated, i.e. each sentence has a corresponding parse tree. It must be noted that these corpora are human-annotated (all parse trees are created either completely or partially by manual labeling). The Penn Treebank is a well-known example.

Creating a Syntactic Parse Tree:

  1. Transform the CFG to Chomsky Normal Form (CNF).

A CFG is in CNF when all productions/rules follow one of these patterns:

A → B C (a non-terminal generating two non-terminals)

A → ‘a’ (a non-terminal generating a terminal)

S → ε (the start symbol generating ε. ε refers to a null production, i.e. empty)

If one of the production rules states A → A B C, where A, B, C are non-terminals, it can be brought into CNF as A → A X and X → B C, where we created the new rule X → B C. (Note: ‘|’ represents ‘or’, so Nominal → Noun | Nominal Noun is shorthand for two separate rules.)

For a better understanding, see an example here: CNF Example
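As a tiny sketch of just this binarization step (not a full CNF converter; the rule representation is my assumption), rules with more than two symbols on the right are split by introducing fresh non-terminals:

```python
def binarize(rules):
    """rules: list of (lhs, rhs) pairs, rhs a tuple of symbols,
    e.g. ('A', ('A', 'B', 'C')). Long right-hand sides are split."""
    out, fresh = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new_sym = f"X{fresh}"                  # fresh helper non-terminal
            out.append((lhs, (rhs[0], new_sym)))   # e.g. A -> A X1
            lhs, rhs = new_sym, rhs[1:]            # continue with X1 -> B C
        out.append((lhs, rhs))
    return out

print(binarize([('A', ('A', 'B', 'C'))]))
# [('A', ('A', 'X1')), ('X1', ('B', 'C'))]
```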

Note: Before moving on, do have a clear understanding of CFG & CNF

Now, once your grammar is in CNF, we will be using the

Cocke-Younger-Kasami (CYK) algorithm:

Let me pick up a sentence for generating the parse tree:

Book the flight through Houston

Let’s have our grammar rules alongside their CNF equivalents:

Do remember that this table doesn’t include some very obvious rules like Verb → Book, Proper-Noun → Houston, Det → The, Preposition → Through, Noun → Flight, etc. So if you feel any term is missing in the below pictures, assume it has been taken into account.

Note: All the rules mentioned below are taken from the CNF table (right side) & not the original grammar table.

We need to set up a matrix as shown below, with each column representing a word of the sentence in the same sequence (for N words, the indices run from 0 to N). The cell [i, j] will hold the non-terminals that can generate the span of words from position i+1 through j, so the full sentence lives in [0, 5].

(Column headers, left to right: Book, The, Flight, Through, Houston)

The black arrowheads show us the direction from where we will start filling this matrix.

The final result will look like this:

  1. Starting from [4,5], i.e. ‘Houston’: looking at the rules, it can be obtained using NP (10th rule of the CNF) & Proper-Noun (Proper-Noun → Houston).

2. The main work starts when we move a step up, i.e. to [3,4], representing ‘through’. It can be obtained using Prep (Prep/Preposition → Through).

3. We must remember that as we move to the right-hand side of a row, we append the coming words to the words already covered on the left-hand side, in the same sequence, and look for rules producing that group of words as a whole.

4. Hence at [3,5], we need to figure out the rules generating ‘through Houston’ & not only ‘Houston’. For this, we need some help from [3,4], representing ‘through’, & [4,5], representing ‘Houston’. What we aim for is the cross of the tags @ [3,4] (Prep) with the tags @ [4,5] (NP, Proper-Noun), leading to the candidate pairs ‘Prep NP’ & ‘Prep Proper-Noun’. Now, if any of these can be obtained from the above productions, fill those rules in [3,5]. If you look, ‘Prep NP’ can be generated using the 21st rule, i.e. PP → Prep NP. Hence [3,5] = PP.

5. Moving to [2,3]: ‘flight’ can be obtained using the 12th rule (Nominal → flight) & Noun → flight. Moving to [2,4], we will try to produce ‘flight through’, for which we locate the tag pairs produced by crossing [2,3] (Noun, Nominal) with [3,4] (Prep). Hence we look for a rule producing ‘Nominal Prep’ or ‘Noun Prep’. There is no such rule!!! Leave it blank.

6. Moving to [2,5], we need to derive ‘flight through Houston’. Here, we try the tag groups formed by [2,3] (‘flight’) x [3,5] (‘through Houston’) or [2,4] (‘flight through’) x [4,5] (‘Houston’). We must understand that we are trying to build the current sentence segment out of previously derived segments: ‘flight through Houston’ can be broken as ‘flight through’ + ‘Houston’ or ‘flight’ + ‘through Houston’. As ‘flight through’ couldn’t be derived & its cell is blank, we can easily drop that segmentation. Hence, [2,3] x [3,5] gives us ‘Nominal PP’ & ‘Noun PP’. Rule 14 states Nominal → Nominal PP, & hence [2,5] = Nominal.

7. Considering [1,5] = ‘the flight through Houston’, it can be broken as:

‘the’ + ‘flight through Houston’ ([1,2] x [2,5])

‘the flight’ + ‘through Houston’ ([1,3] x [3,5])

‘the flight through’ + ‘Houston’ ([1,4] x [4,5])

Now create tag pairs accordingly & figure out which rules produce these pairs. Fill the cell with all the rules producing any of these pairs.

Similarly, try filling up the entire matrix till [0,5].

Note that the cells at the beginning of each row (cells representing single words, with no words to their left) are filled directly using the productions in the CNF table.
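Putting the whole table-filling procedure together, below is a minimal Python sketch of the CYK recognizer. The grammar & lexicon dictionaries are a hand-picked CNF fragment (my assumption, just enough for this one sentence), not the full grammar table above:

```python
# Minimal CYK recognizer sketch. Assumed CNF fragment, sufficient only
# for 'Book the flight through Houston'.
grammar = {                 # binary rules: (B, C) -> {A : A -> B C}
    ('Verb', 'NP'): {'S'},
    ('Det', 'Nominal'): {'NP'},
    ('Nominal', 'PP'): {'Nominal'},
    ('Prep', 'NP'): {'PP'},
}
lexicon = {                 # terminal rules: word -> {A : A -> word}
    'book': {'Verb'}, 'the': {'Det'}, 'flight': {'Nominal'},
    'through': {'Prep'}, 'houston': {'NP'},
}

def cyk(words):
    n = len(words)
    # table[i][j] = every non-terminal deriving the span of words i+1 .. j
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        table[j - 1][j] = set(lexicon.get(words[j - 1].lower(), set()))
        for i in range(j - 2, -1, -1):       # move up the column
            for k in range(i + 1, j):        # try every split point
                for B in table[i][k]:
                    for C in table[k][j]:
                        table[i][j] |= grammar.get((B, C), set())
    return table

words = 'Book the flight through Houston'.split()
table = cyk(words)
print(table[3][5])   # {'PP'} -> 'through Houston'
print(table[0][5])   # {'S'}  -> the whole sentence parses
```

If [0, N] ends up containing the start symbol S, the sentence belongs to the language of the grammar.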

DONE!!

No way!!

Where is the parse tree promised at the very beginning!!

We need to bring this important change.

  • When we are filling the above matrix, create a pointer for every non-terminal placed in a cell, storing the positions from which it was derived (non-terminals refer to the tags, while terminals refer to the sentence words ‘book’, ‘the’, ‘flight’, ‘through’, ‘Houston’). For the Nominal at position [2,5], for example, store ‘Nominal PP’ along with the cells [2,3] & [3,5] it came from
  • If we find multiple rules fulfilling the tag pairs, we need to store such pointers for every non-terminal of the cell
  • Once [0,5] is reached, backtrace from the non-terminals at [0,5] using the stored pointer values, recursing until all the leaves (terminals) are reached
  • If [0,5] is blank, this means the sentence is grammatically/syntactically incorrect. No parse tree exists
  • If we get multiple parse trees, the sentence is ambiguous (many meanings can be drawn from it, as shown in the below example)

Consider the sentence:

‘all aged men & women’

Here, we can have two meanings.

‘all aged men & (aged) women’

or

‘all aged men & (no age barriers, all age groups) women’

Hence, this sentence can have multiple parse trees, one per meaning, if we use the CYK algorithm (as mentioned above).
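As a sketch of how the pointers & backtracing work in code (reusing the same assumed grammar fragment as the recognizer above), each cell now maps a non-terminal to a list of ways it was derived, and a recursive walk yields one tree per derivation; an ambiguous sentence would yield several trees:

```python
from itertools import product

# Same assumed CNF fragment as in the recognizer sketch above
grammar = {('Verb', 'NP'): {'S'}, ('Det', 'Nominal'): {'NP'},
           ('Nominal', 'PP'): {'Nominal'}, ('Prep', 'NP'): {'PP'}}
lexicon = {'book': {'Verb'}, 'the': {'Det'}, 'flight': {'Nominal'},
           'through': {'Prep'}, 'houston': {'NP'}}

def cyk_with_backpointers(words):
    n = len(words)
    # table[i][j] maps non-terminal A to its backpointers: either a plain
    # word (a leaf) or a (split k, B, C) triple recording A -> B C
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        for A in lexicon.get(words[j - 1].lower(), set()):
            table[j - 1][j].setdefault(A, []).append(words[j - 1])
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for B, C in product(table[i][k], table[k][j]):
                    for A in grammar.get((B, C), set()):
                        table[i][j].setdefault(A, []).append((k, B, C))
    return table

def trees(table, i, j, A):
    """Recursively yield every parse of the span [i, j] rooted at A."""
    for back in table[i][j][A]:
        if isinstance(back, str):                    # a leaf word
            yield (A, back)
        else:
            k, B, C = back
            for left, right in product(list(trees(table, i, k, B)),
                                       list(trees(table, k, j, C))):
                yield (A, left, right)

words = 'Book the flight through Houston'.split()
table = cyk_with_backpointers(words)
if 'S' in table[0][len(words)]:
    for t in trees(table, 0, len(words), 'S'):   # one tree per reading
        print(t)
else:
    print('No parse: the sentence is not covered by this grammar')
```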

Which one to choose?

Here comes Probabilistic Context-Free Grammar !!

There exists a very small difference between CFG & PCFG: in a PCFG, we also have a probability for each production. If we have A → B C, in a PCFG it would be A → B C (p), where p is the probability of the rule. When multiple parse trees are observed, the most probable tree is chosen using the Viterbi algorithm.
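As a quick illustration, NLTK ships a ViterbiParser that returns the most probable parse of a PCFG. The toy grammar & its probabilities below are made up for illustration (the probabilities for each left-hand side must sum to 1):

```python
import nltk
from nltk.parse import ViterbiParser

# Made-up toy PCFG: each rule carries a probability in [brackets]
toy_pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det Noun [0.6] | 'John' [0.4]
    VP -> Verb NP [1.0]
    Det -> 'the' [1.0]
    Noun -> 'umbrella' [1.0]
    Verb -> 'sees' [1.0]
""")

parser = ViterbiParser(toy_pcfg)
for tree in parser.parse(['John', 'sees', 'the', 'umbrella']):
    print(tree)          # the single most probable tree
    print(tree.prob())   # its probability (0.4 * 0.6 = 0.24 here)
```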

Won’t be going into any sort of detail!!

Just the last topic left. Bear a little more!!

Partial Parsing:

Partial parsing refers to a coarser segmentation of the sentence, unlike full syntactic parsing, where the segmentation is quite fine.

Consider Named Entity Recognition (extracting named entities like person names, city names, countries, locations, etc. from a sentence).

Why would I do such a tedious task as creating a parse tree for the entire sentence (computationally a heavy task) when I am interested in only some information from it? Hence, we go for partial parsing in such cases.

Comes the role of Chunking!!

Chunking helps us in the partial parsing of a sentence required for Information Extraction in NLP (Named Entity Recognition is an example of Info Extraction).

Basically, chunking segments a sentence into non-overlapping phrases: Noun Phrases NP (most common), Verb Phrases VP, Adjective Phrases AP & Prepositional Phrases PP. By non-overlapping, I mean no word should be common between any two segments.

Given the below sentence:

“The morning flight from Denver has arrived” can be segmented as:

[NP The morning flight] [PP from] [NP Denver] [VP has arrived.]

Notice that we are aiming for a less detailed parse tree, where we don’t determine the relationship between ‘has’ & ‘arrived’ in the last segment; they are kept together as one unit.
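A minimal sketch of such chunking using NLTK’s RegexpParser; the NP pattern below (an optional determiner, any number of adjectives, then nouns) is just one assumed way of defining a noun-phrase chunk over POS tags:

```python
import nltk

# One assumed regex pattern over POS tags defining an NP chunk
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

sentence = [("The", "DT"), ("morning", "NN"), ("flight", "NN"),
            ("from", "IN"), ("Denver", "NNP"), ("has", "VBZ"),
            ("arrived", "VBN")]
print(chunker.parse(sentence))
# (S (NP The/DT morning/NN flight/NN) from/IN (NP Denver/NNP)
#    has/VBZ arrived/VBN)
```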

No ML models so far?

We can use supervised learning for chunking!!

What we need is training data, though we would add some more information beyond the tags assigned in the above example. We produce outputs using IOB Tagging, where we introduce a tag for the beginning (B) and inside (I) of each chunk type, and one for tokens outside (O) of any chunk, alongside the grammatical tags.

The below example explains it better.

Here, B_NP represents the beginning of a chunk that is a Noun Phrase. Similarly, I_NP represents a word inside a chunk (opened by the most recent B) that is a Noun Phrase.

We use a sequence labeling model (trained on such a dataset), as shown below, for chunking.

It takes in the POS tag corresponding to each word & assigns IOB tags once trained. Hence, we train a model on POS tags to produce IOB tags.
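As a sketch of such a sequence labeler, here is the classic unigram chunker from the NLTK book: it learns, for each POS tag, its most frequent IOB tag from the CoNLL-2000 chunking corpus (assumes nltk is installed and the corpus has been fetched via nltk.download('conll2000')):

```python
import nltk
from nltk.corpus import conll2000

class UnigramChunker(nltk.ChunkParserI):
    """Predicts an IOB chunk tag from each word's POS tag alone."""
    def __init__(self, train_sents):
        train_data = [[(pos, iob) for word, pos, iob
                       in nltk.chunk.tree2conlltags(sent)]
                      for sent in train_sents]
        self.tagger = nltk.UnigramTagger(train_data)

    def parse(self, sentence):
        # sentence: list of (word, POS) pairs -> chunk tree
        pos_tags = [pos for (word, pos) in sentence]
        iob_tags = [iob for (pos, iob) in self.tagger.tag(pos_tags)]
        conlltags = [(word, pos, iob)
                     for ((word, pos), iob) in zip(sentence, iob_tags)]
        return nltk.chunk.conlltags2tree(conlltags)

train_sents = conll2000.chunked_sents('train.txt', chunk_types=['NP'])
chunker = UnigramChunker(train_sents)
print(chunker.parse([("The", "DT"), ("morning", "NN"), ("flight", "NN"),
                     ("from", "IN"), ("Denver", "NNP"),
                     ("has", "VBZ"), ("arrived", "VBN")]))
```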

Won’t be going into further detailing!!

More than enough for the day!!
