Graph Neural Networks in NLP

Capturing the semantic, syntactic, temporal, and relational structure between words through GNNs

Purvanshi Mehta
NeuralSpace
4 min read · Jul 10, 2020


Graphs have always formed an essential part of NLP applications, from syntax-based machine translation and knowledge-graph-based question answering to abstract meaning representation for commonsense reasoning. But with the advent of end-to-end deep learning systems, such traditional parse-based approaches fell out of use. In fact, there has been plenty of debate about state-of-the-art NLP "flattening out" for lack of genuinely new ideas.

“NLP is kind of like a rabbit in the headlights of the Deep Learning machine, waiting to be flattened.” (Neil Lawrence, ICML 2015 workshop)

With this post, I want to highlight recent trends in NLP that use Graph Neural Networks (GNNs) and explain why I think they are one of the field's directions of hope. For a basic introduction to GNNs, see https://www.youtube.com/watch?v=7JELX6DiUxQ

Even if you don't know how GNNs work, this post will give you an idea of where they could be used in NLP, which might motivate you to learn more about them.

Syntactic and Semantic Parse Graphs

Consider the dependency parse produced by spaCy: we can define every node as a word and every edge as a dependency relation, with each word's POS tag attached as a node attribute.
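As a concrete illustration, here is a minimal sketch of turning a spaCy dependency parse into such a graph. It assumes spaCy, networkx, and the standard small English model (en_core_web_sm) are installed; the variable names are my own.

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")
doc = nlp("Its size is ideal and the weight is acceptable.")

graph = nx.DiGraph()
for token in doc:
    # every word becomes a node, with its POS tag as an attribute
    graph.add_node(token.i, text=token.text, pos=token.pos_)
for token in doc:
    if token.head.i != token.i:  # skip the root's self-loop
        # every dependency arc becomes an edge labelled with its relation
        graph.add_edge(token.head.i, token.i, dep=token.dep_)

print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
```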

Some might argue that powerful attention mechanisms can learn syntactic and semantic relationships automatically, though to my knowledge there is no theoretical work that pins down where attention is ineffective. Here is an example. Consider the task of aspect-based sentiment analysis: finding the sentiment for each feature. If we are analysing sentiment for our brand, we can get separate sentiment for features such as fit, material, and shipping; these are all different aspects. Now suppose the sentence given to the network is:

“Its size is ideal and the weight is acceptable.” Attention-based models often identify “acceptable” as a descriptor of the aspect “size”, which is in fact not the case. To address this issue, He et al. imposed syntactic constraints on the attention weights.

This gives us a hint of why a dependency parse could serve as additional information in our NLP applications.
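To make the idea concrete, here is a hedged sketch (not He et al.'s exact formulation; the function name and the decay hyperparameter are my own) of biasing attention weights toward words that sit close to the aspect term in the dependency tree:

```python
import numpy as np
import networkx as nx

def syntax_biased_attention(scores, dep_graph, aspect_idx, decay=0.5):
    """scores: non-negative attention weights over tokens (e.g. softmax output), 1D array.
    dep_graph: undirected dependency graph over token indices.
    aspect_idx: index of the aspect word.
    decay: how fast weight falls off with tree distance (assumed hyperparameter)."""
    # shortest-path distance from the aspect word to every other token
    dist = nx.single_source_shortest_path_length(dep_graph, aspect_idx)
    # tokens unreachable in the parse get a large distance penalty
    penalties = np.array([decay ** dist.get(i, len(scores)) for i in range(len(scores))])
    biased = scores * penalties
    return biased / biased.sum()
```

Feeding it the undirected version of the parse graph from the earlier snippet (graph.to_undirected()) with the index of “size” as aspect_idx would shrink the weight placed on “acceptable”, which sits several dependency hops away.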

Knowledge Graph

A knowledge graph represents a collection of interlinked descriptions of entities: real-world objects, events, situations, or abstract concepts. Every node is an entity and every edge describes a relation between two entities. The most famous KGs in NLP include DBpedia, Wikidata, and ConceptNet.

Fact-based question answering is not new to NLP research, but it was previously limited to the facts present in the database. With techniques such as GraphSAGE (Hamilton et al.), models can generalize to previously unseen nodes. In fact, a recent ACL paper (Saxena et al.) generalizes to multi-hop QA over unseen nodes. (Pretty exciting?)
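The key reason GraphSAGE-style methods can handle unseen nodes is that they learn an aggregation function over neighbour features rather than a lookup table of per-node embeddings. Below is a minimal numpy sketch of one mean-aggregation layer; the weight matrices are assumed to be already trained, and the names are illustrative rather than taken from the paper's code.

```python
import numpy as np

def sage_layer(features, neighbors, W_self, W_neigh):
    """features: (num_nodes, d) node feature matrix.
    neighbors: dict mapping node id -> list of neighbour ids.
    W_self, W_neigh: learned weight matrices (assumed already trained)."""
    out = []
    for v in range(features.shape[0]):
        neigh = neighbors.get(v, [])
        # mean-aggregate the features of sampled neighbours
        h_neigh = features[neigh].mean(axis=0) if neigh else np.zeros(features.shape[1])
        # combine the node's own features with the aggregated neighbourhood
        h = features[v] @ W_self + h_neigh @ W_neigh
        out.append(np.maximum(h, 0.0))  # ReLU
    h_new = np.stack(out)
    # L2-normalise the output embeddings
    return h_new / (np.linalg.norm(h_new, axis=1, keepdims=True) + 1e-8)
```

Because the layer only needs a new node's features and its neighbours, it can embed nodes that were never seen during training.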

Temporal graphs

LSTMs are known to struggle with long-range dependencies, so connecting words or documents through edges that encode relations in time is one possible solution [Mirza].
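As an illustration (the node and edge attribute names below are my own, not from Mirza's thesis), here is a hedged sketch of a small temporal document graph that a GNN could then propagate information over, instead of relying on an LSTM to carry it across long distances:

```python
import networkx as nx
from datetime import datetime

# toy documents with timestamps (illustrative data)
docs = [
    ("d1", datetime(2020, 7, 1)),
    ("d2", datetime(2020, 7, 3)),
    ("d3", datetime(2020, 7, 10)),
]

g = nx.DiGraph()
for doc_id, ts in docs:
    g.add_node(doc_id, timestamp=ts)

# connect each document to the next one in time, labelling the edge
# with a "before" relation and the gap in days
ordered = sorted(docs, key=lambda x: x[1])
for (a, ta), (b, tb) in zip(ordered, ordered[1:]):
    g.add_edge(a, b, relation="before", gap_days=(tb - ta).days)
```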

With graph neural networks, we can leverage all of these structures present in natural language to form richer embeddings.

But why graphs? Isn't graph construction from text a supervised step, when our ultimate aim is to make learning unsupervised?

  1. Language acquisition has not been proven to be learned from scratch by the human brain. Some scientists argue that certain capabilities are present from birth, in the form of semantics (let's keep that for another post).
  2. There is ongoing work on acquiring knowledge from plain text in an unsupervised fashion (which looks pretty promising).
  3. The world has hierarchies: we first learn words and then build complete sentences.
  4. Multimodal graphs: KGs can be extended to multimodal graphs, which connect images, words, and documents for processing information.
  5. Concepts: as in ConceptNet, concepts can be learned in graph form in an unsupervised or semi-supervised manner.

These are just my abstract thoughts on the topic. Feel free to agree or disagree in the comments. For more explanations or comments, email me at purvanshi.mehta11@gmail.com

References -

  1. [He et al.] Effective Attention Modeling for Aspect-Level Sentiment Classification.
  2. [Saxena et al.] Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings. https://www.aclweb.org/anthology/2020.acl-main.412/
  3. [Mirza] PhD thesis: https://arxiv.org/pdf/1604.08120.pdf
  4. [Hamilton et al.] Inductive Representation Learning on Large Graphs (GraphSAGE).
  5. One of the best sources for GNNs in NLP (EMNLP 2019 tutorial): https://shikhar-vashishth.github.io/assets/pdf/emnlp19_tutorial.pdf
