Data to text generation — Let the modelling begin!

Niloy Purkait
6 min read · Jun 27, 2020


For Google Summer of Code 2020

In my last post, I described the open-source project to which I am contributing for the Google Summer of Code program (GSoC 2020). As a refresher, this project consists of transforming a knowledge base, represented by a set of RDF triples, into a natural language verbalization. For the visual thinkers reading this article, here’s a depiction of an input triple set of size 3. Each triple has the structure (subject, property, object).

A triple set of size 3, taken from the WebNLG dataset

In the example above, subjects are denoted in red, properties in green, and objects in blue. This triple set forms a knowledge base with information on the city of Madrid, and may have a target verbalization along the lines of:

“Adolfo Suárez Madrid–Barajas Airport is found in Madrid, Spain where the leading party is Ahora Madrid.”
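For concreteness, such a triple set can be thought of as a plain list of Python tuples; the exact subject, property, and object strings below are illustrative stand-ins for the WebNLG-style example above:

```python
# An illustrative (subject, property, object) triple set about Madrid,
# loosely mirroring the WebNLG-style example above (strings are hypothetical).
triples = [
    ("Adolfo_Suárez_Madrid–Barajas_Airport", "location", "Madrid"),
    ("Madrid", "country", "Spain"),
    ("Madrid", "leaderParty", "Ahora_Madrid"),
]
```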

Why is this cool? Because a lot of the data on the web is organised using the Resource Description Framework (RDF), making it easy for machines to store and access information. Unfortunately, humans don’t fare particularly well with such representations, and would much rather have a snippet of natural language text describing the semantic content of a triple set. A more detailed background for this use case can be found in my previous article. This article goes out to the hard-core data scientists and computational linguists who couldn’t care less about tertiary background information and want to get right down to the predictive modelling.

How are we modelling this?

Essentially, we are addressing a data-to-text generation use case, where we transform a given input representation, derived from RDF triple sets, into a target sequence of words. We can frame this sequence generation problem as follows: we train a parameterized generative model to map an input representation (linearized or graphical) to an output sequence of tokens Y_1:T.
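Concretely, assuming the usual autoregressive factorization (my notation: X stands for the input representation, θ for the model parameters, and 𝒴 for the target vocabulary):

```latex
P_{\theta}\left(Y_{1:T} \mid X\right)
  = \prod_{t=1}^{T} P_{\theta}\left(y_{t} \mid y_{1:t-1},\, X\right),
  \qquad y_{t} \in \mathcal{Y}
```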

Here, 𝒴 denotes the set of all possible candidate tokens, i.e. the target vocabulary, and t denotes the time-step in the generation process. The next step is to decide which input representation is most appropriate for our knowledge base modelling task.

Sequence-to-sequence approach

The simplest approach is a sequence-to-sequence one, where we treat the input triple set as a linearized sequence of triples. This is akin to the approach taken by neural machine translation architectures, which often sport recurrent or (more recently) transformer-based models, with an encoder module tasked with compressing the input into a dense context vector, from which a decoder module learns to unravel the relevant information and map it to a target language of choice. Below is an example of a sequence-to-sequence RNN with an attention mechanism, mapping the input triple (Madrid, Country, Spain) to a target verbalization.

Example of a sequence-to-sequence RNN with an attention mechanism
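For intuition, here is a minimal sketch of what such a linearization could look like; the separator tokens are an assumed convention, not the project's actual preprocessing:

```python
def linearize(triples):
    """Flatten (subject, property, object) triples into one token sequence,
    using special separator tokens (an illustrative choice)."""
    tokens = []
    for subj, prop, obj in triples:
        tokens += ["<subject>", subj, "<property>", prop, "<object>", obj]
    return " ".join(tokens)

print(linearize([("Madrid", "country", "Spain")]))
# <subject> Madrid <property> country <object> Spain
```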

Graph-to-sequence approach

Another approach, specifically suited to our dataset, is a graph-to-sequence one. Here, we pre-process the given RDF triple sets to convert them into a multi-relational graph (using the networkx package in Python). This way, we can reify the relationships between triple entities in a structured manner, and our modelling architecture can leverage information encoded in the structure of the graph itself in order to convert a given knowledge base into a target sequence of words.
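A small sketch of this preprocessing step, under a simple assumed scheme where each property labels a directed edge (not necessarily the project's exact pipeline):

```python
import networkx as nx

def triples_to_graph(triples):
    """Build a directed multigraph from (subject, property, object) triples,
    storing the property as an edge label (an illustrative scheme)."""
    graph = nx.MultiDiGraph()
    for subj, prop, obj in triples:
        graph.add_edge(subj, obj, label=prop)
    return graph

g = triples_to_graph([("Madrid", "country", "Spain"),
                      ("Madrid", "leaderParty", "Ahora_Madrid")])
print(list(g.edges(data=True)))
# [('Madrid', 'Spain', {'label': 'country'}), ('Madrid', 'Ahora_Madrid', {'label': 'leaderParty'})]
```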

Thus the task at hand requires generating the target description Y, given an input graph X = (V, E), where V and E are the vertices and edges of the graph, respectively. The reification scheme, described in this paper, works by adding a new relation node for each property in the input RDF graph. For example, in the input triple set shown below, each property becomes a node of its own, linked to its subject and object entities through two new binary relations, A0 and A1 respectively, resulting in the following transformation:

Illustrated using an RDF instance from the WebNLG dataset
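A minimal sketch of this reification step, again with networkx; the node naming and edge labels reflect my reading of the A0/A1 scheme rather than the exact implementation:

```python
import networkx as nx

def reify(triples):
    """Turn each (subject, property, object) triple into its own relation node,
    linked to the subject via an A0 edge and to the object via an A1 edge."""
    graph = nx.MultiDiGraph()
    for i, (subj, prop, obj) in enumerate(triples):
        rel_node = f"{prop}_{i}"  # one fresh node per relation occurrence
        graph.add_edge(rel_node, subj, label="A0")
        graph.add_edge(rel_node, obj, label="A1")
    return graph

g = reify([("Madrid", "country", "Spain")])
print(list(g.edges(data=True)))
# [('country_0', 'Madrid', {'label': 'A0'}), ('country_0', 'Spain', {'label': 'A1'})]
```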

Now, V becomes the set of entity and relation nodes, while E denotes the set of edges, whose labels are drawn from {A0, A1}. This step has multiple benefits: not only does it allow the encoder module to create a hidden state for each relation in the input RDF graph, it also enables the network to model an arbitrary number of knowledge base relations effectively. The goal here is to generate natural language descriptions from a set of graph nodes. In doing so, our encoder must be able to relate the importance of different nodes to the quality of the final context vector, thereby accurately modelling the importance of the features present within each node.

Neural architectures such as graph convolutional networks (GCNs) and graph attention networks (GATs) excel at encoding graph-structured information, and are hence a natural choice for the encoder module. The decoder module can be a standard recurrent architecture (GRU/LSTM), which processes the context vector produced by the encoder and maps it to a target verbalization.
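To make this pairing concrete, here is a very rough, illustrative PyTorch sketch: a minimal GCN-style encoder followed by a GRU decoder conditioned on a mean-pooled graph context. Class names, dimensions, and the pooling choice are all assumptions for illustration, not the project's actual models.

```python
import torch
import torch.nn as nn

class SimpleGCNEncoder(nn.Module):
    """Bare-bones GCN-style encoder: each layer mixes a node's features with
    those of its neighbours through a (pre-normalised) adjacency matrix."""
    def __init__(self, in_dim, hid_dim, num_layers=2):
        super().__init__()
        dims = [in_dim] + [hid_dim] * num_layers
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)]
        )

    def forward(self, node_feats, adj):
        # node_feats: (num_nodes, in_dim); adj: (num_nodes, num_nodes),
        # assumed to include self-loops and be row-normalised.
        h = node_feats
        for layer in self.layers:
            h = torch.relu(adj @ layer(h))
        return h  # one hidden state per graph node


class GRUDecoder(nn.Module):
    """Standard GRU decoder conditioned on a pooled graph context vector."""
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens, context):
        # tokens: (seq_len,) target token ids; context: (hid_dim,)
        h0 = context.view(1, 1, -1)            # initial decoder hidden state
        emb = self.embed(tokens).unsqueeze(0)  # (1, seq_len, emb_dim)
        out, _ = self.gru(emb, h0)
        return self.out(out)                   # (1, seq_len, vocab_size) logits


# Toy usage: 4 graph nodes with 16-dim features, a 100-token vocabulary.
encoder, decoder = SimpleGCNEncoder(16, 32), GRUDecoder(100, 8, 32)
node_states = encoder(torch.randn(4, 16), torch.eye(4))
logits = decoder(torch.tensor([1, 2, 3]), node_states.mean(dim=0))
```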

Plan of attack

Now that we have established two different approaches to our problem, the next step is to test the performance of each architecture, using a combination of automatic evaluation metrics (BLEU, METEOR, and ROUGE-N are commonly used for such use cases) and visual inspection of the outputs produced.
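BLEU, for instance, can be sanity-checked quickly with NLTK; the sentences below are made up for illustration and this is not the project's official evaluation script:

```python
from nltk.translate.bleu_score import corpus_bleu

# One hypothesis paired with one reference, both tokenised (illustrative examples).
references = [[["madrid", "is", "located", "in", "spain"]]]
hypotheses = [["madrid", "is", "located", "in", "the", "country", "of", "spain"]]

print(corpus_bleu(references, hypotheses))  # BLEU-4 score between 0 and 1
```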

The final goal of this project is to design an end-to-end RDF-to-text generation system that is trained in an adversarial fashion. In such a setup, a Generator network is trained to generate an output sequence given an input representation (be it linearized or graphical). The Discriminator network then learns to discriminate between real target instances and ones produced by the generator. Before we design the adversarial training setup, however, we need to decide which type of architecture to use for the generator module. So far, our top contenders are:

Contender models for Generator network

Up next

The results obtained by these architectures will be discussed in the following post. We will implement the model with the best results as the generator network in our adversarial training setup.

We will also describe the adversarial training setup in detail, which will be formulated as a reinforcement learning problem. We plan to use the generator as a policy network, which approximates a policy (i.e. a trajectory of tokens to follow, starting from a given state of the environment until the terminal state). In our use case, the state is the sequence of words generated by our generator so far, up to a given time-step. Similarly, the action to take at any given time-step corresponds to the token that our model must predict next, given the current state; in other words, the action space corresponds to the target vocabulary. Finally, the reward for a predicted token will be provided by the discriminator network.

The exact manner in which the discriminator is trained, as well as how the score for a predicted token at a given time-step is calculated, will be elaborated upon in the following posts for this project. For now, we hope you enjoyed this update on the progress of my GSoC 2020 project. Stay tuned for more!

Written by: Niloy Purkait



Data Scientist | Strategy consultant | Machine and Deep learning enthusiast. Interests range from computational biology and theoretical physics to big data tech