Week 4: Results of the first part
Finally, I got the official results of the first month. As you may know (if you have read my previous posts), I explored some node embedding pre-training algorithms to generate node embeddings, then loaded these embeddings into the embedding layer of the RDF-to-Text architecture and froze that layer (on the source side).
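To illustrate what "freezing the embedding layer" means here, below is a minimal PyTorch sketch (not the actual code of the RDF-to-Text model). It assumes the pre-trained vectors have already been aligned with the source vocabulary and stacked into a matrix; the names `pretrained_matrix`, `src_vocab_size` and `embed_dim` are illustrative.

```python
import torch
import torch.nn as nn

# Hypothetical setup: one pre-trained vector per source-vocabulary entry,
# already ordered to match the vocabulary indices of the RDF-to-Text model.
src_vocab_size, embed_dim = 10_000, 300
pretrained_matrix = torch.randn(src_vocab_size, embed_dim)  # stand-in for RDF2Vec/PYKE vectors

# Build the source embedding layer from the pre-trained vectors and freeze it,
# so the encoder never updates these weights during training.
src_embedding = nn.Embedding.from_pretrained(pretrained_matrix, freeze=True)

# Sanity check: the frozen weights should not require gradients.
assert not src_embedding.weight.requires_grad
```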
The results are presented in Figure 1. I compared all strategies against a baseline that consists of training the source embeddings from scratch. The evaluation was performed with the script provided by GenerationEval.
The configuration for each experiment is described as follows:
RDF2Vec: -embed-size 300 -depth 8 -walks 200 -jobs 5 -window 5 -sg -max-iter 30 -negative 25 -min-count 1
PYKE: --embedding_dim 300 --num_iterations 1000 --K 45 --omega 0.45557 --energy_release 0.0414
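For context, the RDF2Vec flags above roughly correspond to the usual two-step recipe: extract random walks over the knowledge graph, then train a skip-gram Word2Vec model on the walk sequences. The sketch below maps those hyperparameters onto gensim's Word2Vec (gensim 4.x parameter names); the walk-extraction step is assumed, and the toy `walks` list is only there to make the snippet runnable, so take it as an approximation of the pipeline rather than the actual training script.

```python
from gensim.models import Word2Vec

# `walks` would normally come from a walker over the knowledge graph
# (e.g. up to depth 8, 200 walks per entity, as in the flags above).
# Tiny illustrative stand-in:
walks = [
    ["dbr:Adolfo_Suárez_Madrid–Barajas_Airport", "dbo:runwayName", "14L/32R"],
    ["dbr:Alpena_County_Regional_Airport", "dbo:runwayLength", "1533"],
]

model = Word2Vec(
    sentences=walks,
    vector_size=300,   # -embed-size 300
    window=5,          # -window 5
    sg=1,              # -sg (skip-gram)
    negative=25,       # -negative 25
    min_count=1,       # -min-count 1
    epochs=30,         # -max-iter 30
    workers=5,         # -jobs 5
)

# One vector per graph node/property, keyed by its token in the walks.
vector = model.wv["1533"]
```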
In addition, we tested adding the WebNLG relations and the supplementary information about the WebNLG dataset (columns webnlg and supplementary in Figure 1), and we evaluated the use of literals in RDF2Vec (PYKE does not support them). The hyperparameters used for training can be found in section 4 of the GitHub repo.
In general, the baseline was strong and no strategy outperformed it. In the case of RDF2Vec, using literals (in a naive way) seems to give small improvements, but not enough to beat the baseline. Using the WebNLG dataset and the supplementary information also helps to reduce the disconnection between the modified relations/properties and their original counterparts in the knowledge graph, but the gain is small (and sometimes, as when comparing Weisfeiler-Lehman with and without supplementary information, performance drops).
On the other hand, PYKE showed the worst performance, and incorporating the WebNLG triples and supplementary information was harmful. A possible explanation is the number of “literals” contained in these files.
In a manual inspection, I could see that literals are still a problem for the pre-trained node embeddings. For example, consider these two sentences:
Ex 1. adolfo suárez madrid-barajas airport runway name is 14l/32r . (reference)
- the runway name of adolfo suarez madridbarajas airport is 14l32r (baseline)
- the runway name of adolfo suarez madridbarajas airport is 18l36r (best configuration of RDF2Vec)
Ex 2. the runway length of alpena county regional airport is 1,533 . (reference)
- the runway length of alpena county regional airport is 1533 (baseline)
- the runway length of alpena county regional airport is 27440 (best configuration of RDF2Vec)
As can be seen, text literals like “14l32r” and “18l36r”, or numerical literals like “1533” and “27440”, are easily swapped when using RDF2Vec. This suggests that their representations are probably not distinctive enough to keep them apart during inference (text generation). This happens several times in the dev set. Therefore, a possible improvement to this work is to explore other ways to pre-train embeddings that consider not only structural information but also textual semantic information.
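One quick way to probe this, assuming the pre-trained vectors are exported in word2vec text format and literals appear as plain keys (both assumptions on my side; the file name and keys below are illustrative), is to look at the nearest neighbours of a literal: if different runway designations or lengths sit very close together, the decoder gets little signal to pick the right one.

```python
from gensim.models import KeyedVectors

# Illustrative path and keys; adjust to however the embeddings were exported.
vectors = KeyedVectors.load_word2vec_format("rdf2vec_embeddings.txt", binary=False)

literal = "14l/32r"
if literal in vectors:
    # If the top neighbours are other runway designations (e.g. "18l/36r"),
    # the embeddings barely distinguish between these literals.
    for neighbour, similarity in vectors.most_similar(literal, topn=5):
        print(f"{neighbour}\t{similarity:.3f}")
```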
What’s next?
Finally, these were the results of the first part of my GSoC proposal. I am waiting for this week's meeting to move on to the next part of the project, i.e., hierarchical decoding (or to continue a bit more with pre-training :) ).
The GitHub repo can be found here, and it contains a guide to reproduce the results I obtained. The embeddings I generated are available at this link, and the subgraphs used can be found here.