Week 7— Multitask learning experiments

Marco Sobrevilla Cabezudo
Jul 21, 2020


Previously on my GSoC story:

I started coding the Transformer-based NMT architecture and adapted it to a Multitask Learning (MTL) setup with one encoder and three decoders.

What did I do?

I finished the code and ran some experiments. I compared the MTL setup, which trains all tasks jointly with a shared encoder, against training each task separately. The results are shown in Figure 1. I calculated accuracy for the discourse ordering and text structuring tasks, and BLEU for the lexicalisation task.
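For reference, this is roughly how the two metrics can be computed. It is a minimal sketch assuming whitespace-tokenised outputs and a single reference per example, not the project's exact evaluation script:

```python
# Hedged sketch of the evaluation metrics; the real scripts may differ.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def exact_match_accuracy(predictions, references):
    """Accuracy for discourse ordering / text structuring:
    a prediction counts as correct only if it matches the reference exactly."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

def lexicalisation_bleu(predictions, references):
    """Corpus-level BLEU for lexicalisation outputs (single reference per example)."""
    hyps = [p.split() for p in predictions]
    refs = [[r.split()] for r in references]
    return corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)
```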

The results show that the ordering and structuring tasks do not achieve good performance, and the drop is even larger when MTL is used. The lexicalisation task, however, achieved good results, and the MTL approach seems to be beneficial for it. I also checked the results of [1], which compares pipeline and end-to-end approaches. Figure 2 shows the results of the pipeline approach for each task (discourse ordering, text structuring, lexicalisation and referring expression generation): our ordering and structuring results are worse than the pipeline approach, while our lexicalisation results are better, and in both cases the differences are large.
In a manual inspection, I saw that the discourse ordering and text structuring outputs suffer from many repetitions during decoding. I think we can constrain the model to generate only tokens that appear in the input (see the sketch after the figure captions below).

Figure 1. Results on DEV and TEST
Figure 2. Results for all tasks [1]
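One way to implement this constraint, as a rough sketch rather than the project's actual decoding code, is to mask the decoder's logits at each step so that only tokens present in the source sequence (plus special symbols such as end-of-sentence) can be generated. The function name and tensor shapes below are my own assumptions:

```python
# Hedged sketch of input-constrained decoding: logits of tokens that do not
# appear in the source are set to -inf before softmax/argmax.
import torch

def mask_logits_to_source(logits, src_token_ids, allowed_special_ids):
    """logits: (batch, vocab) decoder output for one step;
    src_token_ids: list of per-example source token id lists;
    allowed_special_ids: ids that are always allowed (e.g. the EOS id)."""
    mask = torch.full_like(logits, float("-inf"))
    for i, ids in enumerate(src_token_ids):
        allowed = torch.tensor(sorted(set(ids) | set(allowed_special_ids)),
                               device=logits.device)
        mask[i, allowed] = 0.0  # keep these tokens, block everything else
    return logits + mask
```

At each decoding step one would call `mask_logits_to_source(step_logits, batch_src_ids, {eos_id})` and take the argmax (or sample) over the masked logits, which by construction cannot produce tokens outside the input.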

In my last meeting, we talked about further experiments. MTL approaches usually need large corpora to train well and to generalise, so it is possible that we will not get good results in our setting. To approach the problem in another way, we decided to try a Transfer Learning (TL) approach in which we train a model on a high-level task, then reuse only the pre-trained encoder to train the next task, and so on. Figure 3 shows the two approaches. Unlike the MTL approach, in which each decoder has its own vocabulary, in the Transfer Learning approach we build a common vocabulary for all decoders. We also talked about adding the end-to-end task to evaluate whether these configurations (both MTL and TL) help it, and about exploring Byte-Pair Encoding for the lexicalisation and end-to-end tasks. All these experiments are running now, and you can see the code here.

Figure 3. (A) Multitask learning approach and (B) Transfer Learning approach
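To make the TL setup in Figure 3 (B) concrete, here is a minimal sketch of how the pre-trained encoder could be reused when moving to the next task. The checkpoint layout, the `encoder.` parameter prefix and the `make_model` helper are illustrative assumptions, not the project's actual code:

```python
# Hedged sketch of sequential transfer learning: reuse the encoder trained on
# task k when building the model for task k+1; the decoder starts from scratch.
import torch

def build_next_task_model(prev_checkpoint_path, make_model):
    """make_model() returns a fresh seq2seq model whose decoder uses the
    common vocabulary shared by all tasks (assumption); the checkpoint is
    assumed to contain a plain state_dict with 'encoder.'-prefixed keys."""
    model = make_model()
    prev_state = torch.load(prev_checkpoint_path, map_location="cpu")
    # Copy only the encoder parameters; decoder weights stay freshly initialised.
    encoder_state = {k: v for k, v in prev_state.items() if k.startswith("encoder.")}
    model.load_state_dict(encoder_state, strict=False)
    return model
```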

What’s next?

I am waiting for the results so I can discuss them with my mentors. Then we will decide on the next steps.

[1] Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, and Emiel Krahmer. 2019. Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 552–562, Hong Kong, China.
