Addressing Limitations of RNNs by Using Transformer-XL | Towards AI
Limitations of recurrent neural networks
Recurrent Neural Networks (RNNs) offer a way to learn from a sequence of inputs. The drawback is that they are difficult to optimize due to the vanishing gradient problem. The Transformer (Al-Rfou et al., 2018) was introduced to overcome this limitation of RNNs. By design, a fixed-length segment is defined to reduce resource consumption.
However, this creates another problem, called context fragmentation. If the input sequence is longer than the pre-defined segment length, it has to be split, and information cannot be captured across segments. Transformer-XL was introduced by Dai et al. (2019) to overcome this limitation.
To reduce computing resources, the input sequence is split into fixed-length segments. Dai et al. refer to this approach as the vanilla Transformer.
The first limitation is that information cannot be shared across segments. Although the Transformer is less affected by the vanishing gradient problem, its capability is limited when the input sequence length is fixed. The second limitation is caused by padding. Since fixed-length input is required, padding is added whenever the input is shorter than the pre-defined length. This split does not respect sentence or semantic boundaries.
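The fixed-length split with padding described above can be sketched as follows (a minimal illustration; the function name and pad value are my own, not from the paper):

```python
def split_into_segments(tokens, seg_len, pad_id=0):
    """Split a token sequence into fixed-length segments, padding the last one.

    Note that the split ignores sentence boundaries: a sentence that crosses
    a segment border is cut in two, which is the context-fragmentation problem.
    """
    segments = []
    for start in range(0, len(tokens), seg_len):
        seg = tokens[start:start + seg_len]
        seg = seg + [pad_id] * (seg_len - len(seg))  # pad a short final segment
        segments.append(seg)
    return segments

# Seven tokens, segment length 3: the last segment is padded with two zeros.
print(split_into_segments([1, 2, 3, 4, 5, 6, 7], 3))
# → [[1, 2, 3], [4, 5, 6], [7, 0, 0]]
```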
Transformer-XL (extra long) was born to tackle the vanilla Transformer's limitations.
Instead of treating segments as disconnected, the hidden state sequence of the previous segment is cached and reused when computing the next segment. In theory, we can cache multiple previous segments so that the current segment can reach information further back.
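This segment-level recurrence can be sketched as follows, assuming hidden states are simple lists (a real implementation caches tensors per layer and stops gradients through the memory; all names here are illustrative):

```python
def update_memory(memory, new_hidden, mem_len):
    """Append the current segment's hidden states to the cache and keep only
    the most recent `mem_len` entries, as Transformer-XL does at each layer."""
    combined = memory + new_hidden
    return combined[-mem_len:]

# Process segments in order: each segment attends over [memory + segment],
# so information flows across the segment boundary.
memory = []
for segment_hidden in [["h1", "h2"], ["h3", "h4"], ["h5", "h6"]]:
    # queries come from segment_hidden; keys/values span memory + segment_hidden
    attended_over = memory + segment_hidden
    memory = update_memory(memory, segment_hidden, mem_len=2)

print(memory)  # → ['h5', 'h6']
```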
Another feature is the positional encoding. Instead of absolute positions, relative positional encodings are leveraged to prevent ambiguity: with segment recurrence, the same absolute position index appears in every segment, so absolute encodings would be misleading. With relative encodings, every word has a relative distance to every other word, which helps improve model training.
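A small sketch of the idea, assuming the relative distance between a query position and a key position is all that matters (illustrative only; in Transformer-XL the keys also span the cached memory, so the key length is typically memory length plus segment length):

```python
import numpy as np

def relative_distances(query_len, key_len):
    """Matrix of distances j - i from each query position i to each key
    position j. The same matrix applies to every segment, so there is no
    ambiguity from reused absolute indices across segments."""
    return np.arange(key_len)[None, :] - np.arange(query_len)[:, None]

print(relative_distances(2, 3).tolist())
# → [[0, 1, 2], [-1, 0, 1]]
```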
- Transformer-XL overcomes several limitations of previous NLP models, such as the maximum length of context that can be captured.
- If you use PyTorch, you can train a model with Hugging Face's PyTorch-Transformers library. Keras users may want to try this library.
Like to learn?
I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. Feel free to connect with me on LinkedIn or Github.
- R. Al-Rfou, D. Choe, N. Constant, M. Guo, and L. Jones. Character-Level Language Modeling with Deeper Self-Attention. 2018
- Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. 2019