How to build a gated convolutional neural network (GCNN) from scratch, with PyTorch implementation

Recurrent neural networks (RNNs) with LSTM or GRU units are the go-to tools for NLP researchers and provide state-of-the-art results on many NLP tasks, including language modeling (LM), neural machine translation (NMT), and sentiment analysis. However, a major drawback of RNNs is that they are notoriously slow to train, so much of the effort over the past few years has focused on speeding them up. Approaches include using pretrained models, using a faster softmax, and using different architectures such as convolutional neural networks (CNNs)…


In Part 1 of this blog post, I explained how the adaptive softmax works and how it can speed up your language model by up to 1000%. Here in Part 2, I’ll walk you step by step through a PyTorch implementation (with an accompanying Jupyter notebook) that uses PyTorch’s built-in AdaptiveLogSoftmaxWithLoss module.
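As a rough orientation before the full walkthrough, here is a minimal sketch of calling `torch.nn.AdaptiveLogSoftmaxWithLoss` on its own. The hidden size, vocabulary size, cutoffs, and random tensors are illustrative assumptions, not values from the notebook, and the random `hidden` tensor merely stands in for a language model's final hidden states.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not taken from the post).
hidden_size = 64
vocab_size = 10_000
# Word ids must be sorted by frequency (0 = most frequent).
# Head covers ids 0-999; the two tail clusters cover 1000-4999 and 5000-9999.
cutoffs = [1_000, 5_000]

adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=cutoffs,
    div_value=4.0,  # each successive tail cluster uses a smaller projection
)

# Stand-in for the model's final hidden states: (batch, hidden_size).
hidden = torch.randn(32, hidden_size)
targets = torch.randint(0, vocab_size, (32,))

# Training-time call: returns a named tuple with per-target log-probabilities
# (`output`) and the mean negative log-likelihood (`loss`).
result = adaptive(hidden, targets)

# Full log-distribution over the vocabulary, e.g. for evaluation or sampling.
log_probs = adaptive.log_prob(hidden)  # shape: (32, vocab_size)
```

In a training loop you would call `result.loss.backward()` in place of a separate cross-entropy loss; the module replaces both the final linear layer and the softmax.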

For preprocessing you will need fastai (see https://docs.fast.ai/), a deep learning library that runs on top of PyTorch and simplifies training neural networks. [For those who want to learn state-of-the-art deep learning techniques, I highly recommend Jeremy Howard’s fast.ai course, which is available online for free: https://course.fast.ai/.] I…


How would you like to speed up your language modeling (LM) tasks by 1000%, with nearly no drop in accuracy? A recent paper from Facebook’s AI Research group (FAIR), Grave et al. (2017), “Efficient softmax approximation for GPUs”, shows how to gain a massive speedup in one of the most time-consuming parts of language modeling, the computation-heavy softmax step, through their “adaptive softmax”. This speedup comes at only a minimal cost in accuracy, so anyone doing language modeling should consider using it. …
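To give a feel for where the savings come from, here is a toy, pure-Python sketch of the two-level idea behind the adaptive softmax: frequent words are scored directly in a small "head", and rare words share a single head entry that routes to a separate "tail" softmax. The function names and scores are hypothetical, and the sketch assumes a single tail cluster with hand-picked scores; the method in the paper uses learned projections and can use several tail clusters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_softmax_prob(word, head_scores, tail_scores, n_head_words):
    """Probability of `word` under a head + single-tail-cluster softmax.

    `head_scores` has n_head_words + 1 entries: one per frequent word,
    plus a final entry standing for "the word is in the tail cluster".
    """
    head_probs = softmax(head_scores)
    if word < n_head_words:
        # Frequent word: only the small head softmax is needed.
        return head_probs[word]
    # Rare word: P(tail cluster) * P(word | tail cluster).
    tail_probs = softmax(tail_scores)
    return head_probs[n_head_words] * tail_probs[word - n_head_words]


# Toy vocabulary: ids 0-2 are frequent, ids 3-6 are rare (made-up scores).
head_scores = [2.0, 1.0, 0.5, 0.0]   # 3 frequent words + 1 tail-cluster entry
tail_scores = [0.3, -0.2, 0.1, 0.0]  # 4 rare words
probs = [adaptive_softmax_prob(w, head_scores, tail_scores, 3) for w in range(7)]
total = sum(probs)  # the two-level probabilities still form a distribution
```

Because most tokens in real text are frequent words, most predictions only touch the small head, which is the intuition behind the speedup.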

David Bressler

PhD in Visual Neuroscience from UC Berkeley, now working on applying deep learning to natural language processing (NLP). Twitter: @davidwbressler
