New Nvidia Paper Accelerates Large-Scale Language Modelling

Synced · Published in SyncedReview · Aug 9, 2018 · 2 min read

Nvidia’s paper Large Scale Language Modeling: Converging on 40GB of Text in Four Hours introduces a model that uses mixed-precision arithmetic and a 32k batch size distributed across 128 Nvidia Tesla V100 GPUs to improve scalability and transfer in Recurrent Neural Networks (RNNs) for natural language tasks.
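The paper’s own training code is not reproduced here, but the general recipe of mixed-precision, data-parallel training it describes can be sketched with modern PyTorch APIs (DistributedDataParallel, autocast, GradScaler). The model, data loader, and hyperparameters below are placeholders, not the authors’ implementation:

```python
# Minimal sketch of mixed-precision, data-parallel training in PyTorch.
# Illustrative only -- not the paper's code; model/loader/hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size, model, loader, epochs=3):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = DDP(model.cuda(rank), device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()   # keeps FP16 gradients from underflowing
    criterion = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        for inputs, targets in loader:      # each GPU sees its slice of the global (e.g. 32k) batch
            inputs, targets = inputs.cuda(rank), targets.cuda(rank)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast(): # forward pass runs largely in FP16
                logits = model(inputs)
                loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
            scaler.scale(loss).backward()   # DDP all-reduces gradients across GPUs
            scaler.step(optimizer)
            scaler.update()
```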

The team trained a multiplicative Long Short-Term Memory (mLSTM) network for unsupervised reconstruction over three epochs of the 40GB Amazon review dataset in just four hours. Previously, training a single epoch on the dataset took about a month. The approach cut training time by enabling each GPU to process significantly more training data.

The team also trained an 8,192-neuron mLSTM that beats state-of-the-art performance on Amazon review language modeling, with a bits-per-character (BPC) rate of 1.038 and an SST classification accuracy of 93.8 percent.

The paper analyzes how distributed data parallelism scales with the larger model, common problems in training RNNs, and the relationship between dataset size, batch size, and learning rate.
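One commonly studied relationship of this kind is the heuristic of scaling the learning rate linearly with the global batch size. The sketch below illustrates that heuristic only; the base values are placeholders, not numbers taken from the paper:

```python
# Illustrative only: a common heuristic relating learning rate to global batch size
# when scaling data-parallel training. Base values are placeholders.
def scaled_learning_rate(base_lr, base_batch, global_batch):
    """Linearly scale the learning rate with the global batch size."""
    return base_lr * (global_batch / base_batch)

# e.g. a base LR tuned for a 128-sample batch, scaled to a 32k global batch
print(scaled_learning_rate(base_lr=1e-3, base_batch=128, global_batch=32768))  # -> 0.256
```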

The work can serve as a large-scale unsupervised pretraining model for NLP, useful to both deep learning researchers and commercial applications.

The paper was published Aug 3 and is available on arXiv: https://arxiv.org/pdf/1808.01371v1.pdf

Author: Robert Tian | Editor: Michael Sarazen

Follow us on Twitter @Synced_Global for more AI updates!

Subscribe to Synced Global AI Weekly to get insightful tech news, reviews and analysis! Click here!
