The Latest in AI Research — Issue #5
The AI research newsletter by Daniël Heres
Papers of Interest
The original Transformer model, presented in the paper “Attention Is All You Need”, showed that language tasks like translation don’t need convolutions or recurrent connections for good performance, but mainly something called “attention”. Good attention-based models can even perform better than RNNs or CNNs while also being faster to train and to sample from.
In this paper the authors change the model to recur over its intermediate representations instead of over the symbols in a sequence, i.e. depth-wise. The same transformation function can be applied any number of times, whereas in most models the number of steps is fixed by the number of layers.
The model achieves state-of-the-art results on a number of challenging tasks: Question Answering, Language Modeling, (most) Algorithmic Tasks and Machine Translation.
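The depth-wise recurrence described above can be illustrated with a toy sketch. It is not the paper's model: `toy_step` is a hypothetical stand-in for the shared attention-plus-transition block, and all names here are assumptions made for illustration. The point is only that one function is reused at every step, so the number of steps is a free choice rather than a property of the architecture.

```python
def toy_step(states):
    """Hypothetical stand-in for the shared block.

    Every position looks at the whole sequence (here: just its mean)
    and moves halfway towards it. A real model would use learned
    self-attention followed by a transition function instead.
    """
    mean = sum(states) / len(states)
    return [0.5 * s + 0.5 * mean for s in states]


def recur_in_depth(states, step_fn, num_steps):
    """Apply the same step_fn to all positions, num_steps times.

    The recurrence runs over refinement steps (depth), not over the
    positions of the sequence, and step_fn is shared across steps.
    """
    for _ in range(num_steps):
        states = step_fn(states)
    return states


if __name__ == "__main__":
    sequence = [0.0, 4.0]
    # The same function can be unrolled for any number of steps.
    for steps in (1, 2, 5):
        print(steps, recur_in_depth(sequence, toy_step, steps))
```

Because `num_steps` is an argument rather than a stack of distinct layers, the amount of computation can be changed without adding parameters.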
Learning multiple tasks with a single model is very challenging because different tasks can have outcomes on very different scales. In games, for example, the points you score in one game can be much higher than in others.
This is why the authors propose PopArt: an adaptive normalization technique meant for Reinforcement Learning problems.
The authors show that this technique leads to considerable improvements across games: the same model solves much harder games, and a wider set of them, than before. They also show that, in contrast to previous methods, it no longer needs the “clipping” hack to function.
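A minimal scalar sketch of the idea behind PopArt, assuming the update rule as it is commonly described: track a running mean and standard deviation of the targets, and whenever they change, rescale the last layer's weight and bias so that the unnormalized predictions stay the same. The class name, the step size `beta` and the single-weight "layer" are hypothetical simplifications, not the authors' implementation.

```python
import math


class PopArtScalar:
    """Toy one-weight output layer with adaptive target normalization."""

    def __init__(self, beta=0.01):
        self.beta = beta  # step size for the running statistics (assumed)
        self.mu = 0.0     # running mean of the targets
        self.nu = 1.0     # running second moment of the targets
        self.w = 1.0      # weight of the normalized output
        self.b = 0.0      # bias of the normalized output

    @property
    def sigma(self):
        # Standard deviation from the two running moments; the floor
        # only guards against a zero or negative variance estimate.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def predict(self, h):
        """Unnormalized prediction for a scalar feature h."""
        normalized = self.w * h + self.b
        return self.sigma * normalized + self.mu

    def update_stats(self, target):
        """Update the statistics, then rescale w and b so predict() is unchanged."""
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # Preserve outputs: sigma' * (w' * h + b') + mu' == sigma * (w * h + b) + mu
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma


if __name__ == "__main__":
    layer = PopArtScalar()
    before = layer.predict(0.7)
    layer.update_stats(1000.0)  # a target on a much larger scale
    after = layer.predict(0.7)
    print(before, after, layer.mu, layer.sigma)
```

The learner can then be trained on normalized targets of roughly unit scale in every game, while the predictions it has already learned are not disturbed when the statistics shift.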
This work shows that Generative Adversarial Networks benefit from scaling up: more parameters, larger batch sizes and some architectural changes.
Increasing the batch size alone already improves the Inception Score by 46%, and doubling the channels in the convolutional layers increases it by a further 21%.
By showing the nearest neighbors of generated samples in the training set, they demonstrate that the model does not memorize training examples but generates novel ones instead.
The examples and results of this paper are amazing and show that models can benefit from scaling up, combining techniques from the literature and carefully running experiments.
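The nearest-neighbor check mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, samples are plain lists of numbers, and distance is Euclidean on the raw vectors, whereas a real check on images would more likely compare in some learned feature space.

```python
import math


def nearest_neighbor(sample, training_set):
    """Return (index, distance) of the training example closest to sample."""
    best_index, best_distance = -1, math.inf
    for index, example in enumerate(training_set):
        # Euclidean distance between the two vectors.
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(sample, example)))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


if __name__ == "__main__":
    training_set = [[0.0, 0.0], [1.0, 1.0], [4.0, 5.0]]
    generated = [1.0, 2.0]
    print(nearest_neighbor(generated, training_set))
```

A generated sample whose nearest training example sits at a distance of (almost) zero would hint at memorization; a clearly different nearest neighbor suggests the sample is novel.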
Colorful Code and Delicate Datasets
TensorFlow 2.0 is planned for later this year. Development will focus on ease of use, mainly on the following points: eager execution, platform and language support, and removing deprecated APIs. tf.contrib will no longer be distributed as part of TensorFlow.
This release brings a lot of new features and improvements. Support for missing features in preprocessing and model fitting is implemented. OPTICS, a more scalable version of the clustering algorithm DBSCAN, is added.
PyTorch, a modern deep learning library and ecosystem, is nearing its 1.0 release.
It brings a number of interesting features:
- JIT, a way of writing code in a subset of Python that can still benefit from just-in-time (JIT) optimizations.
- Support for high performance data parallelism and efficient distributed training.
- A C++ frontend to PyTorch with an interface similar to the Python one. This could be used for certain production use cases.
And much more. It seems that once 1.0 is released, PyTorch will see even more adoption, as the focus shifts towards a stable interface and models for production.