Fluence: A PyTorch Library for Responsible Language Understanding
Fluence is a PyTorch library focused on addressing problems in language understanding with computationally efficient and robust algorithms. It secured the second spot at the PyTorch Global Summer Hackathon 2020 in the Responsible AI category. The library is open source, and you can try it with `pip install fluence`.
Overview
Transformers are becoming bigger and better. From pushing the state of the art in language understanding to visual recognition, they are becoming universal go-to models for natural language processing and computer vision research. Such remarkable results are often the result of extensive pre-training.
However, pre-training has its downsides, some of which are mentioned below. Training these language models is very expensive and requires huge amounts of data, which puts such models out of reach for individual practitioners and increases carbon footprints.
Many alternative methods have been proposed to deal with these problems. Some of them include training these larger models followed by pruning, knowledge distillation, adaptive methods, and inducing sparsity. This has become a broad line of research, with other methods focused on reducing runtime complexity, effective weight sharing, and so on.
On the other hand, language understanding is far from solved. Although we have been able to achieve remarkable performance on some downstream tasks, it is still not clear what we are actually learning. For example, in NLI it has been shown that these models can learn inherent biases and annotation artifacts very well, which limits their generalization capabilities. Interpreting these models is hard. Moreover, most of our models are biased in one way or another. This raises issues of robustness and fairness when working with these models.
Keeping these concerns in mind, I developed Fluence, a PyTorch library that tries to address some of these issues. The library is fully compatible with Hugging Face Transformers. All the current methods are designed to work seamlessly with transformer models, but they can also be extended to generic `nn.Module` models. Here, we go through some of the functionality currently available. Note that the library is in an alpha stage of development and the API may change as more features are incorporated.
Addressing the Increasing Demand for Computation
Adaptive Methods
With numerous components of our workflow becoming differentiable (and hence learnable), why should the algorithm itself stay fixed? Adaptive methods are flexible classes of methods that change with the complexity of the data. For example, the attention mechanism can benefit from a reduced attention span when the input sequence does not require a longer one, or from a sparser alternative to softmax. This can result in fewer FLOPs, better interpretability, and lower computational cost.
Here is an example of how you can use adaptive methods out of the box. To use them, you add one-liners to the `__init__` method (to register them as model parameters) and the `forward` method (to modify the forward pass) of your model class. The parameters residing within these methods, such as the tunable `alpha` parameter of entmax, are learned automatically during optimization (when `optim.step` is called).
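A minimal sketch of what this might look like; the import path, class names, and constructor arguments (`AdaptiveSpan`, `EntmaxAlpha`, `num_heads`, `max_span`) are illustrative assumptions and may not match the exact fluence API:

```python
import torch.nn as nn

# Hypothetical import path and class names — check the fluence docs for the real API.
from fluence.adaptive import AdaptiveSpan, EntmaxAlpha

class MyAttention(nn.Module):
    def __init__(self, num_heads=12, max_span=512):
        super().__init__()
        # One-liners in __init__: register the learnable span / alpha parameters
        # so they appear in model.parameters() and get updated by the optimizer.
        self.adaptive_span = AdaptiveSpan(num_heads=num_heads, max_span=max_span)
        self.entmax = EntmaxAlpha(num_heads=num_heads)

    def forward(self, attention_scores, attention_mask=None):
        # One-liners in forward: replace the usual softmax with alpha-entmax
        # and mask out scores beyond the learned attention span.
        attention_probs = self.entmax(attention_scores)
        attention_probs = self.adaptive_span(attention_probs)
        return attention_probs
```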
Importance Sampling
Data order can play a critical role in how the network learns, and training on sufficiently diverse ("informative") samples can reduce the need for more data. Diverse data can enable a network to learn a richer set of features.
Here is an example of how you can use standard `KMeans` on embeddings to extract diverse samples.
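A rough sketch of this idea, using scikit-learn's `KMeans` directly rather than any fluence-specific helper: cluster the sentence embeddings and keep the sample closest to each centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_diverse_samples(embeddings: np.ndarray, num_clusters: int = 100):
    """Cluster embeddings with KMeans and return the index of the sample
    closest to each centroid, giving a diverse subset of the data."""
    kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(embeddings)
    selected = []
    for centroid in kmeans.cluster_centers_:
        distances = np.linalg.norm(embeddings - centroid, axis=1)
        selected.append(int(distances.argmin()))
    return sorted(set(selected))

# embeddings could be, e.g., the [CLS] vectors of your training set produced
# by a transformer encoder; the returned indices subsample the dataset.
```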
Pruning
Many papers have shown that these over-parametrized models can often be pruned (or distilled). We do not need all the weights or attention heads to reach the same level of performance. `fluence` provides pruning utilities that allow users to prune any `nn.Module` model easily.
Users can specify which layers to prune, the percentage to prune, and the method ('l1' or 'random'). This makes use of PyTorch's pruning utilities under the hood.
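Since the library delegates to PyTorch's pruning utilities, here is a minimal sketch of the underlying mechanism, using `torch.nn.utils.prune` directly (the fluence wrapper's exact signature may differ):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# Prune 30% of the weights in every linear layer, either by L1 magnitude
# ('l1') or at random ('random').
method = "l1"
amount = 0.3
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        if method == "l1":
            prune.l1_unstructured(module, name="weight", amount=amount)
        else:
            prune.random_unstructured(module, name="weight", amount=amount)
        # Make the pruning permanent (removes the re-parametrization hooks).
        prune.remove(module, "weight")
```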
After pruning, it has been shown that we can retain much of the performance with far fewer parameters (compared to the original count), making these models easier to use for inference and deployment.
Robustifying Models
With the advent of large pre-trained models, this area of research has become prominent, focusing on how our models perform on challenge/adversarial sets. Research has branched into many directions, such as explicit debiasing and demonstrating the need to make models robust.
Here is an example of how orthogonal projection can help keep superficial features in the data out of the main model's representation. You can plug in any naive (bias-only) model alongside a Hugging Face transformer model.
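As an illustration of the projection itself (not the exact fluence API), one common formulation removes from the main model's representation the component that lies along the naive model's representation, so the classifier cannot rely on that superficial signal:

```python
import torch

def orthogonal_projection(main_repr: torch.Tensor, bias_repr: torch.Tensor) -> torch.Tensor:
    """Project main_repr onto the orthogonal complement of bias_repr.

    main_repr: [batch, hidden] representation from the transformer model.
    bias_repr: [batch, hidden] representation from a naive (bias-only) model,
               e.g. a hypothesis-only or bag-of-words classifier in NLI.
    """
    bias_unit = bias_repr / (bias_repr.norm(dim=-1, keepdim=True) + 1e-8)
    # Component of the main representation lying along the bias direction.
    parallel = (main_repr * bias_unit).sum(dim=-1, keepdim=True) * bias_unit
    # Keep only the part orthogonal to the bias direction.
    return main_repr - parallel
```

The debiased representation can then be fed to the classification head in place of the original pooled output.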
Future Directions
Here are some features that, if incorporated, would greatly benefit the research community:
- NLP is growing rapidly. It is hard to keep up with all the methods proposed with each wave of conferences. There is a need for a repository that contains well-tested implementations of existing approaches in a standardized way and stays up to date.
- Research code is often task-specific and unmaintained, which makes it hard for researchers to reuse the provided codebase. There is a clear need for standardization. While changes have been proposed to improve reproducibility, they have not yet become widespread. A standardized codebase can have a huge impact in promoting reproducibility.
Keeping these points in mind, this library aims to tackle these challenges in whatever way possible. Check the wiki section to see how you can integrate fluence into your workflow. Fluence is meant to be extended and used alongside other libraries.
Finally, I would like to thank the PyTorch team for organizing the hackathon. It was a great experience overall. I’m really excited about the future of PyTorch.