Tassilo Klein, Moin Nabi (ML Research)
Deep learning has heralded a new era in artificial intelligence, establishing itself in integral parts of today’s world within a short time. Despite its immense power, often achieving super-human performance on specific tasks, modern AI suffers from numerous shortcomings and is still far away from what is known as artificial general intelligence. These shortcomings become particularly prominent in AI’s limited capability to understand human language. Anyone who has interacted with a chatbot or a text generation engine may have noticed that the longer the interaction with the machine goes on, the staler it becomes. When generating long passages of text, for instance, a lack of consistency and of a human feel can be observed. Essentially, this highlights that the underlying model does not really understand what it says and does. Rather, it more or less walks along paths of statistical patterns of word usage and argument structure that it acquired during training by perusing huge text corpora. …
Due to the inherent model uncertainty, learning to infer the Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning. In this paper, we propose a novel Bayesian model-agnostic meta-learning method. The proposed method combines efficient gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. Unlike previous methods, during fast adaptation the method is capable of learning complex uncertainty structure beyond a simple Gaussian approximation, and during meta-update a novel Bayesian mechanism prevents meta-level overfitting. Remaining a gradient-based method, it is also the first Bayesian model-agnostic meta-learning method applicable to various tasks, including reinforcement learning. Experimental results show the accuracy and robustness of the proposed method on sinusoidal regression, image classification, active learning, and reinforcement learning.
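As a rough illustration of the core idea (gradient-based fast adaptation combined with a particle-based, non-Gaussian approximation of the task posterior), the sketch below adapts an ensemble of "particles" on a few-shot support set and reads predictive uncertainty from their disagreement. It is a heavily simplified stand-in, not the paper's algorithm: the meta-update is omitted and all model sizes and hyperparameters are assumptions.

```python
import copy
import torch

def make_particle():
    # Tiny regression model used purely for illustration.
    return torch.nn.Sequential(torch.nn.Linear(1, 40), torch.nn.ReLU(), torch.nn.Linear(40, 1))

def adapt(particles, x_support, y_support, inner_lr=0.01, steps=5):
    """Fast adaptation: each particle is refined on the few-shot support set."""
    adapted = []
    for net in particles:
        net = copy.deepcopy(net)
        opt = torch.optim.SGD(net.parameters(), lr=inner_lr)
        for _ in range(steps):
            loss = torch.nn.functional.mse_loss(net(x_support), y_support)
            opt.zero_grad()
            loss.backward()
            opt.step()
        adapted.append(net)
    return adapted

def posterior_predict(particles, x_query):
    """Predictive mean and spread across particles approximate posterior uncertainty."""
    preds = torch.stack([net(x_query) for net in particles])
    return preds.mean(dim=0), preds.std(dim=0)
```

Because the ensemble is free-form, the spread of the particle predictions can capture multi-modal or skewed uncertainty that a single Gaussian approximation would miss.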
Annual Conference of the Association for Computational Linguistics (ACL 2020), Seattle, United States
We propose a self-supervised method to solve Pronoun Disambiguation and Winograd Schema Challenge problems. Our approach exploits the characteristic structure of training corpora related to so-called “trigger” words, which are responsible for flipping the answer in pronoun disambiguation. We achieve such commonsense reasoning by constructing pairwise contrastive auxiliary predictions. To this end, we leverage a mutual-exclusive loss regularized by a contrastive margin. Our architecture is based on the recently introduced transformer network BERT, which exhibits strong performance on many NLP benchmarks. Empirical results show that our method alleviates the limitations of current supervised approaches to commonsense reasoning. This study opens up avenues for exploiting inexpensive self-supervision to achieve performance gains in commonsense reasoning tasks.
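To make the pairwise objective concrete, here is a hedged sketch of a mutual-exclusive loss with a contrastive margin over candidate probabilities. The exact formulation in the paper may differ; the tensors p_a and p_b are assumed to hold the model's probabilities for an answer candidate in the two twin sentences of a trigger-word pair.

```python
import torch

def mutual_exclusive_loss(p_a, p_b, eps=1e-8):
    # Encourage the twin sentences to pick opposite candidates: the event
    # "exactly one of the two picks this candidate" should be likely.
    exclusive = p_a * (1.0 - p_b) + (1.0 - p_a) * p_b
    return -torch.log(exclusive + eps).mean()

def contrastive_margin_loss(p_a, p_b, margin=0.2):
    # Additionally push the two probabilities at least `margin` apart.
    return torch.clamp(margin - (p_a - p_b).abs(), min=0.0).mean()

def total_loss(p_a, p_b, lam=0.5):
    # lam is an illustrative weighting between the two terms.
    return mutual_exclusive_loss(p_a, p_b) + lam * contrastive_margin_loss(p_a, p_b)
```

The key property is that no gold label is needed: the supervision signal comes entirely from the structure of the sentence pair.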
Deep learning-based approaches achieve state-of-the-art performance in the majority of image segmentation benchmarks. However, training such models requires a sizable amount of manual annotation. In order to reduce this effort, we propose a method based on a conditional Generative Adversarial Network (cGAN), which addresses segmentation in a semi-supervised setup and in a human-in-the-loop fashion. More specifically, we use the discriminator to identify unreliable slices for which expert annotation is required, and use the generator of the GAN to synthesize segmentations for unlabeled data on which the model is confident. …
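A minimal sketch of the human-in-the-loop routing step described above, under assumed interfaces: a discriminator score per unlabeled slice decides whether the generator's segmentation is kept as a pseudo-label or the slice is sent to an expert. The names `generator`, `discriminator`, and the `threshold` value are illustrative placeholders, not the paper's exact pipeline.

```python
import torch

def route_unlabeled(generator, discriminator, slices, threshold=0.9):
    """Split unlabeled slices into (pseudo-labeled, needs expert annotation)."""
    pseudo_labeled, needs_expert = [], []
    with torch.no_grad():
        for img in slices:
            seg = generator(img.unsqueeze(0))             # predicted segmentation
            score = discriminator(img.unsqueeze(0), seg)  # scalar confidence that seg is reliable
            if score.item() >= threshold:
                pseudo_labeled.append((img, seg.squeeze(0)))  # trust the model's output
            else:
                needs_expert.append(img)                      # request manual annotation
    return pseudo_labeled, needs_expert
```

Expert effort is thus spent only on the slices the model itself flags as unreliable, while confident predictions feed back into training as pseudo-labels.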
The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and the Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well in all cases, while outperforming the previously reported state of the art by a margin. While the results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.
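The following sketch shows the general flavour of attention-guided scoring with an off-the-shelf BERT: attention mass flowing from the ambiguous pronoun to each candidate's tokens is accumulated and compared. Summing over all layers and heads is a much simplified stand-in for the paper's scoring scheme, and the example sentence and candidates are assumptions for illustration.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def candidate_attention(sentence, pronoun, candidate):
    """Total attention mass from the pronoun token to the candidate's tokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    att = torch.stack(out.attentions).squeeze(1)           # (layers, heads, seq, seq)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    p_idx = tokens.index(pronoun.lower())                   # position of the pronoun
    c_idx = [i for i, t in enumerate(tokens) if t in tokenizer.tokenize(candidate)]
    return att[:, :, p_idx, c_idx].sum().item()

sent = "The trophy doesn't fit into the suitcase because it is too big."
print(candidate_attention(sent, "it", "trophy"), candidate_attention(sent, "it", "suitcase"))
```

The candidate receiving the larger attention score is taken as the pronoun's referent; no task-specific fine-tuning is involved.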
Humans have an extraordinary ability to learn continuously throughout their lifetime. The ability to apply previously learned knowledge to new situations, environments and tasks forms a key feature of human intelligence. On the biological level, this is commonly attributed to the ability to selectively store and govern memories over a sufficiently long period of time in neural connections called synapses. Unlike biological brains, conventional Artificial Neural Networks (ANNs) do not possess the ability to control the strength of the synaptic connections between their neurons. This leads to extremely short memory lifetimes in ANNs, an effect known as catastrophic forgetting.
In the past decade, most of the research in the field of Artificial Intelligence (AI) was directed towards exceeding human-level performance on isolated, clearly defined tasks such as playing computer games, sorting out spam emails, telling cats from dogs and recognising speech, just to name a few. As a result, most of the AI surrounding us in our day-to-day life can be referred to as Artificial Narrow Intelligence, or weak AI. Strong AI, in contrast, refers to human-like AI that can perform any intelligent task while learning continuously, forgetting selectively, quickly adapting to new tasks and making use of previous experiences. These properties have only recently started to receive attention from AI researchers. …
Models trained in the context of continual learning (CL) should be able to learn from a stream of data over an undefined period of time. The main challenges herein are: 1) maintaining old knowledge while benefiting from it when learning new tasks, and 2) guaranteeing model scalability with a growing amount of data to learn from. In order to tackle these challenges, we introduce Dynamic Generative Memory (DGM), a synaptic-plasticity-driven framework for continual learning. DGM relies on conditional generative adversarial networks with learnable connection plasticity realized through neural masking. Specifically, we evaluate two variants of neural masking: applied (i) to layer activations and (ii) to connection weights directly. Furthermore, we propose a dynamic network expansion mechanism that ensures sufficient model capacity to accommodate continually incoming tasks. The amount of added capacity is determined dynamically from the learned binary mask. …
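To illustrate the activation-masking variant, here is a simplified sketch of a task-conditioned masked layer: each task owns a learnable mask over a layer's outputs, and a sharpened sigmoid approximates the (near-)binary gate. This is not DGM itself; the straight-through/thresholding details, sizes and the temperature value are assumptions.

```python
import torch

class MaskedLinear(torch.nn.Module):
    def __init__(self, in_features, out_features, num_tasks, temperature=10.0):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)
        # One real-valued mask per task, sharpened into an almost-binary gate.
        self.mask_logits = torch.nn.Parameter(torch.zeros(num_tasks, out_features))
        self.temperature = temperature

    def forward(self, x, task_id):
        gate = torch.sigmoid(self.temperature * self.mask_logits[task_id])
        return self.linear(x) * gate   # gate selects which units this task may use
```

Units gated off for earlier tasks are protected from interference, while units left unused by the current masks indicate how much capacity remains before the network needs to be expanded.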
High performance of deep learning models typically comes at the cost of considerable model size and computation time. These factors limit applicability for deployment on memory- and battery-constrained devices such as mobile phones or embedded systems. In this work, we propose a novel pruning technique that eliminates entire filters and neurons according to their L1-norm relative to the rest of the network, yielding more compression and reduced parameter redundancy. The resulting network is non-sparse, yet much more compact, and requires no special infrastructure for deployment. We demonstrate the viability of our method by achieving 97.4%, 86.1%, 47.8% and 53% compression of LeNet-5, VGG-16, ResNet-56 and ResNet-110 respectively, exceeding state-of-the-art compression results reported on VGG-16 and ResNet without losing any performance compared to the baseline. …
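A hedged sketch of the selection criterion, simplified from the description above: each convolutional filter is ranked by its L1 norm relative to the total L1 mass of its layer, and the weakest fraction is marked for removal. The `prune_fraction` value and the per-layer treatment are illustrative assumptions; actually removing the filters (and the corresponding input channels of the next layer) then yields the smaller, still dense network.

```python
import torch

def filters_to_prune(model, prune_fraction=0.3):
    """Return, per conv layer, the indices of the filters with the lowest relative L1 norm."""
    to_prune = {}
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Conv2d):
            # L1 norm per output filter: sum of |w| over in_channels, kH, kW.
            norms = module.weight.detach().abs().sum(dim=(1, 2, 3))
            rel = norms / norms.sum()                # importance relative to the layer
            k = int(prune_fraction * norms.numel())
            if k > 0:
                to_prune[name] = torch.argsort(rel)[:k].tolist()  # weakest filters first
    return to_prune
```

Because whole filters are dropped rather than individual weights being zeroed, the pruned model runs on standard dense hardware and libraries without sparse-matrix support.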
We humans experience the world in a multimodal manner. We learn every day not just by seeing objects, but also by hearing, tasting, touching and smelling them. Our brain can easily connect these different sources of information and teach us entirely new concepts based on only a few stimuli. If your child has already learned what a cat, a dog and a horse look and sound like, they will be able to relate that information and thereby also understand what a unicorn is after seeing it only a few times. But how can conventional machine learning algorithms learn to understand the world around us in the same manner as we humans do? …
Abnormal crowd behaviour detection attracts considerable interest due to its importance in video surveillance scenarios. However, the ambiguity and the lack of sufficient abnormal ground-truth data make end-to-end training of large deep networks hard in this domain. In this paper we propose to use Generative Adversarial Nets (GANs), which are trained to generate only the normal distribution of the data. During the adversarial GAN training, a discriminator (D) is used as a supervisor for the generator network (G) and vice versa. At testing time we use D to solve our discriminative task (abnormality detection), where D has been trained without the need for manually annotated abnormal data. Moreover, in order to prevent G from learning a trivial identity function, we use a cross-channel approach, forcing G to transform raw-pixel data into motion information and vice versa. …
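The test-time use of D can be pictured with the following sketch: a discriminator trained only on normal (frame, motion) pairs assigns low scores to patterns it has never seen, which is read as abnormality. This is an illustrative simplification, with the single combined score, the callable interfaces and the threshold all being assumptions rather than the paper's exact scoring.

```python
import torch

def abnormality_score(discriminator, frame, flow):
    """Low discriminator confidence on a (frame, optical-flow) pair suggests abnormality."""
    with torch.no_grad():
        d = discriminator(frame.unsqueeze(0), flow.unsqueeze(0))  # "looks normal" score in [0, 1]
    return 1.0 - d.item()

def detect(discriminator, frames, flows, threshold=0.5):
    """Flag each frame/flow pair whose abnormality score exceeds the threshold."""
    return [abnormality_score(discriminator, f, o) > threshold
            for f, o in zip(frames, flows)]
```

Since only normal footage is needed for training, the approach sidesteps the scarcity of annotated abnormal events entirely.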