Human-Machine Collaborative Learning
We have entered a new era of steadily growing computational power, storage capacity, and availability of big data. Deep learning has proven to be the ideal technique for capitalizing on these trends, establishing itself as the de facto standard in supervised learning. However, this type of learning requires labeled data, which generally entails considerable data-curation costs. Therefore, alternative paradigms are emerging that allow us to maximize the potential of vast amounts of information in new and distinct ways. In this blog post, we introduce a concept that combines the strengths of humans and machines in a collaborative way. But before going into more detail, let's start with a story demonstrating the benefits of this approach.
In 1997, Garry Kasparov was defeated by a supercomputer (Deep Blue) in a chess match under tournament regulations. It was a landmark event: a reigning world chess champion had been beaten by a machine. While Kasparov was still recovering from this experience, he also drew inspiration from Deep Blue. He asked himself: "What if I could play against a computer — with another computer at my side — combining our strengths, human intuition plus machine's calculation, human strategy, machine tactics, human experience, machine's memory?"
Kasparov's idea of human-machine collaboration in chess was realized successfully in 2005 in a computer-assisted online chess tournament where grandmasters teamed up with supercomputers. The result was quite unexpected: the winners were a pair of amateur American chess players operating three ordinary PCs simultaneously. In this case, the players' skill in guiding the computers apparently played a decisive role. This raises the question of whether this kind of collaboration can also be applied to other tasks. The following paragraphs shed light on emerging approaches in this context.
Approaches and Trends
Machines as Co-Workers, Not Just Tools
When comparing humans and machines, it is evident that each side has distinct characteristics and strengths. Humans are great at making intuitive and creative decisions based on their knowledge. Computers are good at processing vast amounts of data to produce condensed, meaningful information for deriving new knowledge and making better decisions. Capitalizing on the synergy of these distinctive strengths seems to be a natural next step.
In research, such combinations have been explored in greater depth over the past years and are gradually gaining momentum. One approach, proposed by Mintz et al., utilizes unlabeled data to enhance relation extraction models through distant supervision. In particular, they use a human-curated database to design a heuristic labeling function and incorporate it into the training procedure of a classifier. The classifier is then able to extract high-precision patterns for a reasonably large number of relations. The supervision is called "distant" because the labeling function only approximates the labeling behavior of a human annotator.
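The core idea of distant supervision can be illustrated with a minimal sketch. The knowledge base, sentences, and relation names below are invented toy data, not the actual pipeline or corpus used by Mintz et al.; the point is only the heuristic: any sentence mentioning both entities of a known pair is labeled with the relation stored for that pair.

```python
# A tiny human-curated knowledge base of (entity1, entity2) -> relation.
KNOWLEDGE_BASE = {
    ("Paris", "France"): "capital_of",
    ("Berlin", "Germany"): "capital_of",
    ("Amazon", "Seattle"): "headquartered_in",
}

def distant_label(sentence, entity_pair):
    """Heuristic labeling function: assume a sentence that mentions both
    entities of a known pair expresses the relation stored in the KB."""
    e1, e2 = entity_pair
    if e1 in sentence and e2 in sentence:
        return KNOWLEDGE_BASE.get(entity_pair)
    return None

unlabeled = [
    ("Paris is the capital and largest city of France.", ("Paris", "France")),
    ("Amazon was founded in Seattle in 1994.", ("Amazon", "Seattle")),
    ("Paris Hilton visited Germany last week.", ("Paris", "Germany")),
]

# Sentences whose entity pair is unknown to the KB are simply skipped.
training_data = [
    (sent, label)
    for sent, pair in unlabeled
    if (label := distant_label(sent, pair)) is not None
]
print(training_data)
```

Note that the heuristic is noisy by construction: a sentence can mention both entities without expressing the stored relation, which is why the downstream classifier must be robust to label noise.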
Another technique, developed by Wang et al. in the field of computer vision, improves object detection from unlabeled images through Self-supervised Sample Mining. At the core of this method, reliable region proposals are automatically discovered and pseudo-labeled to enhance the object detector. This is achieved by pasting the proposals into different labeled images and assessing the consistency of the detector's predictions across these image contexts. Although the resulting labels are only pseudo ground truth, they effectively improve detection accuracy and robustness against noisy samples. Ultimately, both approaches annotate unlabeled data automatically and thus reduce the amount of human supervision needed during training.
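The consistency idea can be sketched as follows. The scores, threshold, and the use of the worst-case score as the consistency value are simplifying assumptions for illustration, not the exact formulation of Wang et al.: a region proposal is trusted as a pseudo-label only if the detector scores it consistently after pasting it into several different image contexts.

```python
def consistency(scores):
    """Conservative consistency value: the worst-case detector score
    across the image contexts the proposal was pasted into."""
    return min(scores)

def mine_pseudo_labels(proposals, threshold=0.8):
    """Keep only proposals whose cross-context consistency clears the bar."""
    return [
        (region, label)
        for region, label, scores in proposals
        if consistency(scores) >= threshold
    ]

# Each entry: (region id, predicted class, detector scores after pasting the
# region into three different labeled images). Toy values for illustration.
proposals = [
    ("region_1", "dog", [0.95, 0.91, 0.88]),  # stable across contexts
    ("region_2", "cat", [0.97, 0.40, 0.85]),  # context-dependent, rejected
]
print(mine_pseudo_labels(proposals))
```

The design choice here is to be conservative: a single low score in any context disqualifies the proposal, which trades recall of pseudo-labels for robustness against noisy ones.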
Incorporating Human Guidance into Active Learning
In contrast to the previous concept, where the dataset is extended with machine-annotated data, we could also let the learner select difficult samples and request a human trainer to annotate them; hence the name active learning. This method is highly efficient, especially when the labeling budget is limited: experts can focus on challenging cases while the machine handles the majority of samples, which are usually easy to resolve.
To explain the intuition behind active learning, consider the simple task of labeling images of dogs with respect to breeds. We start with a base dataset that contains labeled images of dogs. This dataset might pose a challenge for training in several ways: It might mostly contain images of dogs facing the camera, causing a trained model to perform poorly on dogs shown from the side. It might contain an imbalanced number of samples per breed. Or it might contain look-alike breeds such as the Belgian Malinois and the German Shepherd. In such cases, both humans and machines would need more examples of each breed to learn to classify the dogs correctly. Active learning helps to solve problems of this kind.
Imagine we can achieve an accuracy of 80% with a model trained on a certain base dataset. We are given a budget for labeling up to 100 new images out of 1,000 unlabeled ones and want to use this budget wisely, since expert labeling is costly. Instead of choosing 100 samples at random, we should let our machine learner choose the most difficult samples, or those that best capture the underlying data distribution while minimizing redundancy. The model suggests to the expert those 100 samples for which it would assign labels with low confidence or high uncertainty. This way, our learner's accuracy might increase to 95% after training, compared to 90% in a setup using randomly selected samples. Alternatively, we could reduce the amount of labeled data and train a model with the same 90% accuracy at lower cost.
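The selection step above can be sketched with uncertainty sampling, one common query strategy in active learning. The class probabilities below are hypothetical model outputs for the dog-breed example; we rank unlabeled images by predictive entropy and send the most uncertain ones to the expert.

```python
import math

def entropy(probs):
    """Predictive entropy of a class distribution: higher means the model
    is less certain about its label."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Pick the `budget` unlabeled samples with the highest uncertainty."""
    ranked = sorted(predictions.items(),
                    key=lambda kv: entropy(kv[1]), reverse=True)
    return [sample_id for sample_id, _ in ranked[:budget]]

# Hypothetical class probabilities for four unlabeled dog images
# (e.g. Malinois vs. German Shepherd vs. other).
predictions = {
    "img_1": [0.98, 0.01, 0.01],  # confident: no expert needed
    "img_2": [0.34, 0.33, 0.33],  # near-uniform: very uncertain
    "img_3": [0.90, 0.05, 0.05],
    "img_4": [0.50, 0.45, 0.05],  # look-alike breeds: uncertain
}
print(select_for_labeling(predictions, budget=2))
```

Entropy is only one possible acquisition function; margin-based or diversity-aware strategies can additionally reduce redundancy among the selected samples.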
Adversarial Training: Combining the Best of Both
To combine active learning with the incremental improvement of automatic labeling, we need a machine learner that consists of two models: first, a discriminative model that measures the uncertainty of predictions for given samples (active learning), and second, a generative model that predicts pseudo ground truth for samples (automatic labeling). To increase the learner's training efficiency, we aim to jointly optimize both models through adversarial training. In this way, the discriminative model can also assign uncertainties to the predictions of the generative model and, in turn, increase their accuracy. A state-of-the-art model that satisfies these requirements and has earned a strong reputation in the research community is the Generative Adversarial Network (GAN).
Considering the framework shown in the figure above, we first use the generator (G) to predict pseudo ground truth for non-annotated data. Since the discriminator (D) can assign uncertainties not only to real ground truth but also to the pseudo ground truth predicted by G, we can sort the unlabeled samples by D's uncertainty. We define samples whose distribution the model has not yet fully captured as difficult and let D suggest them to the human annotator. The remaining easy samples, with low uncertainty, are used to produce automatically annotated data with G. The human guidance results in a stronger D that is adapted to the requirements of the task specified by the teacher (active learning). In turn, the improved D pushes G to predict pseudo ground truth of higher quality (automatic annotation). Because of this iterative improvement, GANs represent a natural framework for combining humans and machines in one jointly optimized training procedure.
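One round of this loop can be sketched at a high level. The function names, uncertainty values, and threshold below are hypothetical stand-ins: `predict` plays the role of G, `uncertainty` the role of D, and `annotate` the human expert; the actual models are trained adversarially as described above.

```python
def collaborative_round(predict, uncertainty, annotate, unlabeled,
                        budget, tau=0.5):
    """One training round: the `budget` most uncertain samples go to the
    human expert (active learning), while confidently predicted samples
    keep G's pseudo ground truth (automatic annotation)."""
    scored = sorted(unlabeled, key=uncertainty, reverse=True)
    hard = [x for x in scored[:budget] if uncertainty(x) >= tau]
    easy = [x for x in scored if x not in hard]
    return ([(x, annotate(x)) for x in hard]      # human-provided labels
            + [(x, predict(x)) for x in easy])    # pseudo labels from G

# Toy stand-ins, just to exercise the routing logic:
uncertainties = {"scan_a": 0.9, "scan_b": 0.2, "scan_c": 0.7, "scan_d": 0.1}
result = collaborative_round(
    predict=lambda x: f"pseudo({x})",
    uncertainty=uncertainties.get,
    annotate=lambda x: f"expert({x})",
    unlabeled=list(uncertainties),
    budget=2,
)
print(result)
```

The combined set of human and pseudo labels then retrains both models, so each round should shrink the pool of samples D still considers difficult.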
While science fiction is full of machines and robots that threaten humanity and thereby create a climate of suspicion, the concept of human-machine collaborative learning with adversarial training shows how machines can reasonably complement our work and lives in a positive way. In fact, the described approach has strong potential to transform a multitude of applications, for instance in the health sector. In particular, our team is currently developing an approach for the segmentation of 3D cardiovascular magnetic resonance (MR) images, an important prerequisite for creating patient-specific heart models and thus for treating complex heart diseases. Our aim is a model that learns from self-generated segmentations and actively suggests difficult MR images to experts for manual segmentation. This could significantly reduce the cost and time spent on this complex procedure, allowing radiologists to devote more time to patient care. While this specific example nicely demonstrates the concept's positive impact on society, many application areas beyond the health sector stand to benefit from research in this field.
Find the full research paper for MIDL 2019 here: Uncertainty-Driven Semantic Segmentation through Human-Machine Collaborative Learning