Insights from ICCV 2017

Moin Nabi, Isil Pekel, Hoang-Vu Nguyen and Anoop Katti

Moin Nabi
SAP AI Research
Nov 10, 2017


Recently, the International Conference on Computer Vision (ICCV) brought together a large group of experts in computer vision and related areas in Venice, Italy.

This year again, ICCV, along with CVPR, remained at the forefront of conferences in the computer vision field. Yet while CVPR is evolving into a more general machine learning venue, ICCV keeps part of its focus on traditional computer vision problems. During the one-week convention, several talks, tutorials and workshops made time to discuss selected problems that had fallen into oblivion after the success of deep learning for visual recognition. Among these problems were 3D geometry, computational photography and low-level vision.

This year’s conference was a full success, with a vast number of submissions, accepted papers, and participants. In particular, the latter more than doubled compared to the previous ICCV edition, increasing from about 1400 to around 3200 participants. This growing crowd shows the increasing awareness of machine learning (ML), and particularly the importance of ML for computer vision applications.

SAP was an official sponsor of ICCV, and its ML researchers and data scientists were at the venue to participate in the talks, tutorials and workshops.

Here are our highlights, summarized from the full conference report.

General Trends and Discussions

Similar to other recent ML conferences, a large body of work was on unsupervised learning and Reinforcement Learning (RL). In unsupervised learning, the community continued to explore generative models, in particular Generative Adversarial Networks (GANs). A large number of ICCV papers proposed modifications of GANs, introducing new training techniques and showing their applications to different vision problems. Apart from this, there were several papers on RL. These focused on RL applications for robotics, particularly navigation and visual planning. One interesting suggestion was a new pathway combining RL with ideas from imitation learning.

Due to the cost of collecting annotations for supervised learning and the difficulty of training with RL and GANs, the community also seems to be regaining interest in learning with minimal supervision. This was reflected in ICCV’s large body of work on weakly-supervised learning, semi-supervised learning, learning with noisy labels, and active learning for tasks such as object detection, activity recognition and visual relation extraction.

Not surprisingly, multimodal learning drew considerable attention. Building on the success of integrating vision and language, several works suggested extending modality integration to audio and other types of signals.

Following the recent trend of using gaming environments for different machine learning problems (e.g. AlphaGo), several researchers from different fields, such as robotics vision, visual dialogue or 3D geometry, pointed out the importance of simulation environments for computer vision. Consequently, the community is pushing towards replacing static datasets with simulation environments for both training and evaluation. Among the reasons given for using simulation environments to learn visual tasks are the availability of free, well-defined ground-truth labels, interpretability, the ability to focus on a particular aspect of a task, and the low cost of failure.

Another important trend apparent at ICCV was the increasing attention paid to privacy and security in ML. This concern holds especially true for computer vision applications and seems to arise from the success of deep learning in the field. Several papers proposed different adversarial attacks, evaluated the behavior of black-box models against these attacks, or suggested defenses for ML models in vision tasks such as autonomous driving.

One of the important open problems, particularly in fine-grained recognition, was learning from a small number of samples, known as few-shot learning (or one-shot learning when only a single sample is available). Several works at the conference addressed this problem from the perspective of meta-learning and data hallucination.

Finally, many papers built on core ideas like curriculum learning, self-paced learning and, more generally, ranking-aware learning. In this context, the authors suggested that for complex visual tasks the order in which examples are presented, from easy to difficult, strongly matters.
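To make the easy-to-difficult idea concrete, here is a minimal sketch of curriculum-style batching. It is not taken from any particular ICCV paper: the function name `curriculum_batches` and the per-sample `difficulty` score are hypothetical, and in practice the score might come from a heuristic or from the loss of a previously trained model.

```python
def curriculum_batches(samples, difficulty, batch_size):
    """Yield batches of samples ordered from easy to difficult.

    `difficulty` is a hypothetical scoring function (an assumption for
    this sketch): it maps a sample to a number, lower meaning easier.
    """
    # Sort once by difficulty so the easiest samples are seen first.
    ordered = sorted(samples, key=difficulty)
    # Slice the ordered list into consecutive batches.
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]


# Toy usage: each sample is a (name, difficulty score) pair.
toy_samples = [("cluttered scene", 0.9), ("centered object", 0.1), ("partial occlusion", 0.5)]
toy_batches = list(curriculum_batches(toy_samples, lambda s: s[1], batch_size=2))
```

Self-paced variants typically let the model decide the ordering during training instead of fixing it up front; this sketch only covers the fixed-order case.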

Interesting Papers

All ICCV papers can be found here: http://openaccess.thecvf.com/ICCV2017.py.

- Best Papers -

Mask R-CNN

Why this is interesting: Mask R-CNN is simple to train. It improves instance segmentation accuracy with a very simple extension of Faster R-CNN that adds only a small overhead, running at 5 fps. Moreover, Mask R-CNN generalizes easily to other tasks, such as estimating human poses within the same framework, and it shows top results in all three tracks of the COCO challenges: instance segmentation, bounding-box object detection, and person keypoint detection.

Focal Loss for Dense Object Detection

Why this is interesting: RetinaNet shares many similarities with existing dense detectors. However, the power of RetinaNet does not come from innovations in the network design, but from a novel and robust loss function: focal loss. With a small change to the cross entropy function, the authors of the paper achieved top accuracy.
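To illustrate how small the change is, here is a minimal sketch of the binary focal loss next to plain cross entropy, written for a single prediction in pure Python. The formula follows the paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and the defaults gamma = 2 and alpha = 0.25 are the values the paper reports as working well. The function names and the scalar, per-example form are our own simplifications, not the authors' implementation.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy for one example.

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 1 or 0.
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example (scalar sketch, not the reference code).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    examples, so the many easy background boxes no longer dominate training.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (p = 0.9) is down-weighted far more than a hard one (p = 0.1).
easy_ratio = focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_ratio = focal_loss(0.1, 1) / cross_entropy(0.1, 1)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross entropy, which is why the loss can be seen as a drop-in extension of it.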

- Other Interesting Papers -

Apart from the best papers, we found a number of papers worth a deeper look. Here is a list of our favorite examples from different topics:

Generative Adversarial Networks

Reinforcement Learning

Weakly-supervised Learning

Domain Adaptation

Efficient Deep Learning

Interpretable Deep Learning

Multimodal Deep Learning

Privacy and Security in ML

Learning with Noisy Labels

The SAP ML research team is working on several of these topics with our visiting students, such as ML under privacy, and engages in research collaborations with top-tier universities to drive progress in areas like generative few-shot learning.

For a more detailed description of the papers’ underlying ideas and an explanation of why we think they are relevant, see the full conference report.

Moin Nabi
SAP AI Research

Senior Research Scientist at SAP Machine Learning Research