HUCVL Presentations at ICCV 2017

Published in HUCVL Stories · Oct 9, 2017

There will be one conference paper and three workshop papers co-authored by HUCVL members at ICCV 2017, which will be held in Venice, Italy, from October 22 to 29, 2017.

Image credit: https://unsplash.com/photos/QnQmw0JfQr8

Main Conference

Attributes2Classname: A Discriminative Model for Attribute-Based Unsupervised Zero-Shot Learning

Authors: Berkan Demirel, Ramazan Gokberk Cinbis and Nazli Ikizler-Cinbis
Time: Tuesday, October 24 15:00–17:00 Poster Session P2

We propose a novel approach for unsupervised zero-shot learning (ZSL) of classes based on their names. Most existing unsupervised ZSL methods aim to learn a model that directly compares image features and class names. However, this proves to be a difficult task due to the dominance of non-visual semantics in the underlying vector-space embeddings of class names. To address this issue, we discriminatively learn a word representation such that the similarities between class names and combinations of attribute names align with visual similarity. Contrary to traditional zero-shot learning approaches built upon attribute presence, our approach bypasses the laborious attribute-class relation annotations for unseen classes. In addition, the proposed approach makes text-only training possible, so training can be augmented without collecting additional image data. Experimental results show that our method yields state-of-the-art unsupervised ZSL results on three benchmark datasets.
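To give a flavor of the core idea, here is a minimal, hypothetical sketch (not the authors' actual model): an unseen class is scored by comparing its name embedding with the aggregated embeddings of the attribute names associated with an image. The word vectors, attribute list, and class names below are placeholders.

```python
# Hypothetical sketch: rank unseen classes by the similarity between the class-name
# embedding and the mean embedding of predicted attribute names.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def score_class(class_name, predicted_attributes, word_vectors):
    """word_vectors: dict mapping a word to a dense embedding (e.g. word2vec)."""
    class_vec = word_vectors[class_name]
    attr_vec = np.mean([word_vectors[a] for a in predicted_attributes], axis=0)
    return cosine(class_vec, attr_vec)

# Usage: rank two candidate unseen classes for one image's predicted attribute names.
word_vectors = {w: np.random.randn(300) for w in
                ["zebra", "horse", "striped", "hooved", "wild"]}  # placeholder vectors
attrs = ["striped", "hooved", "wild"]
ranking = sorted(["zebra", "horse"],
                 key=lambda c: score_class(c, attrs, word_vectors), reverse=True)
print(ranking)
```

In the paper, the word representation itself is learned discriminatively so that these name-level similarities match visual similarity; the sketch only shows the scoring step with fixed embeddings.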

Workshops

Feature-Based Efficient Moving Object Detection for Low-Altitude Aerial Platforms

Authors: Kamil Berker Logoglu, Hazal Lezki, Mehmet Kerim Yücel, Ahu Ozturk, Alper Küçükkömürler, Batuhan Karagoz, Erkut Erdem and Aykut Erdem
Workshop: 1st International Workshop on Vision for UAVs
Time: Saturday, October 28 17:10–17:25

Moving object detection is one of the integral tasks for aerial reconnaissance and surveillance applications. Despite the problem's rising importance due to the increasing availability of unmanned aerial vehicles, moving object detection suffers from the lack of a widely accepted, correctly labelled dataset that would facilitate a robust evaluation of the techniques published by the community. To this end, we compile a new dataset by manually annotating several sequences from the VIVID and UAV123 datasets for moving object detection. We also propose a feature-based, efficient pipeline that is optimized for near real-time performance on GPU-based embedded SoMs (systems on module). We evaluate our pipeline on this extended dataset for low-altitude moving object detection. Ground-truth annotations are made publicly available to the community to foster further research in the field of moving object detection.
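For readers unfamiliar with feature-based moving object detection from a moving camera, the following is a generic, hypothetical sketch of the standard recipe (not the authors' exact pipeline): compensate ego-motion with a RANSAC homography estimated from sparse feature matches, then difference the registered frames. The feature detector choice and thresholds are placeholders.

```python
# Generic sketch: ego-motion compensation + frame differencing with OpenCV.
import cv2
import numpy as np

def moving_object_mask(prev_gray, curr_gray, diff_thresh=25):
    orb = cv2.ORB_create(1000)                       # fast binary features
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)    # background motion model
    h, w = curr_gray.shape
    warped_prev = cv2.warpPerspective(prev_gray, H, (w, h))  # register previous frame
    diff = cv2.absdiff(curr_gray, warped_prev)                # residual = independent motion
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    return mask
```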

Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction

Authors: Cagdas Bak, Aysun Kocak, Aykut Erdem and Erkut Erdem
Workshop: Mutual Benefits of Cognitive and Computer Vision
Time: Sunday, October 29 10:00–11:00

Predicting where humans look in images has gained significant popularity in recent years. Compared to the vast number of saliency methods for static images, dynamic saliency estimation remains relatively unexplored. In this work, we propose deep saliency networks based on the two-stream architecture that process both spatial and temporal information to predict saliency in videos. In particular, we investigate several fusion strategies for combining information from the spatial and temporal streams and analyze their effectiveness for dynamic saliency prediction. Moreover, to improve the generalization of the saliency networks, we introduce a novel and cognitively grounded data augmentation technique. Experimental results on the DIEM and UCF-Sports datasets show that the proposed approach models human attention behavior better than competing methods, achieving state-of-the-art results.
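As an illustration of the two-stream idea (not the paper's actual architecture or chosen fusion strategy), here is a minimal PyTorch sketch: a spatial stream over the RGB frame and a temporal stream over optical flow, fused by a learned 1x1 convolution into a single saliency map. The layer sizes are placeholders.

```python
# Hypothetical two-stream saliency sketch with late fusion.
import torch
import torch.nn as nn

def stream(in_channels):
    # Tiny fully convolutional encoder; real models use deeper backbones.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    )

class TwoStreamSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.spatial = stream(3)         # RGB frame
        self.temporal = stream(2)        # x/y optical flow
        self.fuse = nn.Conv2d(64, 1, 1)  # fuse concatenated stream features

    def forward(self, frame, flow):
        feats = torch.cat([self.spatial(frame), self.temporal(flow)], dim=1)
        return torch.sigmoid(self.fuse(feats))  # per-pixel saliency in [0, 1]

# Usage: one RGB frame and its two-channel flow field.
model = TwoStreamSaliency()
saliency = model(torch.randn(1, 3, 224, 224), torch.randn(1, 2, 224, 224))
print(saliency.shape)  # torch.Size([1, 1, 224, 224])
```

The paper compares several such fusion points (e.g., where and how the two streams are combined); the 1x1 convolution above is just one simple option.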

Re-evaluating Automatic Metrics for Image Captioning*

Authors: Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis and Erkut Erdem
Workshop: 2nd Workshop on Closing the Loop Between Vision and Language
Time: Sunday, October 29

The task of generating natural language descriptions of images has received a lot of attention in recent years. Consequently, it is becoming increasingly important to evaluate such image captioning approaches automatically. In this paper, we provide an in-depth evaluation of existing image captioning metrics through a series of carefully designed experiments. Moreover, we explore the use of the recently proposed Word Mover's Distance (WMD) document metric for evaluating image captions. Our findings outline the differences and similarities between metrics and their relative robustness by means of extensive correlation-, accuracy-, and distraction-based evaluations. Our results also demonstrate that WMD provides strong advantages over the other metrics.
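For context, WMD measures how far the words of one text must "travel" in embedding space to match another text, so a candidate caption can be scored against a reference even when the two share few exact words. A minimal sketch using gensim's existing `wmdistance` method is below; the embedding file path is a placeholder, and gensim needs its optimal-transport backend installed for this call.

```python
# Sketch: score a candidate caption against a reference with Word Mover's Distance.
# Lower distance means a closer match.
from gensim.models import KeyedVectors

word_vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)  # placeholder embedding file

candidate = "a man rides a horse on the beach".split()
reference = "a person is riding a horse along the shore".split()
distance = word_vectors.wmdistance(candidate, reference)
print(f"WMD: {distance:.3f}")
```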

* First presented as a long paper at EACL 2017
