A week of Computer Vision in Seoul: My ICCV 2019 Highlights

Eder Santana
Nov 7 · 4 min read

Note: Thanks to everyone who commented and suggested changes to the original draft of this post. My highlights are biased toward my current interests in information retrieval in large-scale live video.

I think ICCV 2019 was one of the best conferences I’ve been to so far. Some of my friends even said they liked it better than this year’s CVPR. It was pretty nice to see all the progress in computer vision. Also, Seoul is a wonderful city that I’d definitely visit again!

If I had to name the top 3 most common topics I saw, I’d say compression/enhancement, neural architecture search, and people-tracking work (person/face re-identification, crowd counting, etc.). Honorable mention to GANs, which are still pretty popular.

On the other hand, although I was pleased to see all the progress in video processing this year, I didn’t learn much about the concerns of video processing at scale. Perhaps that’s left for future work. That said, here are some of my highlights.

Bridging the Sim-to-Real gap in Computer Vision benchmarks

Karpathy’s take on the differences between computer vision research in academia vs industry and how to bridge the gap.

Karpathy presented Tesla’s Data Engine: how they organize their effort for progress in Machine Learning research for autonomous vehicles.

Coming from academia myself and now mostly focusing my research on real-world problems, I found this to be one of my favorite talks. I also liked his suggestion of a new kind of benchmark for industry:

  1. The val / test set is alive and carefully curated all the time based on QA.
  2. It takes the form of “unit tests” instead of metrics.
  3. The org is split into two groups locked in a GAN-like battle: Group 1 tries to find images that break the network and drive down the % pass rate. Group 2 tries to improve the % pass rate.
  4. Model uncertainty is a “first class citizen”.
  5. Instead of expected loss we care about maximum loss.
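To make the idea concrete, here’s a minimal sketch of what “unit tests instead of metrics” and “maximum loss instead of expected loss” could look like. Everything here is hypothetical on my part (this is not Tesla’s actual tooling): a test case is just an input plus a pass/fail check, and the leaderboard is a pass rate over the curated set.

```python
def pass_rate(model, test_cases):
    """Evaluate a model against curated 'unit tests' instead of one
    aggregate metric. Each test case is (input, check), where
    check(output) returns True if the model's behavior on that case
    is acceptable. Group 1 adds cases that fail; Group 2 raises the rate."""
    passed = sum(1 for x, check in test_cases if check(model(x)))
    return passed / len(test_cases)


def expected_loss(losses):
    """What most benchmarks report: the average case."""
    return sum(losses) / len(losses)


def maximum_loss(losses):
    """What matters for safety-critical systems: the worst case."""
    return max(losses)


# Toy example with a hypothetical 'model' that doubles its input.
model = lambda x: x * 2
cases = [(1, lambda y: y == 2),   # passes
         (2, lambda y: y == 5)]   # a case Group 1 found that breaks it
rate = pass_rate(model, cases)
```

The point of the sketch is the shape of the loop: the test set is a living collection of adversarially curated cases, and a single failing case matters even when the average metric looks fine.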

Why should only the models be allowed to improve on COCO? Introduce a second leaderboard for images that break SOTA models.

Here’s another great quote on the topic of iterating on the training and test data instead of focusing solely on the models:

It’s funny that people would consider this cheating in academia, because that’s what I do for my job everyday in industry.

Re-localization part of Torsten Sattler’s talk at the Visual Localization Tutorial.

Image retrieval (the same tech we use for image search) works pretty well for localization in maps.
  • (SIFT) feature-based methods still competitive
  • State-of-the-art on medium and large-scale datasets
  • Key idea of prioritization: Few unique / characteristic features are sufficient
  • Visibility filtering: Exploit co-visibility information from reconstruction stage
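The prioritization idea from the notes above can be sketched in a few lines. This is my own hypothetical toy version (the names `db_index`, `query_feats`, etc. are made up, and real systems match descriptors approximately, not by dictionary lookup): try the most characteristic features first and stop early once enough 2D–3D correspondences are found for pose estimation.

```python
def prioritized_match(query_feats, db_index, needed=2):
    """Prioritized feature matching sketch.
    query_feats: list of (descriptor_id, distinctiveness) for the query image.
    db_index: maps a descriptor_id to a 3D point id from the reconstruction.
    Matches the rarest / most characteristic features first and stops
    as soon as `needed` 2D-3D matches are collected."""
    matches = []
    # Most distinctive features first — they are the most informative.
    for desc, _ in sorted(query_feats, key=lambda f: -f[1]):
        point = db_index.get(desc)
        if point is not None:
            matches.append((desc, point))
            if len(matches) >= needed:
                break  # few unique features are sufficient
    return matches


# Toy example: feature 2 never matches, features 1 and 3 hit 3D points.
feats = [(1, 0.9), (2, 0.5), (3, 0.7)]
db = {1: "point_A", 3: "point_B"}
result = prioritized_match(feats, db, needed=2)
```

The early exit is the whole trick: since a handful of unique features is enough for localization, there’s no need to exhaustively match everything.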

Action Recognition part of Jitendra Malik’s talk

Prof. Malik’s talk made me curious to read about their SlowFast networks in more detail. Here’s the part where he shows that SlowFast nets can learn optical-flow-like features without expensive optical flow estimation.
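The core SlowFast idea is simple to sketch: two pathways sample the same clip at different temporal rates. A rough frame-sampling sketch, using the τ (slow temporal stride) and α (fast/slow rate ratio) parameters from the paper, with default values from it:

```python
def slowfast_sample(num_frames, tau=16, alpha=8):
    """Pick frame indices for the two SlowFast pathways.
    Slow pathway: one frame every `tau` frames (low temporal rate,
    high channel capacity for semantics).
    Fast pathway: `alpha` times denser sampling (high temporal rate,
    lightweight channels for motion)."""
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, max(1, tau // alpha)))
    return slow, fast


# For a 64-frame clip with the paper's defaults:
slow, fast = slowfast_sample(64)   # 4 slow frames, 32 fast frames
```

The motion cues picked up by the dense, cheap fast pathway are what lets the network learn optical-flow-like features without ever computing optical flow explicitly.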

Our oral presentation

Posters

DeceptionNet: Network-Driven Domain Randomization was one of my overall favorites, mostly for the concept of automatic data augmentation for training. I talked to Wadim and Sergey about possible ways to extend this approach to other domains, and how one could automatically discover which invariances are relevant for the DeceptionNet to capture. Right now it relies on domain expertise.

This is one of my favorite representatives of the video compression papers I saw. Video Compression with Rate-Distortion Autoencoders: work on the entire video volume with 3D convolutions instead of the “find keyframes, encode diffs” approach. I think they still have the same scalability issues as the other neural-network video compression approaches when we’re talking about YouTube/Twitch video scale…
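For context, the objective behind rate-distortion autoencoders is the classic trade-off L = D + λ·R: reconstruction error versus how many bits the latent code costs. A toy sketch of that objective (my own simplification, not the paper’s actual loss — real systems learn the rate term with an entropy model rather than counting empirical symbol frequencies):

```python
import math
from collections import Counter

def rate_distortion(original, reconstruction, latent_symbols, lam=0.1):
    """Toy rate-distortion objective L = D + lam * R.
    D: mean squared error between original and reconstructed signal.
    R: empirical entropy (bits/symbol) of the quantized latent code,
    a stand-in for the bitrate an entropy coder would achieve."""
    n = len(original)
    d = sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / n
    counts = Counter(latent_symbols)
    total = len(latent_symbols)
    r = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return d + lam * r


# Perfect reconstruction, two equiprobable latent symbols (1 bit/symbol):
loss = rate_distortion([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [0, 1, 0, 1])
```

Raising λ pushes the encoder toward smaller codes at the cost of distortion; lowering it does the opposite, which is how one loss traces out the whole rate-distortion curve.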

I talked to my fellow Brazilian researcher Cézar de Souza, from Naver Labs Europe, about their paper on Learning with Average Precision: Training Image Retrieval with a Listwise Loss, and I think this is going to be one of the most useful papers from ICCV for me. In information retrieval, Average Precision is one of the main metrics of learning-to-rank performance. So it’s obvious why I’d be interested in a paper that proposes a differentiable approximation of Average Precision as a listwise loss.
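The obstacle the paper tackles is that AP depends on hard ranking, which has zero gradient. To illustrate the general idea, here’s a sketch of a smooth AP using a sigmoid-based soft rank — note this is a generic relaxation I wrote for illustration, not the paper’s method, which uses a histogram-binning (quantization) approximation instead:

```python
import math

def sigmoid(x, temp=0.01):
    """Smooth step; as temp -> 0 this approaches the hard indicator 1[x > 0]."""
    return 1.0 / (1.0 + math.exp(-x / temp))

def smooth_ap(scores, labels, temp=0.01):
    """Smooth approximation of Average Precision for one ranked list.
    scores: relevance scores (higher = ranked earlier).
    labels: 0/1 ground-truth relevance.
    The hard indicator 1[s_j > s_i] in the rank is replaced by a sigmoid,
    so the value is differentiable in the scores (an autograd framework
    would then provide gradients for training)."""
    n_pos = sum(labels)
    ap = 0.0
    for i, (si, yi) in enumerate(zip(scores, labels)):
        if not yi:
            continue
        # Soft rank of positive item i among all items.
        rank_all = 1.0 + sum(sigmoid(sj - si, temp)
                             for j, sj in enumerate(scores) if j != i)
        # Soft rank of item i among positives only (soft precision numerator).
        rank_pos = 1.0 + sum(sigmoid(sj - si, temp)
                             for j, (sj, yj) in enumerate(zip(scores, labels))
                             if j != i and yj)
        ap += rank_pos / rank_all
    return ap / n_pos


# Positives at ranks 1 and 3 -> true AP = (1/1 + 2/3) / 2 = 5/6.
ap = smooth_ap([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

With a small temperature the relaxation tracks the exact AP closely, while larger temperatures trade fidelity for smoother, better-behaved gradients.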

Also, there was our oral presentation on Exploring the Limitations of Behavior Cloning for Autonomous Driving.

