1. Background
CVPR is short for the Conference on Computer Vision and Pattern Recognition. It is the top academic conference for computer vision, where researchers, PhD students, and industry practitioners come together to present new state-of-the-art results in the field. Companies also set up booths to promote themselves and recruit talent.
2. CTO’s words
Yan Ke, Ph.D.
Co-Founder & CTO of Clobotics
It’s very interesting to see how CVPR has evolved over the years. When I was still a Ph.D. student at CMU, the participants were mostly from academia. We published papers, exchanged ideas, and socialized a bit. There was less participation from commercial companies, and we rarely saw faces from the Eastern side of the world.
Together with a few researchers from Clobotics, the company I co-founded last year, I flew to Hawaii for CVPR from different parts of the world. This time, it was a different experience. First, we saw lots of Chinese companies actively engaged: almost 60% of the booths and sponsors were Chinese companies. CVPR is still an academia-focused summit, but we are seeing more commercial applications across the computer vision ecosystem, which is great. Attending CVPR is even more exciting now than it was before. We hopped from session to session, and the after-parties were awesome. We joked that the days of “code for pizza” have passed.
3. Frontline report from the Clobotics Delegation to CVPR 2017
by Chunhow Tan
Senior Computer Vision Scientist, Clobotics
The Conference on Computer Vision and Pattern Recognition (CVPR) was held from July 21 to July 26, 2017 in Honolulu, Hawaii. This year, CVPR continued to see a huge increase in interest from the vision community, with a record 783 accepted papers, about 4,950 participants, and more than 100 company sponsors from around the world. The conference consists of multiple events for participants to learn about recent advances in computer vision. On the academic side, there are oral and spotlight sessions in which authors of accepted papers give short talks about their work, as well as poster sessions covering all accepted papers. In addition to the main conference, there are also many tutorials and workshops focusing on specific areas of computer vision research. On the industry side, companies set up their own booths to demonstrate their work, and many of them host social events in the evenings.
One of the biggest focuses of the work presented at the conference is object detection, which has enjoyed significant performance improvements thanks to the resurgence of deep learning in the vision community that AlexNet set off five years ago. In 2013, Ross Girshick introduced the R-CNN model for object detection, which remains the dominant approach for applying deep learning to this task. In fact, Ross Girshick was awarded one of the two PAMI Young Researcher Awards at CVPR this year (together with Julien Mairal). This year, many scholars and companies also demonstrated work pushing object detection into more challenging, real-world scenarios, such as real-time object detection in video, object detection on mobile or embedded systems, and object detection for faraway and small objects.
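For readers who want a concrete feel for the R-CNN family in practice, here is a minimal sketch that runs a pretrained Faster R-CNN detector through torchvision. This is a present-day convenience API used purely for illustration, not the original 2013 implementation; the image path is a placeholder and the score threshold is an arbitrary choice.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a detector from the R-CNN family with COCO-pretrained weights.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# "shelf.jpg" is a placeholder image path.
image = to_tensor(Image.open("shelf.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([image])[0]  # dict with "boxes", "labels", "scores"

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.5:  # keep only confident detections
        print(int(label), float(score), [round(v, 1) for v in box.tolist()])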
In addition to object detection, the community also made a lot of progress in new model architectures and training techniques, fine-grained classification of similar classes, 3D vision, applications of generative adversarial networks, image and video captioning, and more.
This year’s best paper awards went to Densely Connected Convolutional Networks by Gao Huang et al. and Learning from Simulated and Unsupervised Images through Adversarial Training by Ashish Shrivastava et al. The first paper introduces a new model architecture, DenseNet, that outperforms state-of-the-art models on most image classification benchmarks while requiring less memory and computation. Given that most computer vision tasks applying deep learning fine-tune a base network trained on image classification, this new architecture could potentially improve performance on many other computer vision applications. The second paper introduces a new method to refine synthetic data to look more like real training data through adversarial training. This work might open up many new possibilities for applying computer vision in domains and scenarios where we have very little data, or where data is very expensive to collect.
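To make the dense-connectivity idea behind DenseNet concrete, here is a minimal PyTorch sketch of a dense block, in which each layer receives the concatenation of all preceding feature maps. The growth rate, layer count, and channel sizes below are illustrative choices, not the paper’s configuration.

import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3,
                              padding=1, bias=False)

    def forward(self, x):
        # BN -> ReLU -> Conv, producing `growth_rate` new feature maps.
        return self.conv(torch.relu(self.bn(x)))

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        # Layer i sees the original input plus the outputs of layers 0..i-1.
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=16, growth_rate=12, num_layers=4)
out = block(torch.randn(1, 16, 32, 32))
print(out.shape)  # 16 + 4 * 12 = 64 channels: torch.Size([1, 64, 32, 32])

Reusing every earlier feature map this way is what lets the network stay accurate with fewer parameters, since later layers do not have to relearn features that earlier layers already computed.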
Derek Hoiem, Professor of Computer Science at UIUC, speaking at CVPR 2017
From the industry side, we are beginning to see many interesting applications in both consumer (to-C) and business (to-B) domains. On the to-B side, many companies focus on the security and surveillance industry and on the retail and commerce industry. Here, monetization is relatively straightforward, as each of these companies collaborates with business partners to solve real business needs. On the to-C side, there are also many efforts, such as fashion recommendation (Alibaba, Amazon, and others), image and visual search, and more, that improve user experiences in existing products and may indirectly drive monetization for these companies. There is also growing investment in augmented and virtual reality from companies such as Facebook (Oculus), Microsoft (HoloLens), and Samsung (Gear VR). This area is probably still at an early stage, with each company trying to build up its platform and entice more developers onto it.
Another interesting trend in the tech world in the last few years is self-driving cars. Needless to say, self-driving is well represented at CVPR, with dozens of companies and startups in the space, including Argo AI, Cruise Automation, Didi, Momenta, TuSimple, Uber ATG, and Zoox. While all of these companies hope to bring self-driving technology to consumers as soon as possible, it is interesting to note that each of them tries to achieve this in a slightly different way. For example, some companies (e.g., Uber and Waymo) use a combination of sensors such as cameras, radar, and lidar together with a pre-computed map to navigate the car, while others decide to use only cameras, the rationale being that humans depend only on vision to drive. In addition, some companies approach self-driving by dividing the task into smaller components such as sensing, localization and mapping, route planning, and control, along with human-provided inputs such as road signs and road behavior; other companies (e.g., Nvidia) try to learn self-driving end to end directly from vision and sensor inputs.
One interesting trend at CVPR, on the industry side, is how the set of company sponsors has changed over the years. Harry Shum, EVP at Microsoft, noted in his keynote speech that the company sponsors have almost completely turned over since CVPR 2007, with Microsoft Research being the only company on both the 2007 and 2017 lists. Interestingly, this year we also saw many company sponsors from China, including large established companies such as Alibaba, Baidu, JD.com, and Tencent, as well as private startups. While the US has many vision researchers and talents and has historically pioneered much of the research and hardware in the field, Chinese scholars and companies have been investing heavily in this area in recent years. Given the massive amount of user data generated by Chinese internet users, the largest user base in the world, it will be interesting to see which countries push the research and applications of computer vision furthest in the next few years.
IJCV ASIA Lobster Night
Going forward, computer vision applications will most likely drive more innovation in hardware custom-built to run vision workloads. In the past few years, computer vision (and, more generally, deep learning) applications have improved by leaps and bounds thanks to new GPU chips from Nvidia. During this conference, we saw even more effort from companies to build their own custom chips or hardware for vision applications, such as Microsoft’s HPU chip for HoloLens and DeepGlint’s FoveaCam.
Finally, as a couple of the keynote and invited speakers mentioned, the line between computer vision and other AI areas is increasingly blurred, and it is often beneficial for researchers to apply ideas developed in other areas to advance the state of the art in computer vision. At CVPR, we saw more and more work in image and video captioning that makes use of NLP models. Li Fei-Fei, in her invited talk at the Beyond ImageNet workshop, also mentioned that the inspiration for ImageNet, which has driven much of the progress in computer vision over the last five-plus years, actually came from WordNet, a word taxonomy popular in the NLP community. We live in an interesting time, with AI advancing at an unprecedented pace thanks to the combination of new techniques, new hardware, more data, and simply more researchers and engineers coming to work on AI. Maybe the next AI breakthrough is just around the corner, but how should we go about achieving it? Perhaps Dan Jurafsky, a notable NLP researcher, summed it up well in his keynote speech, in which he described how the cross-pollination of ideas from speech research helped drive the adoption of statistical techniques in NLP: “Go talk to your neighbors! You never know what you or they might learn!”
