Facebook AI’s DETR Applies Transformers to CV Tasks

Synced · Published in SyncedReview · 3 min read · May 29, 2020

Transformers are a deep learning architecture that has gained popularity in recent years, particularly for problems involving sequential data, such as natural language processing (NLP) tasks like language modelling and machine translation. Transformers have also been extended to tasks such as speech recognition, symbolic mathematics, and reinforcement learning.

To push the ‘Transformer revolution’ into the computer vision field, Facebook this week released the Detection Transformer (DETR), a new approach to object detection and panoptic segmentation that uses a completely different architecture than previous object detection systems.

“We present a new method that views object detection as a direct set prediction problem,” explains the Facebook research team. “Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task.”

DETR combines a set-based global loss, which forces unique predictions via bipartite matching, with a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, it reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel.
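The paper itself includes a simplified PyTorch implementation of this pipeline in only a few dozen lines. The sketch below follows that spirit but is an illustrative reconstruction rather than Facebook’s released code: the class name, layer sizes and the learned positional-encoding scheme are assumptions chosen to show how a CNN backbone, a transformer encoder-decoder, learned object queries and parallel class/box heads fit together.

```python
# Minimal DETR-style model (illustrative sketch, not the official implementation).
import torch
from torch import nn
from torchvision.models import resnet50


class SimpleDETR(nn.Module):
    def __init__(self, num_classes, hidden_dim=256, nheads=8,
                 num_encoder_layers=6, num_decoder_layers=6, num_queries=100):
        super().__init__()
        # CNN backbone: keep ResNet-50 up to its final convolutional feature map.
        backbone = resnet50()
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.conv = nn.Conv2d(2048, hidden_dim, 1)  # project features to transformer width
        self.transformer = nn.Transformer(hidden_dim, nheads,
                                          num_encoder_layers, num_decoder_layers)
        # A fixed, small set of learned object queries: one prediction "slot" each.
        self.query_embed = nn.Parameter(torch.rand(num_queries, hidden_dim))
        # Simple learned 2D positional encodings for the flattened feature map.
        self.row_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        self.col_embed = nn.Parameter(torch.rand(50, hidden_dim // 2))
        # Prediction heads: class logits (+1 for the "no object" class) and boxes.
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)
        self.bbox_head = nn.Linear(hidden_dim, 4)

    def forward(self, x):
        feat = self.conv(self.backbone(x))            # (B, hidden_dim, H, W)
        B, _, H, W = feat.shape
        pos = torch.cat([
            self.col_embed[:W].unsqueeze(0).repeat(H, 1, 1),
            self.row_embed[:H].unsqueeze(1).repeat(1, W, 1),
        ], dim=-1).flatten(0, 1).unsqueeze(1)         # (H*W, 1, hidden_dim)
        src = pos + feat.flatten(2).permute(2, 0, 1)  # image features + positions
        tgt = self.query_embed.unsqueeze(1).repeat(1, B, 1)
        hs = self.transformer(src, tgt)               # (num_queries, B, hidden_dim)
        # All object predictions are produced in parallel, one per query.
        return self.class_head(hs), self.bbox_head(hs).sigmoid()
```

Training additionally requires the set-based loss with bipartite (Hungarian) matching, which assigns each ground-truth object to exactly one query and is what removes the need for non-maximum suppression; that part is omitted here for brevity.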

Unlike many other modern detectors, the new model is conceptually simple and does not require a specialized library. When tested on the COCO object detection data set, DETR matches the performance of previous SOTA methods such as the Faster R-CNN baseline.
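As a usage-level illustration, the released pretrained models can be pulled in through plain PyTorch via torch.hub. The snippet below is a sketch that assumes the detr_resnet50 entry point exposed by the public facebookresearch/detr repository and a local image file (street.jpg is a placeholder); check the repository’s hubconf for the current entry points.

```python
# Hedged usage sketch: load a pretrained DETR checkpoint through torch.hub.
import torch
import torchvision.transforms as T
from PIL import Image

# 'detr_resnet50' is the entry point exposed by facebookresearch/detr's hubconf.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
model.eval()

transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),  # ImageNet statistics
])

img = transform(Image.open('street.jpg').convert('RGB')).unsqueeze(0)  # placeholder image
with torch.no_grad():
    outputs = model(img)  # dict with 'pred_logits' and 'pred_boxes', one entry per query

# Keep queries whose most confident class is not the "no object" slot.
probs = outputs['pred_logits'].softmax(-1)[0, :, :-1]
keep = probs.max(-1).values > 0.9
print(probs[keep].argmax(-1), outputs['pred_boxes'][0, keep])  # class ids, normalized boxes
```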

It’s been over four years since Faster R-CNN was proposed as a SOTA approach to object detection, and newer methods such as the recently introduced ResNeSt have achieved far better results. DETR’s novelty therefore lies primarily in matching an optimized Faster R-CNN with a much simpler architecture.

And although DETR achieves significantly better performance on large objects than Faster R-CNN, it still struggles with small objects, a shortcoming the researchers plan to address in future work.

DETR’s design is not only straightforward to implement but can also be easily extended to panoptic segmentation with competitive results, the researchers say. The team hopes that applying Transformers to object detection tasks will help improve the interpretability of computer vision models.

The paper End-to-End Object Detection with Transformers is on arXiv.

Journalist: Yuan Yuan | Editor: Michael Sarazen

We know you don’t want to miss any story. Subscribe to our popular Synced Global AI Weekly to get weekly AI updates.

Share Your Research With Synced Review

Share My Research is Synced’s new column that welcomes scholars to share their own research breakthroughs with over 1.5M global AI enthusiasts. Beyond technological advances, Share My Research also calls for interesting stories behind the research and exciting research ideas. Share your research with us by clicking here.

Need a comprehensive review of the past, present and future of modern AI research development? Trends of AI Technology Development Report is out!

2018 Fortune Global 500 Public Company AI Adaptivity Report is out!
Purchase a Kindle-formatted report on Amazon.
Apply for Insight Partner Program to get a complimentary full PDF report.
