MUTAN: Multimodal Tucker Fusion for Visual Question Answering
Foreword: The author is Hédi Ben-Younes, a former PhD student at LIP6 / Heuritech. Multimodal fusion of text and image information is an important topic at Heuritech, since most media on the internet combines images, videos, and text. The challenging Visual Question Answering task is an excellent benchmark for fusing text and images.
This blog post presents work by Hédi Ben-Younes*, Rémi Cadène*, Matthieu Cord and Nicolas Thome. The paper was accepted at the International Conference on Computer Vision…